Hi,
How can we locate repeated lines in a text file and what is the way to delete the repeated lines from a file?
Regards
anon10055
Anon10055,
As with everything in UNIX, there are multiple ways. Assuming your lines end with a newline character and you do not care about the order of the output, you can simply do this:
$> cat file_with_duplicate_lines.txt | sort | uniq > file_without_duplicate_lines.txt
If you care about the order, this will not work well: uniq only removes duplicates that sit on consecutive lines, which is why the file has to be sorted first, and sorting destroys the original order.
Beyond that, you can use a script. Try it out, it will be an interesting exercise. I would suggest looking at hashes (associative arrays) in whichever language you are picking up: record each line as you see it and print it only the first time. The entire script will be no more than a few lines. In Perl you can even do it on the command line itself!
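If you want a starting point for the hash idea, here is a rough sketch using awk rather than Perl (my substitution, since awk's arrays are hashes too). The function name dedupe_keep_order is made up for illustration. It assumes whole lines are compared exactly, so trailing spaces or case differences count as different lines.

```shell
# Hypothetical helper (the name is illustrative, not a standard command):
# print each distinct line once, in the order it first appears.
# awk keeps an associative array (a hash) keyed by the whole line ($0);
# "!seen[$0]++" is true only the first time a given line is read.
dedupe_keep_order() {
    awk '!seen[$0]++' "$@"
}
```

You would call it as: dedupe_keep_order file_with_duplicate_lines.txt > file_without_duplicate_lines.txt. With no file argument it reads standard input. Note that the array holds every distinct line in memory, which may matter for very large files.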
Read the man page of uniq to see how to print only the *repeated* lines (the -d option).
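To answer the "locate" half of the question, something like the following should work. It is only a sketch: the function name list_repeated_lines is made up, and the output comes back in sorted order, not in file order.

```shell
# Hypothetical helper (the name is illustrative):
# list every line that occurs more than once, one copy of each.
# sort makes identical lines adjacent; uniq -d then prints only
# the lines that were repeated.
list_repeated_lines() {
    sort "$@" | uniq -d
}
```

For example: list_repeated_lines file_with_duplicate_lines.txt. If you also want to know how many times each line occurs, uniq -c in place of uniq -d prefixes every line with its count.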
Cheers,
Anup
Last edited by anup; 09-10-2011 at 11:00 PM.