Welcome to the VirtuQ Forums.
  1. #1
    Member
    Join Date
    Oct 2011
    Location
    Bangalore
    Posts
    13

    Locating repeated lines in a text file

    Hi,

    How can we locate repeated lines in a text file, and how can we delete those repeated lines from the file?

    Regards
    anon10055

  2. #2
    VirtuQ™ Moderator
    Join Date
    Jul 2011
    Location
    Bangalore, India
    Posts
    1,044
    Blog Entries
    2
    Anon10055,

    As with everything in UNIX, there are multiple ways. Assuming your lines end with a newline character and you do not care about the order of the output, you can simply do this:

    $> sort file_with_duplicate_lines.txt | uniq > file_without_duplicate_lines.txt

    If you care about the order, uniq on its own will not work well, because it only removes duplicates that sit on consecutive lines.

    For anything beyond that, you can use a script. Try it out, it will be an interesting exercise. I would suggest reading up on hashes (associative arrays) in whichever language you are picking up: remember every line you have already seen and print a line only the first time it appears. The entire script will be no more than a few lines. In Perl you can even do it on the command line itself!
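
As a rough sketch of the hash idea, here is one way to do it with awk, whose arrays are associative. The file names and sample contents are made up for illustration, and this assumes the set of distinct lines fits in memory:

```shell
# Build a small sample file (hypothetical name and contents).
printf 'apple\nbanana\napple\ncherry\nbanana\n' > sample_with_duplicates.txt

# seen[$0]++ evaluates to 0 the first time a line is encountered, so
# !seen[$0]++ is true only for first occurrences. awk prints those lines,
# which keeps the original order and drops every later repeat.
awk '!seen[$0]++' sample_with_duplicates.txt > sample_without_duplicates.txt

cat sample_without_duplicates.txt
```

Unlike the sort-based pipeline, this needs no sorting, so the first occurrence of each line stays where it was in the input.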

    Read the man page of uniq to see how you can print only the *repeated* lines (the -d option).
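
To locate the repeated lines rather than delete them, something along these lines should work. It is a minimal sketch with made-up file names, and it sorts first because uniq only compares adjacent lines:

```shell
# Build a small sample file (hypothetical name and contents).
printf 'apple\nbanana\napple\ncherry\nbanana\n' > sample_with_duplicates.txt

# sort makes identical lines adjacent; uniq -d then prints one copy of
# each line that occurs more than once and skips lines that occur once.
sort sample_with_duplicates.txt | uniq -d > repeated_lines.txt

cat repeated_lines.txt
```

Replacing -d with -c prefixes every distinct line with the number of times it occurs, which is handy if you also want the counts.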

    Cheers,

    Anup
    Last edited by anup; 09-10-2011 at 11:00 PM.


 
