Sérgio Martins February 2016

Remove partially duplicated lines from a text file in Notepad++

I have a huge list like the example below and need to remove lines 1, 3, 6 and 8 because they are partial duplicates of other lines; in each case I need to keep the longest line.

COMPAQ PRESARIO A940ES NOTEBOOK PC
COMPAQ PRESARIO A940ES NOTEBOOK PC - KU048EAR
HP PAVILION DV7-1210EA NOTEBOOK PC 
HP PAVILION DV7-1210EA NOTEBOOK PC - NG385EA#ABU
HP PAVILION DV7-1210EA NOTEBOOK PC - NG385EAR
HP PAVILION DV7-1210ED NOTEBOOK PC 
HP PAVILION DV7-1210ED NOTEBOOK PC - NA048EA#ABH
HP PAVILION DV7-1210ED NOTEBOOK PC - NA048EA

The final result that I need is:

COMPAQ PRESARIO A940ES NOTEBOOK PC - KU048EAR
HP PAVILION DV7-1210EA NOTEBOOK PC - NG385EA#ABU
HP PAVILION DV7-1210EA NOTEBOOK PC - NG385EAR
HP PAVILION DV7-1210ED NOTEBOOK PC - NA048EA#ABH

Answers


Lars Fischer February 2016

If you don't need to keep the original order of your lines, you could try something like this:

  • sort the lines with Edit -> Line Operations -> Sort Lines Lexicographically Ascending
  • be sure that the last line ends with a newline
  • Now we do a Find/Replace:
    • Find What: ^(.*)\r\n(\1.*?\r\n)
    • Replace With: \2
    • Check in the lower left: Regular Expression and . matches newline
    • If your line endings are only \n, use \n instead of the two \r\n in Find What.
    • Hit Replace or Replace All repeatedly until there is nothing left to replace; the status bar in the Replace dialog will tell you when that happens.
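
For anyone who would rather script this than click through the dialog, the same sort-then-replace loop can be sketched in Python. This is only an illustration under a few assumptions: the text uses \n line endings, Python's re module stands in for Notepad++'s regex engine (so edge cases may behave differently), the dot is left at Python's default of not matching newlines, and the name collapse_by_regex is made up for this example.

```python
import re

# Same idea as the Find What expression, written for "\n" line endings.
# Group 1 is a whole line; group 2 is the next line, which must start with group 1.
PAIR = re.compile(r"^(.*)\n(\1.*\n)", re.MULTILINE)


def collapse_by_regex(text: str) -> str:
    """Sort the lines, then drop every line that the following line extends."""
    # Sort first so a shorter line sits directly above the lines that extend it.
    lines = sorted(text.splitlines())
    # Make sure the last line ends with a newline, as the steps require.
    text = "".join(line + "\n" for line in lines)
    # Repeat "Replace All" until nothing is left to replace.
    while True:
        text, replaced = PAIR.subn(r"\2", text)
        if replaced == 0:
            return text
```

The loop plays the role of pressing Replace All again and again: one pass only collapses non-overlapping pairs, so a run of three nested lines needs a second pass.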

How it works:

  1. Sorting puts the duplicates next to each other, with the longest "duplicate" last.
  2. The Find/Replace matches two consecutive lines where the second line begins with the entire first line, and replaces both with just the second line. (That means that if you have three duplicates in a row, the first Replace All leaves the second and third lines standing, and you need another Replace All.)
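
The two points above can also be checked without any regular expression. The following is a minimal Python sketch, not part of the Notepad++ procedure: after sorting, a line is dropped whenever the line right after it starts with it. The function name keep_longest is hypothetical.

```python
def keep_longest(lines):
    """Return the sorted lines, minus any line that the next line extends."""
    ordered = sorted(lines)
    kept = []
    # Pair every line with its successor; the last line has no successor.
    for current, following in zip(ordered, ordered[1:] + [None]):
        if following is not None and following.startswith(current):
            # 'current' is a prefix of the next line, so it is a partial duplicate.
            continue
        kept.append(current)
    return kept
```

Because every comparison is made against the original sorted list, a chain of nested lines is handled in one pass, whereas the Find/Replace needs to be repeated. As with the sorting approach in general, the original line order is not preserved, and exact duplicates collapse to a single copy.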
