no_clue_dude February 2016

Python - Exclude contents of one file from another / removing duplicate lines amongst two files

First off, I'm using Python 2.7.9. I'm trying to find the most efficient way to compare the lines of one text file (file A) to the lines of another text file (file B) and write all lines that are unique to file A into a new file (file A\B).

I've actually written a short script that does this, but it is beyond slow. I need the script to handle files of up to 70 MB each (both A and B), which is unthinkable with this 'bad' boy:

import string
naked = string.strip
kiss = ''.join

def main():
    list1 = raw_input("Enter name of .txt-file to clean!\n")
    list2 = raw_input("Enter name of .txt-file to exclude!\n")
    action(list1, list2)
    raw_input("Done!\nPress [ENTER] to exit!")

def action(list1, list2):
    f = open(kiss([list1, '.txt']), "r")
    g = open(kiss([list2, '.txt']), "r")
    h = open(kiss([list1, '_without_', list2, '.txt']), "w")
    h_w = h.write
    reset = g.seek
    found = False
    for i in f:
        found = [True for j in g if naked(i) == naked(j)]
        if not found:
            h_w(kiss([naked(i), '\n']))
        else:
            found = False
        reset(0)
    f.close()
    g.close()
    h.close()

main()

Does anyone have an idea how to do this more efficiently? Thanks in advance!

Answers


Pavan February 2016

def read_file(filename):
    # Read all lines of the file, stripped of surrounding whitespace.
    with open(filename) as src:
        return [line.strip() for line in src]


def main():
    list1 = raw_input("Enter name of .txt-file to clean!\n")
    list2 = raw_input("Enter name of .txt-file to exclude!\n")
    file1 = read_file(list1)
    # A set makes each "not in" test a hash lookup instead of a scan of the whole list.
    file2 = set(read_file(list2))

    with open('new_file.txt', 'w') as file3:
        for line in file1:
            if line not in file2:
                file3.write(line + '\n')  # writes to a new file

    print 'Completed'

main()

I am not sure this is the fastest way, but it will do the trick. On Linux, you can also use the "diff" or "comm" commands to get the required output; note that "comm" expects both input files to be sorted.
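
For files in the 70 MB range, one option is to load only file B into a set and stream file A line by line, so each line of A costs a single hash lookup instead of a pass over B. The following is a minimal sketch of that idea, not a tuned implementation. The function name write_unique_lines and the UTF-8 encoding are assumptions made for illustration; the original question does not state an encoding. It uses io.open so that it should run on both Python 2.7 and Python 3.

```python
import io


def write_unique_lines(path_a, path_b, path_out):
    """Write the lines of file A that do not occur in file B; return the count."""
    # Load file B once into a set so each membership test is O(1) on average.
    # Assumption: both files are UTF-8 encoded text.
    with io.open(path_b, encoding="utf-8") as file_b:
        excluded = set(line.strip() for line in file_b)

    written = 0
    # Stream file A line by line so it never has to fit in memory as a whole.
    with io.open(path_a, encoding="utf-8") as file_a, \
            io.open(path_out, "w", encoding="utf-8") as out:
        for line in file_a:
            stripped = line.strip()
            if stripped not in excluded:
                out.write(stripped + u"\n")
                written += 1
    return written
```

Calling write_unique_lines("A.txt", "B.txt", "A_without_B.txt") (hypothetical file names) would write the result next to the inputs. Like the script in the question, this compares lines after stripping surrounding whitespace and keeps lines that are repeated within file A. Memory use grows with the size of file B, since all of its distinct lines are held in the set.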
