Dima Lituiev February 2016

python subprocess popen: Piping stdout messes up the strings

I am trying to concatenate a couple of files together and add a header.

import subprocess
outpath = "output.tab"
with open(outpath, "w") as outf:
    # write a header
    if header is True:
        p1 = subprocess.Popen(["head", "-n1", files[-1]], stdout=outf)
    if type(header) is str:
        p1 = subprocess.Popen(["head", "-n1", header], stdout=outf)
    for fl in files:
        print(fl)
        p1 = subprocess.Popen(["tail", "-n+2", fl], stdout=outf)

For some reason, some files (fl) are written only partially, and the next file starts in the middle of a line from the previous file. Counting the fields per line shows the broken rows:

 awk '{print NF}' output.tab | uniq -c
    108 11
      1 14
     69 11
      1 10
     35 11
      1 16
    250 11
      1 16

Is there any way to fix it in Python?


An example of messed up lines:

$tail -n+108 output.tab | head -n1

CENPA   chr2    27008881.0  2701ABCD3   chr1    94883932.0  94944260.0  0.0316227766017 0.260698861451  0.277741584016  0.302602378581  0.4352790705329718  56  16


$grep -n -A1 'CENPA' file1.tab

109:CENPA   chr2    27008881.0  27017455.0  1.0 0.417081004817  0.0829327365256 0.545205239241  0.7196619496326693  95  3
110-CENPO   chr2    25016174.0  25045245.0  1000.0  0.151090930896  -0.0083671250883    0.50882773122   0.0876177652747541  82  0


$grep -n 'ABCD3' file2.tab
2:ABCD3 chr1    94883932.0  94944260.0  0.0316227766017 0.260698861451  0.277741584016  0.302602378581  0.4352790705329718  56  16

Answers


PopcornArsonist February 2016

I think the issue here is that subprocess.Popen() starts the command and returns immediately, without waiting for it to finish, whereas you want each command to complete before the next one starts. As written, all of your head and tail commands run at the same time and write into the same output file, so their output gets interleaved.
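
A quick way to see this behavior for yourself is sketched below. It is only an illustration: popen_returns_immediately is a made-up helper name, and the child process is just a Python one-liner that sleeps for half a second.

```python
import subprocess
import sys


def popen_returns_immediately():
    """Hypothetical helper: show that Popen does not wait for the child."""
    # Launch a child that sleeps briefly. Popen returns once it has started.
    proc = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(0.5)"]
    )
    # poll() returns None while the child is still running.
    still_running = proc.poll() is None
    # wait() blocks until the child exits and returns its exit code.
    exit_code = proc.wait()
    finished = proc.poll() is not None
    return still_running, finished, exit_code
```

Right after Popen() the child is normally still alive, which is why several head and tail processes can end up writing to the same file at once.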

To fix this, you probably want to just add .wait():

import subprocess
outpath = "output.tab"
with open(outpath, "w") as outf:
    # write a header
    if header is True:
        p1 = subprocess.Popen(["head", "-n1", files[-1]], stdout=outf)
        p1.wait()  # Pauses the script until the command finishes
    if type(header) is str:
        p1 = subprocess.Popen(["head", "-n1", header], stdout=outf)
        p1.wait()
    for fl in files:
        print(fl)
        p1 = subprocess.Popen(["tail", "-n+2", fl], stdout=outf)
        p1.wait()
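
If you would rather not spawn head and tail at all, the same concatenation can probably be done with plain file reads. This is a minimal sketch, assuming every input file starts with exactly one header line and ends with a newline; concat_with_header is a hypothetical helper name, not something from the question or from any library.

```python
def concat_with_header(files, outpath, header=True):
    """Hypothetical helper: join files, keeping one header line.

    header=True  -> take the header from the last file in `files`
    header=<str> -> take the header from the file at that path
    anything else -> write no header
    """
    if header is True:
        header_path = files[-1]
    elif isinstance(header, str):
        header_path = header
    else:
        header_path = None

    with open(outpath, "w") as outf:
        if header_path is not None:
            with open(header_path) as src:
                outf.write(src.readline())  # like `head -n1`
        for fl in files:
            with open(fl) as src:
                next(src, None)       # skip the header line
                outf.writelines(src)  # like `tail -n+2`
```

Everything here runs sequentially in one process, so there is nothing to wait for and the lines cannot interleave.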

