FC84 February 2016

Python: how to slice a csv file with respect to a column other than the first?

I have a csv file which displays a number of columns and almost 500000 rows. I need to slice this file with respect to the second column, which displays the year, maintaining all the other columns:

COL1   COL2   COL3   COL4   COL5   COL6   COL7
xxx    1986   xxx    xxx    xxx    xxx    xxx
xxx    1992   xxx    xxx    xxx    xxx    xxx
xxx    1998   xxx    xxx    xxx    xxx    xxx
...    ...    ...    ...    ...    ...    ...
xxx    2015   xxx    xxx    xxx    xxx    xxx
xxx    1984   xxx    xxx    xxx    xxx    xxx

My question: how can I produce another csv file out of this, where the values in the second column are >=1992?

Desired output:

COL1   COL2   COL3   COL4   COL5   COL6   COL7
xxx    1992   xxx    xxx    xxx    xxx    xxx
xxx    1998   xxx    xxx    xxx    xxx    xxx
xxx    2015   xxx    xxx    xxx    xxx    xxx

My attempt is this, but I got stuck at the point where I should insert an if linked to the second column, but I don't know how to do that:

from __future__ import division
import numpy
from numpy import *
import csv
from collections import *
import os
import glob

directoryPath=raw_input('Working directory: ') #Indicates where the csv file is located
for i,file in enumerate(os.listdir(directoryPath)): #Loops over the folder where the csv files are
    if file.endswith(".csv"): #Checks if they are csv files
        filename=os.path.basename(file) #Takes the complete path to the file
        filelabel=file #Takes the filename only
        strPath = os.path.join(directoryPath, file) #Retrieves the complete path to find the csv file
        x=numpy.genfromtxt(strPath, delimiter=',')[:,7] #I GOT STUCK HERE

Answers


Ken February 2016

You can iterate over the rows of the CSV to see if the value in COL2 is >= to the year your are interested in. If it is, just add the row to a new list. Pass in the new list to a CSV writer. You can call the function in a loop to create new CSVs for all files ending with a csv extension.

You will have to pass in the working_directory and the year. This is the folder of CSVs you want to process.

import csv
import os
def make_csv(in_file, out_file, year):
    with open(in_file, 'rb') as csv_in_file:
        csv_row_list = []
        first_row = True
        csv_reader = csv.reader(csv_in_file)
        for row in csv_reader:
            if first_row:
                csv_row_list.append(row)
                first_row = False
            else:
                if int(row[1]) >= year:
                    csv_row_list.append(row)

    with open(out_file, 'wb') as csv_out_file:
        csv_writer = csv.writer(csv_out_file)
        csv_writer.writerows(csv_row_list)

for root, directories, files in os.walk(working_directory):
    for f in files:
        if f.endswith('.csv'):
            in_file = os.path.join(root, f)
            out_file = os.path.join(root, os.path.splitext(f)[0] + '_new' + os.path.splitext(f)[1])
            make_csv(in_file, out_file, year)

Post Status

Asked in February 2016
Viewed 1,660 times
Voted 10
Answered 1 times

Search




Leave an answer