Abraham P. February 2016

Change information in a CSV file using info from the first one in python

I'm trying to edit a CSV file using informations from a first one. That doesn't seem simple to me as I should filter multiple things. Let's explain my problem.

I have two CSV files, let's say patch.csv and origin.csv. Output csv file should have the same pattern as origin.csv, but with corrected values.

I want to replace trip_headsign column fields in origin.csv using forward_line_name column in patch.csv if direction_id field in origin.csv row is 0, or using backward_line_name if direction_id is 1.

I want to do this only if the part of the line_id value in patch.csv between ":" and ":" symbols is the same as the part of route_id value in origin.csv before the ":" symbol.

I know how to replace a whole line, but not only some parts, especially that I sometimes have to look only part of a value.

Here is a sample of origin.csv:

route_id,service_id,trip_id,trip_headsign,direction_id,block_id

210210109:001,2913,70405957139549,70405957,0,
210210109:001,2916,70405961139553,70405961,1,

and a sample of patch.csv:

line_id,line_code,line_name,forward_line_name,forward_direction,backward_line_name,backward_direction,line_color,line_sort,network_id,commercial_mode_id,contributor_id,geometry_id,line_opening_time,line_closing_time

OIF:100110010:10OIF439,10,Boulogne Pont de Saint-Cloud - Gare d'Austerlitz,BOULOGNE / PONT DE ST CLOUD - GARE D'AUSTERLITZ,OIF:SA:8754700,GARE D'AUSTERLITZ - BOULOGNE / PONT DE ST CLOUD,OIF:SA:59400,DFB039,91,OIF:439,metro,OIF,geometry:line:100110010:10,05:30:00,25:47:00
OIF:210210109:001OIF30,001,FFOURCHES LONGUEVILLE PROVINS,Place Mérot - GARE DE LONGUEVILLE,,GARE DE LONGUEVILLE - Place Mérot,OIF:SA:63:49,000000   1,OIF:30,bus,OIF,,05:39:00,19:50:00

Each file has hundred of lines I need to parse and edit this way.

Based on mhopeng answer, I obtained that code:

#!/usr/bin/env python2
from __future__ import print_func        

Answers


Keith Brodie February 2016

pandas is convenient for handling csv files. I would use something like this:

import pandas as pd

origin = pd.read_csv('origin.csv',index_col=None)
patch  = pd.read_csv('patch.csv', index_col=None)

# Create match_keys for matching origin.csv from patch.line_id

patch['match_key'] = [x.split(':')[1] for x in patch.line_id.values]
origin['match_key'] = [x.split(':')[0] for x in origin.route_id.values]

for i,key in enumerate(origin.match_key.values):
    p = patch[patch.match_key == key]
    if len(p) == 1:
        if (origin.direction_id[i] == 0):
            origin.trip_headsign[i] = p.forward_line_name.values[0]
        elif (origin.direction_id[i] == 1):
            origin.trip_headsign[i] = p.backward_line_name.values[0]

origin.to_csv('new_origin.csv',index=False)


mhopeng February 2016

You can use the fileinput module from the standard library for in-place editing of a file. Something like this:

from __future__ import print_function
import fileinput

# first get the route info from patch.csv
f = open('patch.csv')
# discard header line
line = f.readline()
# get line of data
line = f.readline().split('\t')
route_id = line[0].split(':')[1] # '210210109'
route_forward = line[3]
route_backward = line[5]
f.close()

# process origin.csv and replace lines in-place
for line in fileinput.input('origin.csv', inplace=1):
    if line.startswith(route_id):
        newline = line.split('\t')
        if newline[4] == 0:
            newline[3] = route_backward
        else:
            newline[3] = route_forward
        print('\t'.join(newline),end="")
    else:
        print(line,end="")

Post Status

Asked in February 2016
Viewed 3,011 times
Voted 12
Answered 2 times

Search




Leave an answer