Tbb February 2016

How to exclude first word in Pandas header?

I'm importing text files into pandas data frames. The number of columns can vary, and so can the column names.

However, the header line always starts with ~A, and read_csv interprets this as the name of the first column, so all the column names are shifted one step to the right.

Earlier I used np.genfromtxt() with the argument deletechars = 'A__', but I haven't found an equivalent in pandas. Is there a way to exclude the name when reading or, as a second option, to delete the first name but keep the columns intact?

I'm reading the file like this:

in_file = pd.read_csv(file_name, header=header_row,delim_whitespace=True)

Now I get this (just as the text file looks):

             ~A     DEPTH    TIME   TX1   TX2  TX3  OUT6
11705  2.94  10525.38  126.14  169.71  353.86   4.59   NaN
11706  2.93  10525.38  NaN  168.29  368.00   4.75   NaN
11707  2.92  10525.38  126.14  166.71  369.86   4.93   NaN

but I want to get this:

       DEPTH    TIME   TX1   TX2  TX3   OUT6
11705  2.94  10525.38  126.14  169.71  353.86   4.59
11706  2.93  10525.38  NaN  168.29  368.00   4.75
11707  2.92  10525.38  126.14  166.71  369.86   4.93

Answers


Ani February 2016

Why not just post-process?

df = ...
df_modified = df[df.columns[:-1]]
df_modified.columns = df.columns[1:]
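
A minimal sketch of this post-processing idea on an in-memory sample, in case it helps to see the shift happen. The sample text and the variable names are made up for illustration, and sep=r"\s+" is used as the whitespace-splitting equivalent of delim_whitespace=True:

```python
import io

import pandas as pd

# Hypothetical sample: the header has one more name (~A) than each data row has values.
sample = """~A DEPTH TIME TX1
2.94 10525.38 126.14
2.93 10525.38 127.50
"""

# sep=r"\s+" splits on runs of whitespace.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# The names are shifted: "~A" holds the DEPTH values and the last column is all NaN.
# Drop the empty trailing column, then move every name one step to the left.
fixed = df.iloc[:, :-1].copy()
fixed.columns = df.columns[1:]

print(fixed)
```

The .copy() is only there so that renaming the columns never touches the original frame.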


Tim Seed February 2016

Choose which columns to import

in_file = pd.read_csv(file_name, header=header_row,
             delim_whitespace=True,
             usecols=['DEPTH','TIME','TX1','TX2','TX3','OUT6'])


Dr. Drew February 2016

How about reading the file twice? First, use pd.read_csv() but skip the header row. Second, open the file and use readline() to parse the header and drop the first item. The result can then be assigned to your dataframe's columns.

in_file = pd.read_csv(file_name, delim_whitespace=True, header = None, skiprows = [0])
with open(file_name,'rt') as h:
    hdrs = h.readline().split()  # the header is whitespace-delimited, so split on whitespace
in_file.columns = hdrs[1:]
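
The same idea can probably be done with a single open handle: consume the header line yourself, then let read_csv continue from the first data row. This is only a sketch. The function name read_without_first_name and the sample text are hypothetical, and it assumes the header sits on line header_row (counting from 0) with the data starting right after it:

```python
import io

import pandas as pd


def read_without_first_name(handle, header_row=0):
    """Read a whitespace-delimited table whose header has an extra leading token."""
    # Skip any lines that come before the header line.
    for _ in range(header_row):
        handle.readline()
    # Split the header on whitespace and drop the first token (e.g. "~A").
    names = handle.readline().split()[1:]
    # The handle now points at the first data row, so there is no header left to parse.
    return pd.read_csv(handle, sep=r"\s+", header=None, names=names)


# Hypothetical sample with one preamble line before the header.
sample = io.StringIO(
    "some preamble line\n"
    "~A DEPTH TIME TX1\n"
    "2.94 10525.38 126.14\n"
    "2.93 10525.38 127.50\n"
)
in_file = read_without_first_name(sample, header_row=1)
print(in_file)
```

With a real file this would be called as read_without_first_name(h, header_row) inside a with open(file_name, 'rt') as h: block.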


Tim Seed February 2016

OK, so if the number of columns varies and you want to remove the first column (whose name varies) AND you do not want to do this in a post-read_csv phase... then... (drum roll)

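import pandas as pd

# tim.csv is assumed to be (the first header name is a guess; the output below shows b and c)
# a,b,c
# 1,2,3
# 2,3,4
# 3,4,5
headers = ['BADCOL', 'Happy', 'Sad']  # not used below
# Read everything, then drop the first column (name and data) by position
data = pd.read_csv('tim.csv').iloc[:, 1:]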

data will now look like:

    b   c
    2   3
    3   4
    4   5

Not sure if this counts as Post-CSV processing or not...

Post Status

Asked in February 2016
Viewed 1,911 times
Voted 12
Answered 4 times
