haimen February 2016

Append a list of dataframes into one inside a loop in Python

Suppose I have a dataframe that I want to split for K-fold cross validation. I know there are packages that do this, but I am writing the code myself in order to learn a few things. In the code below, I take K as a parameter, split the data into K parts, and save them to df_array. For each iteration I want one part as the test set and the remaining parts as training data. I can assign one part as the test set in the validation_data variable, but the training data ends up as a list of the remaining 9 dataframes. I want to append those into one dataframe so that I can apply my model to it.

import pandas as pd

df=pd.DataFrame(range(0,10))

def k_fold_cross_validation(data,K):
    data=data.sample(frac=1)  # shuffle the rows
    df_array = [data[i::K] for i in range(K)]  # split into K folds
    print(df_array)
    for i,val in enumerate(df_array):
        validation_data = pd.DataFrame(df_array[i])
        print("validation ")
        print(validation_data)
        training_data_list = df_array[:i] + df_array[i+1:]  # still a list of K-1 dataframes
        print("training")
        print(training_data_list)

k_fold_cross_validation(df,10)

My output should be validation 0 with training as a single dataframe holding the values 1,2,3,...9; for the next iteration, validation 1 with training as a dataframe holding 0,2,3,...9, and so on.

Can anybody help me in doing this?

Answers


howMuchCheeseIsTooMuchCheese February 2016

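  training_data_list = df_array[:i] + df_array[i+1:]
  # DataFrame.append returned a new frame instead of modifying mDF in place
  # (and was removed in pandas 2.0), so combine the list with pd.concat
  mDF = pd.concat(training_data_list)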

mDF will now hold all the data from the list of dataframes.
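
For reference, here is a minimal sketch of the whole loop with the training folds combined into one dataframe on each iteration. The function name k_fold_splits and the seed parameter are hypothetical and not part of the question's code; the sketch is written for Python 3 and assumes K is at least 2, since pd.concat raises an error when given an empty list.

```python
import pandas as pd


def k_fold_splits(data, k, seed=0):
    """Yield (training, validation) dataframe pairs for k-fold cross validation.

    Hypothetical helper for illustration; assumes k >= 2.
    """
    # Shuffle the rows; random_state makes the shuffle repeatable
    shuffled = data.sample(frac=1, random_state=seed)
    # Row i, i+k, i+2k, ... of the shuffled frame goes to fold i
    folds = [shuffled[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        # Combine every fold except the i-th into a single training frame
        training = pd.concat(folds[:i] + folds[i + 1:])
        yield training, validation


df = pd.DataFrame(range(0, 10))
for training, validation in k_fold_splits(df, 10):
    # Column 0 is the default label pandas gives the single column of df
    print("validation", validation[0].tolist(),
          "training", sorted(training[0].tolist()))
```

Because the rows are shuffled first, the order in which values show up as validation is not 0, 1, 2, ...; each value still appears as validation exactly once, and the training frame holds the other nine.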

Post Status

Asked in February 2016
Viewed 2,835 times
Voted 4
Answered 1 times
