Developers Planet

Emily February 2016

How to split an input file into separate output files based on the first column in R?

I want to create separate output files based on the unique values in the input file. The input looks like:

input.txt:
1 23 
1 22
1 2
1 45
1 33
2 22
2 1
2 1
3 22
3 45
3 44

I want separate output files based on the unique values in the first column of the input, so:

out1.txt:

1 23 
1 22
1 2
1 45
1 33

out2.txt:

2 22
2 1
2 1

out3.txt:

3 22
3 45
3 44

Any suggestions? My real input is a huge file.

Answers


HubertL February 2016

This is not R, but since your file is very large, this simple bash script won't load the whole thing into memory:

# loop over the distinct values found in the first column
for i in $(cut -d' ' -f1 input.txt | sort -u)
do
   grep "^$i " input.txt > "out$i.txt"
done
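
The loop above rescans input.txt once per key value. As a possible alternative, here is a minimal single-pass sketch, assuming awk is available and the fields are whitespace-separated as in the question. The first line only recreates the sample data so the snippet can be run on its own; the out<value>.txt naming follows the question.

```shell
# Recreate the sample input from the question (skip this for a real file).
printf '%s\n' '1 23' '1 22' '1 2' '1 45' '1 33' \
              '2 22' '2 1' '2 1' \
              '3 22' '3 45' '3 44' > input.txt

# Single pass: send each line to a file named after its first field.
# Within one awk run, ">" truncates a file the first time it is opened
# and appends on later prints. The parentheses keep the file-name
# expression unambiguous across awk implementations.
awk '{ print > ("out" $1 ".txt") }' input.txt
```

With a very large number of distinct keys, some awk implementations may run into the limit on simultaneously open files. If the input is already grouped by the first column, closing the previous file with close() whenever the key changes is one way around that.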


Hadd E. Nuff February 2016

Since you tagged this R, I will provide an R answer. Here is a base R method first, with a data.table method down below.

## read the data into R
df <- read.table("input.txt")
## split the data frame by the first column
s <- split(df, df[[1L]])
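## write each table in 's' to file 'out<value>.txt', named after the
## first-column value and without row or column names
invisible(
    Map(write.table, x = s, file = sprintf("out%s.txt", names(s)),
        MoreArgs = list(row.names = FALSE, col.names = FALSE))
)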

Now you should have three new files "out1.txt", "out2.txt", and "out3.txt" based on your example data.

Alternatively, we can speed this up with the data.table package.

library(data.table)
## read the data
dt <- fread("input.txt")
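## write each chunk of 'dt' to file 'out<value>.txt' by group;
## .BY$V1 holds the group's first-column value (.GRP is only the group counter)
dt[, write.table(cbind(V1 = .BY$V1, .SD), sprintf("out%s.txt", .BY$V1),
                 row.names = FALSE, col.names = FALSE), by = V1]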

Obviously this makes some assumptions about column names, but it is easy to change those values accordingly.

Post Status

Asked in February 2016
Viewed 3,859 times
Voted 11
Answered 2 times
