Emily February 2016

How to divide an input file into different output files based on the first column of the input in R?

I want to make separate output files based on the unique values in an input file. The input looks like:

input.txt:
1 23 
1 22
1 2
1 45
1 33
2 22
2 1
2 1
3 22
3 45
3 44

I want separate output files based on the unique values in the first column of the input, so:

out1.txt:

1 23 
1 22
1 2
1 45
1 33

out2.txt:

2 22
2 1
2 1

out3.txt:

3 22
3 45
3 44

Any suggestions? My real input is a huge file.

Answers


HubertL February 2016

This is not R, but since your file is supposed to be very big, this simple bash script avoids loading the whole thing into memory:

# loop over the distinct values actually present in the first column
for i in $(cut -d' ' -f1 input.txt | sort -u)
do
   grep "^$i " input.txt > "out$i.txt"
done
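
The loop rescans input.txt once per value, which adds up on a huge file. As a rough alternative, here is a single-pass sketch with awk. It assumes whitespace-separated columns and few enough distinct first-column values that awk can keep one output file open per value. The sample input.txt is recreated only so the snippet runs on its own.

```shell
# Recreate the example input from the question (for demonstration only).
printf '%s\n' '1 23' '1 22' '1 2' '1 45' '1 33' \
              '2 22' '2 1' '2 1' \
              '3 22' '3 45' '3 44' > input.txt

# One pass over the file: send each line to out<first column>.txt.
# Within a single awk run, ">" truncates a file the first time it is opened
# and appends on later prints. The parentheses around the file name keep
# the concatenation unambiguous across awk implementations.
awk '{ print > ("out" $1 ".txt") }' input.txt
```

With very many distinct values, some awk implementations may run out of open file handles. In that case, appending with ">>" and calling close() after each print should work, at the cost of speed, and old output files have to be deleted before rerunning.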


Hadd E. Nuff February 2016

Since you tagged this R, I will provide an R answer. Here is a base R method first, with a data.table method down below.

## read the data into R
df <- read.table("input.txt")
## split the data frame by the first column
s <- split(df, df[[1L]])
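## write each table in 's' to file 'out*.txt', named after its group value,
## without quotes, row names, or a header so the output matches the input format
invisible(
    Map(write.table, x = s, file = sprintf("out%s.txt", names(s)),
        MoreArgs = list(quote = FALSE, row.names = FALSE, col.names = FALSE))
)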

Now you should have three new files "out1.txt", "out2.txt", and "out3.txt" based on your example data.

Alternatively, we can speed this up with the data.table package.

library(data.table)
## read the data
dt <- fread("input.txt")
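## write each chunk of 'dt' to file 'out*.txt' by group
## (.BY holds the group's value; .GRP is only the group counter)
dt[, write.table(cbind(V1 = .BY[[1L]], .SD), sprintf("out%s.txt", .BY[[1L]]),
                 quote = FALSE, row.names = FALSE, col.names = FALSE),
   by = V1]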

Obviously this makes some assumptions about column names, but it is easy to change those values accordingly.

Post Status

Asked in February 2016
Viewed 3,859 times
Voted 11
Answered 2 times
