DeltaIV February 2016

Read in file with complicated format and tidy it

I would like to read a file with a complex format into a data frame or data table. I have reduced the file to the simplest example that still conveys all of the complexity of the real case.

TITLE = "SomeTitleHere"
VARIABLES = "n","q[m3/hr]","gf[-]","pe[bar]","eff[%]",
ZONE DATAPACKING=POINT T="Design GF= 0.000 Q= 818.96 rpm=4800.",I=  4
  0    818.96002      0.00000      13.00000    
     61.92762
  1    818.96002      0.00000      29.86776    
     61.92762
 ZONE DATAPACKING=POINT T="Offdesign GF= 0.000 Q= 200.00 rpm=4800.",I=  4
  0    200.00000      0.00000      13.00000    
      0.00000
  1    200.00000      0.00000      37.79360    
     27.12768
 ZONE DATAPACKING=POINT T="Offdesign GF=  0.000 Q=1200.00 rpm=4800.",I=  4
  0   1200.00000      0.00000      13.00000
      0.00000
  1   1200.00000      0.00000      17.17662
     28.08889
 ZONE DATAPACKING=POINT T="Offdesign GF=  0.100 Q= 200.00 rpm=4800.",I=  4
  0    200.00000      0.10000     13.00000
      0.00000
  1    188.40880      0.04463      30.91997
     22.54672
 ZONE DATAPACKING=POINT T="Offdesign GF= 0.100 Q=1200.00 rpm=4800.",I=  4
  0   1200.00000      0.10000    13.00000    
      0.00000
  1   1177.85608      0.08308     15.94177
     13.05620

Format explanation: the first line (TITLE = "SomeTitleHere") is a kind of comment and can be skipped. The second line contains prefixes for some variable names and their measurement units. Since I already know the variable names, this line can also be skipped. After that come 2*n+1 "data blocks". Each data block is 5 lines long. The first is a title line, which holds the values of four variables (Point, GFin, Qin and rpm) and therefore must be parsed. For example, for the first block the title line is

ZONE DATAPACKING=POINT T="Design GF= 0.000 Q= 818.96 rpm=4800.",I=  4

which corresponds to

<        

Answers


kdopen February 2016

You need to think a little bit sideways here. You have a list of text lines, which you want to process 5 at a time. So, instead of passing the lines themselves, pass lapply the start index of each group of lines:

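lapply(seq(1, length(s), 5), function(x) parse_point(s[x:(x + 4)]))

This will call parse_point with each group of 5 lines in the source file. The parentheses around x + 4 matter: in R, : binds tighter than +, so s[x:x+4] would select only the single element s[x+4].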
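
A minimal sketch of this chunking, assuming s holds only the block lines (the two header lines already dropped). The parse_point below is a hypothetical stand-in, not the asker's function: it only returns the title line and the number of data lines, to show what each call receives.

```r
# Toy input: two 5-line blocks (title line + 4 data lines) from the question's sample.
s <- c(
  'ZONE DATAPACKING=POINT T="Design GF= 0.000 Q= 818.96 rpm=4800.",I=  4',
  '  0    818.96002      0.00000      13.00000',
  '     61.92762',
  '  1    818.96002      0.00000      29.86776',
  '     61.92762',
  'ZONE DATAPACKING=POINT T="Offdesign GF= 0.000 Q= 200.00 rpm=4800.",I=  4',
  '  0    200.00000      0.00000      13.00000',
  '      0.00000',
  '  1    200.00000      0.00000      37.79360',
  '     27.12768'
)

# Hypothetical stand-in for parse_point(): reports what it was handed.
parse_point <- function(block) {
  list(title = block[1], n_data_lines = length(block) - 1)
}

# Start index of every block: 1, 6, ...
starts <- seq(1, length(s), by = 5)

# Each call receives one complete block of 5 lines.
blocks <- lapply(starts, function(x) parse_point(s[x:(x + 4)]))
```

Each element of blocks then corresponds to one ZONE block of the file.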

You could also modify parse_point to take the start index x instead of a list of lines. Then the call is simply:

lapply(seq(1,length(s), 5), parse_point)
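
A sketch of that variant, under the assumption that the function can see the vector of lines. The name parse_point_at, its lines argument and the placeholder input are all hypothetical and only illustrate the shape of the call.

```r
# Placeholder input: two blocks of 5 lines each.
s <- c("title A", "a1", "a2", "a3", "a4",
       "title B", "b1", "b2", "b3", "b4")

# Hypothetical index-based variant: x is the position of the block's title line.
# `lines` defaults to the vector s defined above.
parse_point_at <- function(x, lines = s) {
  block <- lines[x:(x + 4)]  # parentheses needed: `:` binds tighter than `+`
  list(title = block[1], data = block[-1])
}

res <- lapply(seq(1, length(s), by = 5), parse_point_at)
```

The trade-off is that the function now depends on where the lines live, so it is a little less reusable than a version that takes the 5 lines directly.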

lapply returns a list, so you may need to unlist the result or consider using sapply instead.
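
If each block is turned into a small data frame, another option is to stack the pieces with do.call(rbind, ...). The sketch below is one possible way to do that, not the asker's code: parse_block is a hypothetical name, the column names are assumed from the VARIABLES line and the question text, and the regular expression assumes every title line follows the pattern shown in the sample.

```r
# Toy input: two blocks from the question's sample. For a real file one would
# presumably start from readLines() and drop the two header lines first.
s <- c(
  'ZONE DATAPACKING=POINT T="Design GF= 0.000 Q= 818.96 rpm=4800.",I=  4',
  '  0    818.96002      0.00000      13.00000',
  '     61.92762',
  '  1    818.96002      0.00000      29.86776',
  '     61.92762',
  ' ZONE DATAPACKING=POINT T="Offdesign GF= 0.000 Q= 200.00 rpm=4800.",I=  4',
  '  0    200.00000      0.00000      13.00000',
  '      0.00000',
  '  1    200.00000      0.00000      37.79360',
  '     27.12768'
)

# Hypothetical block parser: turns one 5-line block into a data frame.
parse_block <- function(block) {
  title <- block[1]
  # Capture the label and the three numbers in T="<Point> GF= .. Q= .. rpm=.."
  pattern <- 'T="(\\w+) +GF= *([0-9.]+) +Q= *([0-9.]+) +rpm= *([0-9.]+)"'
  m <- regmatches(title, regexec(pattern, title))[[1]]
  # Data lines are whitespace-separated numbers; each record of 5 values
  # is wrapped over 2 lines, so read them all and reshape row by row.
  vals <- scan(text = block[-1], quiet = TRUE)
  rec <- matrix(vals, ncol = 5, byrow = TRUE)
  data.frame(
    Point = m[2],
    GFin = as.numeric(m[3]),
    Qin = as.numeric(m[4]),
    rpm = as.numeric(m[5]),
    n = rec[, 1], q = rec[, 2], gf = rec[, 3], pe = rec[, 4], eff = rec[, 5]
  )
}

starts <- seq(1, length(s), by = 5)
pieces <- lapply(starts, function(x) parse_block(s[x:(x + 4)]))

# Stack the per-block data frames into one tidy data frame.
result <- do.call(rbind, pieces)
```

Here result has one row per data record, with the values from the title line repeated on every row of the block they belong to.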

