Gabriel Hernandez February 2016

Function to evaluate columns with different names within a data frame

I have a data frame (total) like so:

            ID      pos  ori cont mA1 nmA1 bdA1 mA2 nmA2 bdA2 mB1 nmB1 bdB1 mB2 nmB2 bdB2
         1: ChrM      5   +  CCG   0    1    2   0    1    2   0    4    5   0    5    6
         2: ChrM      6   +  CGT   0    1    2   2    0    0   2    2    2   1    6    3
         3: ChrM      7   -  CGG   0    1    2   0    6    7   0    3    4   1    3    2
         4: ChrM     10   +  CGA   0    2    3   2    1    2   2    3    2   1    7    3
         5: ChrM     11   -  CGA   0    1    2   2    6    2   0    3    4   1    8    3
        ---
    164264: ChrM 366914   +  CAA   0    1    2   0    2    3   0    1    2   0    8    9
    164265: ChrM 366918   +  CCG   0    1    2   0    2    3   0    0    1   0    7    8
    164266: ChrM 366919   +  CGG   0    1    2   0    2    3   0    0    1   0    7    8
    164267: ChrM 366920   -  CGG   1    2    2   0    5    6   0    1    2   0    4    5
    164268: ChrM 366921   -  CCG   0    3    4   0    3    4   0    0    1   0    4    5

I want a function that evaluates a couple of criteria. When doing it one treatment at a time, I used:

total$critA <- as.numeric((total$mA1 + total$nmA1 >= 4) & (total$nmA1 >= total$bdA1))

So I get a 1 if both conditions are TRUE and a 0 otherwise. I'd like to apply this to all treatments (A1, A2, A3, etc.), each of which has its own m, nm and bd columns.
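A quick check of the 0/1 coding, as a minimal sketch: as.numeric() turns TRUE into 1 and FALSE into 0, so the single-treatment line yields 1 exactly where both conditions hold. The vectors below are the B2 columns (mB2, nmB2, bdB2) hand-copied from the first five rows printed above; plain vectors are used instead of total only to keep the example self-contained.

```r
# as.numeric() maps TRUE to 1 and FALSE to 0
coding = as.numeric(c(TRUE, FALSE))

# B2 columns from the first five rows shown above (hand-copied)
mB2  = c(0, 1, 1, 1, 1)
nmB2 = c(5, 6, 3, 7, 8)
bdB2 = c(6, 3, 2, 3, 3)

# 1 when m + nm >= 4 AND nm >= bd, 0 otherwise
critB2 = as.numeric((mB2 + nmB2 >= 4) & (nmB2 >= bdB2))

print(coding)
print(critB2)
```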

I'm really new to R and haven't figured out how to do a lot of things yet, so any help is greatly appreciated. Thanks!

Answers


Gregor February 2016

I think something like this will work. (If you share your data with dput, I can copy/paste it and test it; see here for other tips on writing good, reproducible R questions.)
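For reference, a minimal sketch of a dput-ready example. The data.frame below is hand-built from the first five rows printed in the question; total_small is a hypothetical name, and it is a plain data.frame even though the printout looks like a data.table. dput() prints R code that recreates the object, and that output is what you would paste into the question.

```r
# Hypothetical subset: the first five rows from the question, typed in by hand
total_small = data.frame(
  ID   = rep("ChrM", 5),
  pos  = c(5, 6, 7, 10, 11),
  ori  = c("+", "+", "-", "+", "-"),
  cont = c("CCG", "CGT", "CGG", "CGA", "CGA"),
  mA1 = c(0, 0, 0, 0, 0), nmA1 = c(1, 1, 1, 2, 1), bdA1 = c(2, 2, 2, 3, 2),
  mA2 = c(0, 2, 0, 2, 2), nmA2 = c(1, 0, 6, 1, 6), bdA2 = c(2, 0, 7, 2, 2),
  mB1 = c(0, 2, 0, 2, 0), nmB1 = c(4, 2, 3, 3, 3), bdB1 = c(5, 2, 4, 2, 4),
  mB2 = c(0, 1, 1, 1, 1), nmB2 = c(5, 6, 3, 7, 8), bdB2 = c(6, 3, 2, 3, 3)
)

# dput() prints R code that rebuilds the object; paste that output into the question.
# For a data.table, converting first (e.g. dput(as.data.frame(head(total, 10))))
# avoids printing its internal pointer attribute, which cannot be pasted back in.
dput(total_small)
```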

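add_crit = function(data, treatment) {
    # Build the column names for this treatment, e.g. "mA1", "nmA1", "bdA1", "critA1"
    m_name = paste0("m", treatment)
    nm_name = paste0("nm", treatment)
    bd_name = paste0("bd", treatment)
    crit_name = paste0("crit", treatment)

    # [[ extracts each column as a plain vector, so this works for a data.frame
    # or a data.table; the result is 1 if both conditions hold, 0 otherwise
    data[[crit_name]] = as.numeric(
      (data[[m_name]] + data[[nm_name]] >= 4) & (data[[nm_name]] >= data[[bd_name]])
    )
    return(data)
}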

treatments = c("A1", "A2", "B1", "B2")
data_with_crit = total

for (trt in treatments) {
    data_with_crit = add_crit(data_with_crit, trt)
}
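To check the approach end to end, here is a self-contained run on the five sample rows from the question (hand-copied as a plain data.frame, keeping only pos and the treatment columns). add_crit is written with [[ so that it should also work if total is a data.table, where a character index inside [ is treated as a join rather than a column lookup. The expected values in the comments were worked out by hand from those five rows.

```r
# Hand-copied first five rows from the question (pos plus treatment columns)
total = data.frame(
  pos = c(5, 6, 7, 10, 11),
  mA1 = c(0, 0, 0, 0, 0), nmA1 = c(1, 1, 1, 2, 1), bdA1 = c(2, 2, 2, 3, 2),
  mA2 = c(0, 2, 0, 2, 2), nmA2 = c(1, 0, 6, 1, 6), bdA2 = c(2, 0, 7, 2, 2),
  mB1 = c(0, 2, 0, 2, 0), nmB1 = c(4, 2, 3, 3, 3), bdB1 = c(5, 2, 4, 2, 4),
  mB2 = c(0, 1, 1, 1, 1), nmB2 = c(5, 6, 3, 7, 8), bdB2 = c(6, 3, 2, 3, 3)
)

add_crit = function(data, treatment) {
    m_name = paste0("m", treatment)
    nm_name = paste0("nm", treatment)
    bd_name = paste0("bd", treatment)
    crit_name = paste0("crit", treatment)
    # [[ returns each column as a plain vector
    data[[crit_name]] = as.numeric(
      (data[[m_name]] + data[[nm_name]] >= 4) & (data[[nm_name]] >= data[[bd_name]])
    )
    return(data)
}

treatments = c("A1", "A2", "B1", "B2")
data_with_crit = total
for (trt in treatments) {
    data_with_crit = add_crit(data_with_crit, trt)
}

print(data_with_crit[, c("pos", paste0("crit", treatments))])
# Expected (worked out by hand):
#   critA1: 0 0 0 0 0
#   critA2: 0 0 0 0 1
#   critB1: 0 1 0 1 0
#   critB2: 0 1 1 1 1
```

If you prefer to avoid the explicit loop, Reduce(add_crit, treatments, total) threads the data frame through add_crit once per treatment and gives the same result.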

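I build the column names you need as strings with paste0. When a column name is stored in a variable, you need [[ (or [) rather than $: data$m_name looks for a column literally called "m_name", while data[[m_name]] looks up the column whose name is stored in m_name and returns it as a vector, just as $ would.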
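A small illustration of the difference (the data frame df and the variable col are made up for the example):

```r
df = data.frame(mA1 = c(0, 2), nmA1 = c(1, 6))
col = "mA1"

# $ takes its right-hand side literally: it looks for a column named "col",
# finds none, and returns NULL
by_dollar = df$col

# [[ evaluates col first, so it looks up the column "mA1" and returns a vector
by_double = df[[col]]

# [ with a name returns a one-column data.frame instead of a vector
by_single = df[col]

str(by_dollar)
str(by_double)
str(by_single)
```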

fortunes::fortune(343)

Sooner or later most R beginners are bitten by this all too convenient shortcut. As an R newbie, think of R as your bank account: overuse of $-extraction can lead to undesirable consequences. It's best to acquire the '[[' and '[' habit early. -- Peter Ehlers (about the use of $-extraction) R-help (March 2013)

The other, more generalizable, way to handle this problem would be to "melt" your data into long format: a single treatment column with values A1, A2, ..., and single columns for m, nm, bd and crit, so each original row becomes one row per treatment. That layout lends itself to a data.table or dplyr solution. Perhaps someone else will post an example.
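A minimal sketch of that long-format idea. The answer mentions data.table or dplyr; to keep this runnable without extra packages, it swaps in base R's reshape() (data.table::melt() and tidyr::pivot_longer() are the package equivalents). The data are the five sample rows again, and row is a hypothetical column name that reshape() fills with 1..nrow to record which original row each long row came from.

```r
# Hand-copied first five rows from the question (pos plus treatment columns)
total = data.frame(
  pos = c(5, 6, 7, 10, 11),
  mA1 = c(0, 0, 0, 0, 0), nmA1 = c(1, 1, 1, 2, 1), bdA1 = c(2, 2, 2, 3, 2),
  mA2 = c(0, 2, 0, 2, 2), nmA2 = c(1, 0, 6, 1, 6), bdA2 = c(2, 0, 7, 2, 2),
  mB1 = c(0, 2, 0, 2, 0), nmB1 = c(4, 2, 3, 3, 3), bdB1 = c(5, 2, 4, 2, 4),
  mB2 = c(0, 1, 1, 1, 1), nmB2 = c(5, 6, 3, 7, 8), bdB2 = c(6, 3, 2, 3, 3)
)
treatments = c("A1", "A2", "B1", "B2")

# Wide -> long: each element of varying lists the wide columns (in treatment order)
# that are stacked into one long column named by v.names
long = reshape(
  total,
  direction = "long",
  varying = list(paste0("m", treatments),
                 paste0("nm", treatments),
                 paste0("bd", treatments)),
  v.names = c("m", "nm", "bd"),
  timevar = "treatment",
  times = treatments,
  idvar = "row"  # hypothetical name; reshape() creates it because it is not in total
)

# One vectorised line now covers every treatment
long$crit = as.numeric((long$m + long$nm >= 4) & (long$nm >= long$bd))

print(long[long$treatment == "B2", c("row", "pos", "treatment", "m", "nm", "bd", "crit")])
```

With this layout, adding another treatment only adds rows; the crit line does not change.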

Post Status

Asked in February 2016
Viewed 2,694 times
Voted 6
Answered 1 times
