Dinesh February 2016

Summing values after every third position in data frame in R

I am new to R. I have a data frame like following

>df=data.frame(Id=c("Entry_1","Entry_1","Entry_1","Entry_2","Entry_2","Entry_2","Entry_3","Entry_4","Entry_4","Entry_4","Entry_4"),Start=c(20,20,20,37,37,37,68,10,10,10,10),End=c(50,50,50,78,78,78,200,94,94,94,94),Pos=c(14,34,21,50,18,70,101,35,2,56,67),Hits=c(12,34,17,89,45,87,1,5,6,3,26))

Id       Start End Pos Hits
Entry_1    20  50  14   12
Entry_1    20  50  34   34
Entry_1    20  50  21   17
Entry_2    37  78  50   89
Entry_2    37  78  18   45
Entry_2    37  78  70   87
Entry_3    68 200 101    1
Entry_4    10  94  35    5
Entry_4    10  94   2    6
Entry_4    10  94  56    3
Entry_4    10  94  67   26

For each entry I would like to iterate the data.frame in 3 different modes. For an example, for Entry_1 mode_1 =seq(20,50,3)and mode_2=seq(21,50,3) and mode_3=seq(22,50,3). I would like sum all the Values in Column "Hits" whose corresponding values in Column "Pos" that falls in mode_1 or_mode_2 or mode_3 and generate a data.frame like follow:

Id       Mode_1   Mode_2   Mode_3
Entry_1   0        17       34
Entry_2   87       89        0
Entry_3   1         0        0
Entry_4   26        8        0 

I tried the following code:

    mode_1=0
    mode_2=0
    mode_3=0
    mode_1_sum=0
    mode_2_sum=0
    mode_3_sum=0
    for(i in dim(df)[1])
    {
      if(df$Pos[i] %in% seq(df$Start[i],df$End[i],3))
      {
      mode_1_sum=mode_1_sum+df$Hits[i]
      print(mode_1_sum)
      }
      mode_1=mode_1_sum+counts
      print(mode_1)

      ifelse(df$Pos[i] %in% seq(df$Start[i]+1,df$End[i],3))
      {
      mode_2_sum=mode_2_sum+df$Hits[i]
      print(mode_2_sum)
      }
      mode_2_sum=mode_2_sum+counts
      print(mode_2)

      ifelse(df$Pos[i] %in% seq(df$Start[i]+2,df$End[i],3))
      {
      mode_3_sum=mode_3_sum+df$Hits[i]
      print(mode_3_sum)
      }
      mode_3_sum=mode_3_sum+cou        

Answers


Laterow February 2016

It's not an elegant solution, but it works.

m <- 3 # Number of modes you want

foo <- ((df$Pos - df$Start)%%m + 1) * (df$Start < df$Pos) * (df$End > df$Pos)
tab <- matrix(0,nrow(df),m)
for(i in 1:m) tab[foo==i,i] <- df$Hits[foo==i]

aggregate(tab,list(df$Id),FUN=sum)
#   Group.1 V1 V2 V3
# 1 Entry_1  0 17 34
# 2 Entry_2 87 89  0
# 3 Entry_3  1  0  0
# 4 Entry_4 26  8  0

-- EXPLANATION --

First, we find the indices of df$Pos That are both bigger than df$Start and smaller than df$End. These should return 1 if TRUE and 0 if FALSE. Next, we take the difference between df$Pos and df$Start, we take mod 3 (which will give a vector of 0s, 1s and 2s), and then we add 1 to get the right mode. We multiply these two things together, so that the values that fall within the interval retain the right mode, and the values that fall outside the interval become 0.

Next, we create an empty matrix that will contain the values. Then, we use a for-loop to fill in the matrix. Finally, we aggregate the matrix.

I tried looking for a quicker solution, but the main problem I cannot work around is the varying intervals for each row.

Post Status

Asked in February 2016
Viewed 2,894 times
Voted 12
Answered 1 times

Search




Leave an answer