Hima February 2016

Grouping by a column in R

I have data in the following format:

RouteId, StopOrder, StopType
101, 1, Load
101, 2, Unload
102, 1, Load
102, 2, Load
102, 3, Unload
102, 4, Unload
103, 1, Load
103, 2, Unload
103, 3, Load
103, 4, Unload

Given this data, I want to identify the RouteIds that have a Load stop after an Unload stop.

Expected Output:
103
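
For anyone who wants to reproduce this, the sample above could be loaded roughly as follows. This is only a sketch: the object names `df2` and `df` are the ones the answers below assume, and reading from an inline string is just one convenient way to build the data.

```r
# Sample data from the question, read from an inline CSV string
df2 <- read.csv(text = "RouteId,StopOrder,StopType
101,1,Load
101,2,Unload
102,1,Load
102,2,Load
102,3,Unload
102,4,Unload
103,1,Load
103,2,Unload
103,3,Load
103,4,Unload", stringsAsFactors = FALSE, strip.white = TRUE)

# The dplyr answer refers to the same data as `df`
df <- df2

str(df2)
```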

Answers


akrun February 2016

We can try data.table. Convert the 'data.frame' to a 'data.table' (setDT(df2)) and group by 'RouteId'. Within each group, rleid() numbers the consecutive runs of the logical vector (StopType=='Load'); if any run id is greater than 2, we return the Subset of Data.table (.SD). This gives the rows with 'RouteId' 103. It assumes the rows are already ordered by 'StopOrder' within each 'RouteId'.

library(data.table)
setDT(df2)[,if(any(rleid(StopType=='Load') >2)) .SD ,.(RouteId)]
#    RouteId StopOrder StopType
#1:     103         1     Load
#2:     103         2   Unload
#3:     103         3     Load
#4:     103         4   Unload

If we need only the 'RouteId', subset .GRP with the logical condition so that only the matching groups are kept, and then extract the 'RouteId' column.

setDT(df2)[, .GRP[any(rleid(StopType=='Load') >2)] ,
   .(RouteId)]$RouteId
#[1] 103
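
To see why the threshold is 2, it may help to look at the run ids themselves. The sketch below rebuilds the sample data under the hypothetical name `routes` (not part of the original answer) and adds the rleid() values as a column `run`.

```r
library(data.table)

# Sample data from the question; `routes` is a name used only in this sketch
routes <- data.table(
  RouteId   = rep(c(101, 102, 103), times = c(2, 4, 4)),
  StopOrder = c(1:2, 1:4, 1:4),
  StopType  = c("Load", "Unload",
                "Load", "Load", "Unload", "Unload",
                "Load", "Unload", "Load", "Unload")
)

# rleid() gives every run of identical consecutive values its own id, per group
routes[, run := rleid(StopType == "Load"), by = RouteId]
routes$run
# [1] 1 2 1 1 2 2 1 2 3 4
```

A route that goes Load, then Unload, then Load again reaches run id 3, which is what the `> 2` test picks up. Note that this relies on every route starting with a Load: a route that started Unload, Load would only reach run id 2 and would not be flagged.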

Or a base R option would be

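 v1 <- with(df2, tapply(StopType=='Load', RouteId,
             FUN = function(x) {
                  i1 <- which(x)   # positions of the Load stops within the route
                  # TRUE if the first Load is not the first stop, or the Loads are not contiguous
                  length(i1) > 0 && (i1[1] > 1 || any(diff(i1) > 1))}))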

 names(v1)[v1]
 #[1] "103"
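
A more literal way to express "a Load after an Unload" in base R might be to track whether an Unload has already been seen. This is a sketch of an alternative, not the method above; the helper `has_load_after_unload` and the object `flagged` are hypothetical names, and the data is rebuilt so the block runs on its own.

```r
# Sample data from the question
df2 <- data.frame(
  RouteId   = rep(c(101, 102, 103), times = c(2, 4, 4)),
  StopOrder = c(1:2, 1:4, 1:4),
  StopType  = c("Load", "Unload",
                "Load", "Load", "Unload", "Unload",
                "Load", "Unload", "Load", "Unload"),
  stringsAsFactors = FALSE
)

# Make sure stops are in visiting order within each route
df2 <- df2[order(df2$RouteId, df2$StopOrder), ]

# TRUE if some Load occurs at a point where an Unload has already happened
has_load_after_unload <- function(type) {
  seen_unload <- cumsum(type == "Unload") > 0
  any(type == "Load" & seen_unload)
}

v2 <- tapply(df2$StopType, df2$RouteId, has_load_after_unload)
flagged <- names(v2)[v2]
flagged
# [1] "103"
```

Because it does not depend on where the first Load sits, this version also handles routes that begin with an Unload.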


steveb February 2016

Here is a possible dplyr solution. This is based on your comment that you expect RouteId values as output.

library(dplyr)

# assuming your data is loaded into "df"
(df %>%
    arrange(RouteId, StopOrder) %>%
    group_by(RouteId) %>%
    filter(StopType == 'Unload' & lead(StopType) == 'Load') %>%
    ungroup)$RouteId
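
If a route can contain more than one Unload-to-Load transition, the filter above returns its RouteId once per transition. A possible variant that yields each RouteId at most once is sketched below, using dplyr's cumany(); the column name `load_after_unload` and the object `ids` are made up for this example, and the sample data is rebuilt so the block is self-contained.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  RouteId   = rep(c(101, 102, 103), times = c(2, 4, 4)),
  StopOrder = c(1:2, 1:4, 1:4),
  StopType  = c("Load", "Unload",
                "Load", "Load", "Unload", "Unload",
                "Load", "Unload", "Load", "Unload"),
  stringsAsFactors = FALSE
)

ids <- df %>%
  arrange(RouteId, StopOrder) %>%
  group_by(RouteId) %>%
  # one row per route: is there a Load once an Unload has been seen?
  summarise(load_after_unload = any(StopType == "Load" &
                                      cumany(StopType == "Unload"))) %>%
  filter(load_after_unload) %>%
  pull(RouteId)

ids
# [1] 103
```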

Post Status

Asked in February 2016
Viewed 2,231 times
Voted 8
Answered 2 times
