Keeley Seymour February 2016

How to delete groups containing fewer than 3 rows of data in R?

I'm using the dplyr package in R and have grouped my data by 3 variables (Year, Site, Brood).

I want to get rid of groups made up of fewer than 3 rows. For example, in the following sample I would like to remove the rows for brood '2'. I have a lot of data to process, so while I could painstakingly do it by hand, it would be much more helpful to automate it in R.

Year Site Brood Parents
1996 A    1     1  
1996 A    1     1  
1996 A    1     0  
1996 A    1     0  
1996 A    2     1      
1996 A    2     0  
1996 A    3     1  
1996 A    3     1  
1996 A    3     1  
1996 A    3     0  
1996 A    3     1  

I hope this makes sense, and thank you very much in advance for your help! I'm new to R and Stack Overflow, so apologies if the question isn't worded very well. Let me know if I need to provide any other information.

Answers


drhagen February 2016

One way to do it is to use the magic n() function within filter:

library(dplyr)

my_data <- data.frame(Year=1996, Site="A", Brood=c(1,1,2,2,2))

my_data %>% 
  group_by(Year, Site, Brood) %>% 
  filter(n() >= 3)

The n() function gives the number of rows in the current group (or the number of rows total if there is no grouping).
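To check this against the sample in the question, here is a minimal sketch that rebuilds those 11 rows by hand. The names broods and kept are just placeholders I picked, and the trailing ungroup() is an optional extra (not part of the snippet above) that returns an ordinary ungrouped tibble:

```r
library(dplyr)

# The question's sample data, typed in by hand (Year and Site are recycled)
broods <- data.frame(
  Year    = 1996,
  Site    = "A",
  Brood   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  Parents = c(1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1)
)

kept <- broods %>%
  group_by(Year, Site, Brood) %>%
  filter(n() >= 3) %>%   # keep only groups with at least 3 rows
  ungroup()              # optional: drop the grouping afterwards

unique(kept$Brood)  # 1 3  (brood 2 had only 2 rows)
nrow(kept)          # 9
```

Brood 2 is the only group with fewer than 3 rows, so its 2 rows are dropped and the other 9 remain.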


pluke February 2016

You can also do this using base R:

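# 'folder' is assumed to hold the directory path, ending in a path separator
temp <- read.csv(paste0(folder, "test.csv"), header = TRUE)
# Count the rows in each Year/Site/Brood group (rows with NA in Parents are not counted)
counts <- aggregate(Parents ~ Year + Site + Brood, temp, FUN = length)
names(counts)[4] <- "n"
# Attach the counts, then keep only rows from groups with at least 3 rows
temp <- merge(temp, counts, by = c("Year", "Site", "Brood"))
temp <- temp[temp$n >= 3, c("Year", "Site", "Brood", "Parents")]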
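If you would rather skip the aggregate/merge round trip, a shorter base R variant is possible with ave(), which returns a per-group value repeated for every row. This is my own sketch rather than part of the answer above; broods, n_in_group and kept are placeholder names, and the data are the question's sample typed in by hand:

```r
# The question's sample data, typed in by hand
broods <- data.frame(
  Year    = 1996,
  Site    = "A",
  Brood   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  Parents = c(1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1)
)

# For every row, the number of rows in its Year/Site/Brood group.
# FUN must be passed by name, because ave() takes the grouping variables via '...'.
n_in_group <- ave(seq_len(nrow(broods)),
                  broods$Year, broods$Site, broods$Brood,
                  FUN = length)

# Keep rows whose group has at least 3 rows
kept <- broods[n_in_group >= 3, ]

nrow(kept)  # 9
```

Unlike merge(), this keeps the original row order and column names, so no renaming or column selection is needed afterwards.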


MichaelChirico February 2016

Throwing in the data.table approach to join the party:

library(data.table)
setDT(my_data)
my_data[ , if (.N >= 3) .SD, by = .(Year, Site, Brood)]
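For anyone unfamiliar with the symbols: .N is the number of rows in the current group and .SD is that group's subset of the data, so the expression returns a group's rows only when it has at least 3 of them. A small self-contained sketch on the question's sample (broods_dt and kept are placeholder names, and the data are typed in by hand):

```r
library(data.table)

# The question's sample data as a data.table (Year and Site are recycled)
broods_dt <- data.table(
  Year    = 1996,
  Site    = "A",
  Brood   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  Parents = c(1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1)
)

# Groups with fewer than 3 rows return NULL and so drop out of the result
kept <- broods_dt[, if (.N >= 3) .SD, by = .(Year, Site, Brood)]

nrow(kept)  # 9
```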
