Developers Planet

user43395 February 2016

Delete particular rows in R

In general, I know how to delete rows in R. However, for this particular requirement, I am unsure how to proceed. Here is a sample of the data I am working with:

```
   ID        MONTH   INCOME
1. 00000012    6        60
2. 00000012    8        65
3. 00000015    12       70
4. 00000025    4        45
5. 00000025    8        60
6. 00000032    6        10
7. 00000035    6        30
```

Quick explanation of each column:

The first 7 digits of ID identify an agent. So, in row one, 00000012 means agent 1. The last digit is the interview number. So, in row three, 00000015 means agent 1, interview 5.
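As a minimal sketch of that ID layout, the agent and interview parts can be pulled apart with base R's `substr`. The column names `AGENT` and `INTERVIEW` are made up for illustration, and the sketch assumes `ID` is stored as character so the leading zeros survive.

```r
# Sample data from the question; ID is kept as character (assumption)
# so that the leading zeros are not lost.
df <- data.frame(
  ID     = c("00000012", "00000012", "00000015", "00000025",
             "00000025", "00000032", "00000035"),
  MONTH  = c(6, 8, 12, 4, 8, 6, 6),
  INCOME = c(60, 65, 70, 45, 60, 10, 30),
  stringsAsFactors = FALSE
)

n <- nchar(df$ID)
df$AGENT     <- substr(df$ID, 1, n - 1)  # everything except the last digit
df$INTERVIEW <- substr(df$ID, n, n)      # the last digit only
```

With this split, rows 1 to 3 share the agent part `"0000001"` and differ only in the interview digit.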

Month and income are straightforward.

What Must Be Done

I need to delete every agent that does not have both a 2nd and a 5th interview.

For each remaining agent, I need to keep only the row with the maximum month for the 2nd interview and the row with the maximum month for the 5th interview.

So, if I cleaned the data properly, I would have:

```
   ID        MONTH   INCOME
2. 00000012    8        65
3. 00000015    12       70
6. 00000032    6        10
7. 00000035    6        30
```

Notice that rows 4 and 5 are gone because there was no 2nd interview for agent 2. Row 1 is gone because there was a higher month for agent 1, interview 2.

My current ideas for how to do this seem overly complex. I am thinking of breaking ID into two columns, one with the first 7 digits and another with the last digit. Then I would loop through the entire data and, at each row, run another loop to see whether the agent for that row has both an interview 2 and an interview 5. If it does, fine. If it doesn't, I then have to delete all rows for that agent.
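The nested loops described here can likely be avoided with a grouped check. The following is only a sketch in base R: it uses `tapply` to ask, once per agent, whether both interviews are present, then maps the answer back onto the rows. The names `AGENT`, `INTERVIEW`, `has_both`, and `kept` are illustrative, and `ID` is assumed to be a character column.

```r
# Sample data from the question (ID assumed to be character).
df <- data.frame(
  ID     = c("00000012", "00000012", "00000015", "00000025",
             "00000025", "00000032", "00000035"),
  MONTH  = c(6, 8, 12, 4, 8, 6, 6),
  INCOME = c(60, 65, 70, 45, 60, 10, 30),
  stringsAsFactors = FALSE
)

n <- nchar(df$ID)
df$AGENT     <- substr(df$ID, 1, n - 1)
df$INTERVIEW <- substr(df$ID, n, n)

# One TRUE/FALSE per agent: does this agent have both interview 2 and 5?
has_both <- tapply(df$INTERVIEW, df$AGENT,
                   function(x) all(c("2", "5") %in% x))

# Look up each row's agent in that per-agent result and filter.
keep <- as.vector(has_both[df$AGENT])
kept <- df[keep, ]
```

Here `kept` retains the rows for agents 1 and 3 and drops both rows for agent 2. The max-month requirement is not handled by this step.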

Next, I have to do a similar thing for deleting non-max months.

I feel like I could do the above, but it is very cumbersome. Is there a better way to do this? Thank you.

HubertL February 2016

You can do something like this to keep only the agents that have both interviews:

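```
library(stringi)
# Agent = ID without its last digit (the interview number)
Agents <- substr(df$ID, 1, nchar(df$ID) - 1)
# Flag the rows that come from interview 2 and from interview 5
A2 <- stri_endswith_fixed(df$ID, "2")
A5 <- stri_endswith_fixed(df$ID, "5")
# Agents that appear in both sets
A2and5 <- intersect(Agents[A5], Agents[A2])
# Keep only the rows belonging to those agents
df[Agents %in% A2and5, ]
```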
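The snippet above covers the first requirement only. One possible way to also apply the max-month rule is sketched below. It swaps in base R's `endsWith` (available from R 3.3.0) so that no extra package is needed, and uses `ave` to compute the maximum month per ID. Dropping rows from interviews other than 2 and 5 is an assumption about the intended result, and the names `kept` and `result` are illustrative.

```r
# Sample data from the question (ID assumed to be character).
df <- data.frame(
  ID     = c("00000012", "00000012", "00000015", "00000025",
             "00000025", "00000032", "00000035"),
  MONTH  = c(6, 8, 12, 4, 8, 6, 6),
  INCOME = c(60, 65, 70, 45, 60, 10, 30),
  stringsAsFactors = FALSE
)

agents <- substr(df$ID, 1, nchar(df$ID) - 1)
a2 <- endsWith(df$ID, "2")
a5 <- endsWith(df$ID, "5")
both <- intersect(agents[a2], agents[a5])

# Step 1: agents with both interviews; only interview 2 and 5 rows
# are kept (assumption: other interviews are not wanted).
kept <- df[agents %in% both & (a2 | a5), ]

# Step 2: within each ID (agent + interview), keep the row(s) whose
# MONTH equals the maximum for that ID.
result <- kept[kept$MONTH == ave(kept$MONTH, kept$ID, FUN = max), ]
```

On the sample data, `result` contains the original rows 2, 3, 6, and 7, matching the desired output in the question. If two rows of the same ID tie on the maximum month, both are kept.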