BartekKSD February 2016

Finding pairs in a table in R

I need to identfy pairs of numbers in a table in R. The structure of the table is as follows:

   user_A   user_B   counter
1       1        2         5
2       1        3         3
3       2        1        10
4       2        4         8

I want to check if for each pair, an opposite one exists, e.g. pair (1,2) and pair (2,1), and if so - sum the counter value for both pairs. As a result, I want to have an output in a form of a table like this one:

   user_A   user_B   sum   bi_directional
1       1        2    15             TRUE
2       1        3     3            FALSE
3       2        4     8            FALSE

Thank you in advance!

Answers


akrun February 2016

We can sort the first two column row wise with apply (MARGIN=1), cbind with the third column ('d1'), get an index of duplicates of the first two column ('i1'). Convert to 'data.table' (setDT(d2)), grouped by 'user_A', and 'user_B', get the sum of 'Counter' and the first row of 'i1'.

 d1 <- setNames(cbind(t(apply(df[1:2], 1, sort)), df[3]), names(df))
  i1 <- duplicated(d1[1:2])|duplicated(d1[1:2], fromLast=TRUE)

 d2 <- cbind(d1, i1)
 library(data.table)
 setDT(d2)[, list(counter=sum(counter), bi_directional=i1[1L]) ,.(user_A, user_B)]
 #  user_A user_B counter bi_directional
 #1:      1      2      15           TRUE
 #2:      1      3       3          FALSE
 #3:      2      4       8          FALSE

Or another option is

setDT(df)[user_A > user_B, c('user_B', 'user_A') := 
       list(user_A, user_B)]
df[, list(counter= sum(counter), bi_directional= .N>1), 
                                   by = .(user_A, user_B)]
#     user_A user_B counter bi_directional
#1:      1      2      15           TRUE
#2:      1      3       3          FALSE
#3:      2      4       8          FALSE


Stibu February 2016

This is a dplyr solution:

df %>%
  mutate(user_A2 = pmin(user_A, user_B),
         user_B = pmax(user_A, user_B),
         user_A = user_A2) %>%
  select(-user_A2) %>%
  group_by(user_A, user_B) %>%
  summarise(sum = sum(counter), bi_directional = n() > 1) %>%
  as.data.frame
##   user_A user_B sum bi_directional
## 1      1      2  15           TRUE
## 2      1      3   3          FALSE
## 3      2      4   8          FALSE

mutate() is used to redefine user_A and user_B such that the smaller of the two values is always in the first column. Then the auxiliary column user_A2 is dropped. The data is grouped by user_A and user_B and two summaries are calculated for each group: The sum over counter and whether there was more than one value. This latter uses n() which counts the number of rows in the group.

The last line uses as.data.frame to transform back to a data frame. This is only needed, if you insist on having a data frame as your result.

Post Status

Asked in February 2016
Viewed 3,559 times
Voted 8
Answered 2 times

Search




Leave an answer