Nick C February 2016

Creating a Data Frame of the Number of Pairings based on an Event Column

I have a data frame that contains one column which indicates an event ID. There is another column that indicates the products used in that event. Each product would only be used one time for an event and each event contains at least one product. I would like to know how many times each product is used with every other product. Some sample data is below:

set.seed(1)
events <- paste('Event ', sample(1:4, size = 15, replace = TRUE), sep = '')
events <- events[order(events)]

prods <- paste('Product ', c(1, 2, 3, 4, 1, 5, 6, 2, 4, 6, 7, 1, 2, 3, 5))

test_data <- data.frame(events, prods)
test_data
  events      prods
1  Event 1 Product  1
2  Event 1 Product  2
3  Event 1 Product  3
4  Event 1 Product  4
5  Event 2 Product  1
6  Event 2 Product  5
7  Event 2 Product  6
8  Event 3 Product  2
9  Event 3 Product  4
10 Event 3 Product  6
11 Event 3 Product  7
12 Event 4 Product  1
13 Event 4 Product  2
14 Event 4 Product  3
15 Event 4 Product  5

Product 1 and Product 2 occur in the same event twice (Event 1 and Event 4). So I would want to return a '2' for that match. Product 1 and Product 7 never occur in the same event, so I'd want to return a 0 for that pair. For 'matches' between the same item, I am comfortable returning the total number of times that product is used.

There are two formats that are possible and I don't have a preference for which I'd like to see returned.

  1. A short and fat data frame that has the products running across the tops as column headers and the side as row headers. The body of this data frame would be populated by the number of matches.
  2. A long, narrow data frame where there are two columns that will serve to represent all possible combinations of Product pairings and then a third column representing the number of times they match.

I have been experimenting with expand.grid with nothing to show for it.

Thank you!

Answers


lukeA February 2016

Maybe this is using a sledgehammer to crack a nut, but you could mine (frequent) item sets, which comes with other fancy stuff. It could work like this:

library(arules)
library(reshape2)
mat <- as(sapply(dcast(test_data, events~prods, fun.aggregate = length, value.var="prods")[, -1], as.logical), "transactions")
sets <- apriori(trans, parameter = list(supp = 0, conf = 0, minlen = 2, maxlen = 2, target = "frequent itemsets"))
df <- as(sets, "data.frame")
subset(transform(df, n=support*nrow(trans)), n>0, -support)
#                      items n
# 2  {Product  6,Product  7} 1
# 4  {Product  4,Product  7} 1
# 6  {Product  2,Product  7} 1
# 7  {Product  5,Product  6} 1
# 8  {Product  3,Product  5} 1
# 10 {Product  1,Product  5} 2
# 11 {Product  2,Product  5} 1
# 13 {Product  4,Product  6} 1
# 14 {Product  1,Product  6} 1
# 15 {Product  2,Product  6} 1
# 16 {Product  3,Product  4} 1
# 17 {Product  1,Product  3} 2
# 18 {Product  2,Product  3} 2
# 19 {Product  1,Product  4} 1
# 20 {Product  2,Product  4} 2
# 21 {Product  1,Product  2} 2

The support value shows you the percentage of events in which both products were included. I multiplied it with the number of transactions to get your frequency count.


thelatemail February 2016

Split prods by events and then calculate all the combn-inations, then aggregate to get the count of each combination.

out <- t(do.call(cbind,
  lapply(split(as.character(test_data$prods), test_data$events), combn, 2))
)
aggregate(count ~ . , data=transform(out,count=1), FUN=sum)

#           X1         X2 count
#1  Product  1 Product  2     2
#2  Product  1 Product  3     2
#3  Product  2 Product  3     2
#4  Product  1 Product  4     1
#5  Product  2 Product  4     2
#6  Product  3 Product  4     1
#7  Product  1 Product  5     2
#8  Product  2 Product  5     1
#9  Product  3 Product  5     1
#10 Product  1 Product  6     1
#11 Product  2 Product  6     1
#12 Product  4 Product  6     1
#13 Product  5 Product  6     1
#14 Product  2 Product  7     1
#15 Product  4 Product  7     1
#16 Product  6 Product  7     1

Post Status

Asked in February 2016
Viewed 2,866 times
Voted 4
Answered 2 times

Search




Leave an answer