chas February 2016
### create count matrix in R

I have large dataframe as shown below with few rows and column:

```
ID1 ID2 ID3 ID4
S1 2 4 2 6
S2 2 1 3 2
S3 2 2 2 2
S4 3 0 2 2
```

For each row i would need a matrix with count of each number in the range of ID value. Since the largest is 6 in ID values, it creates a matrix with 7 columns i.e. 0 to 6 and fill with the count values.

Sample Output:

```
0 1 2 3 4 5 6
S1 0 0 2 0 1 0 1
S2 0 1 2 1 0 0 0
S3 0 0 4 0 0 0 0
S4 1 0 2 1 0 0 0
```

Is there a way of doing this in R.

akrun February 2016

We can use `table`

```
table(c(row(df1)), unlist(df1))
# 0 1 2 3 4 6
# 1 0 0 2 0 1 1
# 2 0 1 2 1 0 0
# 3 0 0 4 0 0 0
# 4 1 0 2 1 0 0
```

If we need `0`

and `5`

also

```
tbl <- table(c(row(df1)), factor(unlist(df1), levels=0:6))
dimnames(tbl)[[1]] <- row.names(df1)
tbl
#
# 0 1 2 3 4 5 6
# S1 0 0 2 0 1 0 1
# S2 0 1 2 1 0 0 0
# S3 0 0 4 0 0 0 0
# S4 1 0 2 1 0 0 0
```

Another option is `mtabulate`

from `qdapTools`

```
library(qdapTools)
mtabulate(as.data.frame(t(df1)))
```

A Handcart And Mohair February 2016

This is actually a perfect situation to use `apply`

+ `tabulate`

, except for the inclusion of zeroes in your data and the need to include them.

Since you need to include tabulation of zeroes, you make a small modification to `tabulate`

to start at zero instead of 1.

Here's a function that puts the approach in place:

```
DFTabulate <- function(indf) {
nbins <- max(indf)
`colnames<-`(t(apply(indf + 1, 1, tabulate, nbins = nbins + 1)), 0:nbins)
}
```

Here it is applied to your sample data.

```
DFTabulate(mydf)
# 0 1 2 3 4 5 6
# S1 0 0 2 0 1 0 1
# S2 0 1 2 1 0 0 0
# S3 0 0 4 0 0 0 0
# S4 1 0 2 1 0 0 0
```

You specify that you have a "large" `data.frame`

but don't describe how large, so I'm not sure how relevant the following benchmark is.

However, just to share the logic behind using this approach: `tabulate`

is generally a very fast function, so I thought I would make use of its efficiency.

Here's the benchmark:

```
set.seed(1)
nrow = 10000
ncol = 100
min = 0
max = 500
mydf <- data.frame(
matrix(sample(min:max, nrow*ncol, TRUE),
nrow = nrow, ncol = ncol,
dimnames = list(paste0("S", 1:nrow), paste0("ID", 1:ncol))))
fun2 <- function(df1 = mydf) {
tbl <- table(c(row(df1)), factor(unlist(df1), levels=0:max))
dimnames(tbl)[[1]] <- row.names(df1)
tbl
}
fun3 <- function(df1 = mydf) mtabulate(as.data.frame(t(df1)))
system.time(DFTabulate(mydf))
# user system elapsed
# 0.000 0.000 0.154
system.time(fun2(mydf))
# user system elapsed
# 0.000 0.000 1.018
system.time(fun3(mydf))
# user system elapsed
# 4.560 0.000 3.081
```

Asked in February 2016

Viewed 2,844 times

Voted 14

Answered 2 times

Viewed 2,844 times

Voted 14

Answered 2 times