m0h3n February 2016

Find a vector in matrix when order matters in R

I have a matrix, say m, as follows:

> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    4    1    5    2    3
[2,]    5    2    3    4    1
[3,]    3    4    1    5    2
[4,]    1    5    2    3    4
[5,]    2    3    4    1    5
[6,]    4    1    5    2    3
[7,]    5    2    3    4    1
[8,]    3    4    1    5    2

and a vector named vec as follows:

> vec
[1] 3 1
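For anyone who wants to run the answers below, here is one way to rebuild the data. This is only a sketch that assumes the printed values are the whole matrix and that it is a plain numeric matrix with no dimnames:

```r
# Rebuild the 8 x 5 matrix printed above, filling it row by row.
m <- matrix(c(4, 1, 5, 2, 3,
              5, 2, 3, 4, 1,
              3, 4, 1, 5, 2,
              1, 5, 2, 3, 4,
              2, 3, 4, 1, 5,
              4, 1, 5, 2, 3,
              5, 2, 3, 4, 1,
              3, 4, 1, 5, 2),
            ncol = 5, byrow = TRUE)

# The vector whose elements must appear in this order within a row.
vec <- c(3, 1)
```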

I'd like to find all rows in m that contain the elements of vec in the same order (here 3 somewhere before 1, not necessarily adjacent). The result should look like this (note that the first, fourth and sixth rows are not of interest):

> res
[2,]    5    2    3    4    1
[3,]    3    4    1    5    2
[5,]    2    3    4    1    5
[7,]    5    2    3    4    1
[8,]    3    4    1    5    2

Could you please tell me how to do this in R? Thanks.

Answers


Pierre Lafortune February 2016

Here's a general solution. We can build a regex pattern from vec, then check it against the data with each row pasted into a single string:

v2 <- paste(vec, collapse=".*?")
df.vec <- do.call(paste, as.data.frame(m))
m[grep(v2, df.vec),]
#      [,1] [,2] [,3] [,4] [,5]
# [2,]    5    2    3    4    1
# [3,]    3    4    1    5    2
# [5,]    2    3    4    1    5
# [7,]    5    2    3    4    1
# [8,]    3    4    1    5    2
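One caveat with the pattern above: it matches digits, not whole numbers, so values with more than one digit could produce false positives. A possible variant, sketched here on a made-up two-row matrix (m2 and the variable names are only for illustration), wraps each element in word boundaries:

```r
# Hypothetical data with multi-digit values; not from the question.
m2  <- rbind(c(13, 2, 21, 4, 5),
             c( 3, 2,  1, 4, 5))
vec <- c(3, 1)

# One string per row, e.g. "13 2 21 4 5".
rows <- do.call(paste, as.data.frame(m2))

# Pattern as in the answer: the "3" in 13 and the "1" in 21 both match.
naive <- paste(vec, collapse = ".*?")
naive_hits <- grepl(naive, rows)

# Variant: \b on both sides forces each element to match a whole number.
anchored <- paste0("\\b", vec, "\\b", collapse = ".*?")
anchored_hits <- grepl(anchored, rows, perl = TRUE)

m2[anchored_hits, , drop = FALSE]
```

With this toy input the original pattern flags both rows, while the anchored one keeps only the second.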


docendo discimus February 2016

Here's an option using apply:

> m[apply(m, 1, function(x) all(c(3,1) %in% x) & which(x == 3) < which(x == 1)),]
#  V2 V3 V4 V5 V6
#2  5  2  3  4  1
#3  3  4  1  5  2
#5  2  3  4  1  5
#7  5  2  3  4  1
#8  3  4  1  5  2

Here's a general solution to it for any vectors:

> vec <- c(3,4,1,5)
> m[apply(m, 1, function(x) all(vec %in% x) & all(diff(sapply(vec, function(y) which(x == y))) > 0)),]
#  V2 V3 V4 V5 V6
#3  3  4  1  5  2
#5  2  3  4  1  5
#8  3  4  1  5  2

I'd put it in a function for more convenient use:

f <- function(m, vec) m[apply(m, 1, function(x) all(vec %in% x) & all(diff(sapply(vec, function(y) which(x == y))) > 0)),]
f(m, c(3,1,5))
#  V2 V3 V4 V5 V6
#3  3  4  1  5  2
#5  2  3  4  1  5
#8  3  4  1  5  2


akrun February 2016

We can try

m1[apply(m1, 1, function(x) {
    n1 <- match(vec, x)
    n1[1] < n1[2]
}), ]
#      v1 v2 v3 v4 v5
#[1,]  5  2  3  4  1
#[2,]  3  4  1  5  2
#[3,]  2  3  4  1  5
#[4,]  5  2  3  4  1
#[5,]  3  4  1  5  2

Or

m1[apply(m1, 1, function(x) all(diff(match(vec, x))>0)),]
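To see why this works, it may help to look at single rows. match(vec, x) gives the position of each element of vec in the row, and the order is preserved exactly when those positions are strictly increasing. The helper below (in_order is a made-up name) is a sketch that also guards against rows missing an element of vec, which the question's data never has:

```r
# Hypothetical helper: TRUE if every element of vec occurs in x,
# and the elements occur in the same relative order as in vec.
in_order <- function(x, vec) {
  idx <- match(vec, x)               # position of each vec element in x (NA if absent)
  !anyNA(idx) && all(diff(idx) > 0)  # all present and positions strictly increasing
}

vec <- c(3, 1)

match(vec, c(5, 2, 3, 4, 1))  # positions 3 and 5: increasing, so the row is kept
match(vec, c(4, 1, 5, 2, 3))  # positions 5 and 2: decreasing, so the row is dropped

in_order(c(5, 2, 3, 4, 1), vec)
in_order(c(4, 1, 5, 2, 3), vec)
in_order(c(4, 1, 5, 2, 9), vec)  # 3 is absent, so match() returns NA for it
```

Without the anyNA() check, a missing element would make the test return NA, and indexing a matrix with NA yields a row of NAs rather than dropping the row.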


alexis_laz February 2016

Yet another attempt.

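Since (1) each row m[i, ] and vec are free of repeated values, and (2) the rows of m are permutations, so ncol(m) << nrow(m), you could loop over the columns instead of the rows: for each successive column, test that the element of vec it matches does not come earlier in vec than the one matched by the previous column(s).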

ff = function(mat, vec) 
{
    matched = array(match(mat, vec, 0L), dim(mat))

    ans = seq_len(nrow(mat))
    for(j in 2:ncol(mat)) {
        zeroj = ans[matched[ans, j] == 0L]
        matched[zeroj, j] = matched[zeroj, j - 1L]
        ans = ans[matched[ans, j] >= matched[ans, j - 1L]]
    }

    ans
}
ff(m, c(3, 1))
#[1] 2 3 5 7 8
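To make the idea behind matched more concrete, here is a small sketch on two rows taken from the question. It uses a row-wise check purely for illustration (it is not the column loop from ff) and assumes, like ff, that every row contains all elements of vec:

```r
mat <- rbind(c(5, 2, 3, 4, 1),   # 3 comes before 1
             c(4, 1, 5, 2, 3))   # 1 comes before 3
vec <- c(3, 1)

# For every cell, its position in vec (0 if the value is not in vec).
matched <- array(match(mat, vec, 0L), dim(mat))
matched
# Row 1 reads 0 0 1 0 2, row 2 reads 0 2 0 0 1.

# A row qualifies when its non-zero entries are non-decreasing from left to right.
keep <- apply(matched, 1, function(r) !is.unsorted(r[r > 0]))
which(keep)
```

The loop in ff reaches the same decision by carrying the last non-zero value forward column by column, which avoids calling a function once per row.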

Benchmarking on larger data against a modified version of akrun's answer (the fastest of the other answers):

akrun = function(mat, vec)
    which(!apply(mat, 1L, function(x) is.unsorted(match(vec, x))))

set.seed(007)
MAT = do.call(rbind, replicate(1e6, sample(15), simplify = FALSE))
VEC = sample(15, 8)

system.time({ ansff = ff(MAT, VEC) })
#   user  system elapsed 
#   0.70    0.08    0.78 
system.time({ ansakrun = akrun(MAT, VEC) })
#   user  system elapsed 
#   5.28    0.06    5.35 
all.equal(ansff, ansakrun)
#[1] TRUE
