Chris February 2016

### Finding patterns of values within a table in R

My data is a time series with two columns: Date, and Value, which takes values between 1 and 4. I'm trying to find runs of consecutive values that go 1, 2, 3 or 4, 3, 2 and retrieve the date on which each pattern is completed (i.e. when 3 is hit in the first scenario, and 2 in the second).

The input data is as follows:

``````
df <- data.frame(Date=seq(as.Date("2010/1/1"), as.Date("2010/1/20"), "day"),
                 Value=c(1,2,3,4,3,4,3,4,3,2,1,2,1,2,3,4,3,4,3,2))

         Date Value
1  2010-01-01     1
2  2010-01-02     2
3  2010-01-03     3
4  2010-01-04     4
5  2010-01-05     3
6  2010-01-06     4
7  2010-01-07     3
8  2010-01-08     4
9  2010-01-09     3
10 2010-01-10     2
...
``````

The output I desire is, for example:

``````
data.frame(Date=as.Date(c("2010/1/3","2010/1/10","2010/1/15","2010/1/20")),
           Value=c("Win","Loss","Win","Loss"))

Date       Value
2010-01-03   Win
2010-01-10  Loss
2010-01-15   Win
2010-01-20  Loss
``````

Here "Win" marks the completion of the first pattern (1, 2, 3) and "Loss" the completion of the second (4, 3, 2).

Many thanks!

Alex February 2016

There are obviously multiple ways to do this; here's one:

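``````
#' @param d a vector of dates
#' @param v a vector of numeric values
win_lose <- function(d, v) {
  l <- list()
  # Start at the third element; dropping indices 1:2 this way also
  # copes with vectors shorter than three elements (3:length(v) would not).
  for (i in seq_along(v)[-(1:2)]) {
    # && is the scalar "and", which is what an if() condition wants
    if (v[i] == 3 && v[i-1] == 2 && v[i-2] == 1) {
      l[[length(l)+1]] <- data.frame(Date = d[i], Value = "Win")
    } else if (v[i] == 2 && v[i-1] == 3 && v[i-2] == 4) {
      l[[length(l)+1]] <- data.frame(Date = d[i], Value = "Loss")
    }
  }
  # Stack the one-row data frames into a single result
  return(data.frame(do.call("rbind", l)))
}

R> win_lose(df$Date, df$Value)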
Date Value
1 2010-01-03   Win
2 2010-01-10  Loss
3 2010-01-15   Win
4 2010-01-20  Loss
``````
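For longer series, the same three-in-a-row check can be done without a loop by comparing the vector against shifted copies of itself. The following is only a sketch under the question's assumptions: the data is rebuilt here so the block runs on its own, and `win_lose_vec` is a name made up for illustration, not something from the answer above.

```r
# Example data from the question, rebuilt so this block is self-contained.
df <- data.frame(
  Date  = seq(as.Date("2010-01-01"), as.Date("2010-01-20"), by = "day"),
  Value = c(1, 2, 3, 4, 3, 4, 3, 4, 3, 2, 1, 2, 1, 2, 3, 4, 3, 4, 3, 2)
)

# Hypothetical helper: label the position where a 1,2,3 or 4,3,2 run ends.
win_lose_vec <- function(d, v) {
  n <- length(v)
  label <- rep(NA_character_, n)   # NA means "no pattern ends here"
  if (n >= 3) {
    i <- 3:n                       # candidate end positions
    win  <- v[i - 2] == 1 & v[i - 1] == 2 & v[i] == 3
    loss <- v[i - 2] == 4 & v[i - 1] == 3 & v[i] == 2
    label[i[which(win)]]  <- "Win"
    label[i[which(loss)]] <- "Loss"
  }
  keep <- !is.na(label)
  data.frame(Date = d[keep], Value = label[keep])
}

result <- win_lose_vec(df$Date, df$Value)
print(result)
```

Because the labels are written into a vector that lines up with the input, the rows come back in date order without an extra sort.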

jalapic February 2016

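You could do this with sequence matching. This assumes every value is a single digit (nothing above 9), so that each row maps to exactly one character. The basic idea is to collapse your Value variable into one string, then search for the end index of each "win" and "loss" sequence (i.e. 123 or 432); that end index is the row on which the pattern completes.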

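``````
library(stringr)

# Collapse the values into one string, one character per row
sequence <- paste(df$Value, collapse = "")

# Each result is a list holding a matrix of start/end positions
wins <- str_locate_all(sequence, "123")
losses <- str_locate_all(sequence, "432")

# Column 2 holds the end positions, i.e. the completing rows
dfwin <- df[wins[[1]][,2],]
dfwin$Value <- "Win"

dfloss <- df[losses[[1]][,2],]
dfloss$Value <- "Loss"

rbind(dfwin,dfloss)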

Date Value
3  2010-01-03   Win
15 2010-01-15   Win
10 2010-01-10  Loss
20 2010-01-20  Loss
``````
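If you'd rather not depend on stringr, the same idea can be sketched in base R with `gregexpr()`, which returns match start positions, so the end of a three-character match is the start plus 2. This is a rough variant rather than a drop-in replacement: the data is rebuilt so the block runs on its own, `ends_of` is a helper name invented here, and the single-digit assumption still applies. It also sorts the result by date, which matches the output asked for in the question.

```r
# Example data from the question, rebuilt so this block is self-contained.
df <- data.frame(
  Date  = seq(as.Date("2010-01-01"), as.Date("2010-01-20"), by = "day"),
  Value = c(1, 2, 3, 4, 3, 4, 3, 4, 3, 2, 1, 2, 1, 2, 3, 4, 3, 4, 3, 2)
)

# One character per row (only valid while every value is a single digit)
s <- paste(df$Value, collapse = "")

# Hypothetical helper: end positions of every match of a 3-character pattern.
ends_of <- function(pattern, s) {
  starts <- gregexpr(pattern, s, fixed = TRUE)[[1]]
  starts <- starts[starts > 0]     # gregexpr() reports -1 when nothing matches
  as.integer(starts) + 2L
}

win_idx  <- ends_of("123", s)
loss_idx <- ends_of("432", s)

result <- data.frame(
  Date  = df$Date[c(win_idx, loss_idx)],
  Value = rep(c("Win", "Loss"), c(length(win_idx), length(loss_idx)))
)

# Put wins and losses back into chronological order
result <- result[order(result$Date), ]
rownames(result) <- NULL
print(result)
```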

