Home Ask Login Register

Developers Planet

Your answer is one click away!

Richard Lusch February 2016

Fill elements of column based on multiple criteria

I have a data frame that I want to remove any week that contains an outlier. I will be happy if I can indicate the entire week as an outlier, as I understand how to do the subset from there. I have not been able to come up with an appropriate solution. I keep thinking that I am going to need to loop through subsets of weeks to achieve the desired goal, or create a separate function to handle the individual outlier week and use sapply. I have yet to make either of these solutions viable.

date <- seq(as.Date("2015-01-01"), length=365, by="1 day")
dow <- as.factor(weekdays(as.Date(date))
df <- data.frame(cbind(date, dow))
df$date <- as.Date(df$date,format="%m/%d/%Y",origin="01/01/1970")
df$dow <- as.factor(weekdays(as.Date(df$date)))
set.seed(1115)
df$var1 <- rnorm(365, 1912, 40795)
stdev <- sd(df$var1, na.rm=TRUE)
avg <- mean(df$var1, na.rm=TRUE)
df$LB <- avg-(2.75*stdev)
df$UB <- avg+(2.75*stdev)
df$outlier <- ifelse(df$var1<df$LB | df$var1>df$UB, 1,0)
df$weeknum <- as.numeric(format(df$date, "%U"))
head(df, 17)

> head(df, 17)
         date       dow       var1        LB       UB outlier weeknum
1  2015-01-01  Thursday  -7828.412 -114675.6 120479.8       0       0
2  2015-01-02    Friday  25674.456 -114675.6 120479.8       0       0
3  2015-01-03  Saturday -33588.871 -114675.6 120479.8       0       0
4  2015-01-04    Sunday -54418.175 -114675.6 120479.8       0       1
5  2015-01-05    Monday -10002.002 -114675.6 120479.8       0       1
6  2015-01-06   Tuesday  34050.390 -114675.6 120479.8       0       1
7  2015-01-07 Wednesday -37584.648 -114675.6 120479.8       0       1
8  2015-01-08  Thursday  84048.878 -114675.6 120479.8       0       1
9  2015-01-09    Friday -24801.346 -114675.6 120479.8       0       1
10 2015-01-10  Saturday  33974.637 -114675.6 120479.8       0       1
11 2015-01-11    Sunday  77432.088 -114675.6 120479.8       0       2
12 2015-01-12    Monday 128196.236 -114675.6 1204        

Answers


mshum February 2016

You say "the desired output would be a 1 the outlier column in each row that corresponds with weeknum = 2." Do you actually need an outlier column, then? Seems like you can simply subset your data.frame based on the values of the weeknum column, as follows:

df <- df[!(df$weeknum==2),]


Richard Lusch February 2016

The answer involves testing two vectors. Once I realized this I was able to refine my search and find a suitable answer here.

The code necessary to properly identify each element was:

out.df <- df[which(df$outlier==1),]#Create a subset of only outlier rows
df$outlier <- ifelse(df$weeknum %in% out.df$weeknum, 1, 0)#Compare the new data frame
#weeknum against the old with the %in% operator, if they are equal leave 1, else 0.

This gave the result:

> head(df, 17)
         date       dow       var1        LB       UB outlier weeknum
1  2015-01-01  Thursday  -7828.412 -114675.6 120479.8       0       0
2  2015-01-02    Friday  25674.456 -114675.6 120479.8       0       0
3  2015-01-03  Saturday -33588.871 -114675.6 120479.8       0       0
4  2015-01-04    Sunday -54418.175 -114675.6 120479.8       0       1
5  2015-01-05    Monday -10002.002 -114675.6 120479.8       0       1
6  2015-01-06   Tuesday  34050.390 -114675.6 120479.8       0       1
7  2015-01-07 Wednesday -37584.648 -114675.6 120479.8       0       1
8  2015-01-08  Thursday  84048.878 -114675.6 120479.8       0       1
9  2015-01-09    Friday -24801.346 -114675.6 120479.8       0       1
10 2015-01-10  Saturday  33974.637 -114675.6 120479.8       0       1
11 2015-01-11    Sunday  77432.088 -114675.6 120479.8       1       2
12 2015-01-12    Monday 128196.236 -114675.6 120479.8       1       2
13 2015-01-13   Tuesday   9740.418 -114675.6 120479.8       1       2
14 2015-01-14 Wednesday  26539.887 -114675.6 120479.8       1       2
15 2015-01-15  Thursday  12172.834 -114675.6 120479.8       1       2
16 2015-01-16    Friday   1032.544 -114675.6 120479.8       1       2
17 2015-01-17  Saturday  76870.095 -114675.6 120479.8       1       2

Which is satisfactory.

Post Status

Asked in February 2016
Viewed 2,801 times
Voted 14
Answered 2 times

Search




Leave an answer


Quote of the day: live life