zhyan February 2016

Decision tree in R

My dataset is:

x <- data.frame(
  v1 = c(97, 97, 85, 84, 90, 80, 81, 90, 80, 70, 90, 90, 90, 95, 88, 99),
  v2 = c(99, 91, 91, 83, 99, 95, 74, 88, 82, 80, 96, 87, 92, 96, 88, 95),
  v3 = c(89, 93, 87, 80, 96, 96, 75, 90, 78, 86, 92, 88, 80, 88, 98, 98),
  v4 = c(89, 97, 91, 86, 95, 95, 89, 88, 75, 82, 99, 92, 95, 92, 90, 98),
  v5 = c(99, 90, 93, 91, 90, 90, 77, 92, 85, 76, 90, 96, 90, 90, 90, 92))
> x
   v1 v2 v3 v4 v5
1  97 99 89 89 99
2  97 91 93 97 90
3  85 91 87 91 93
4  84 83 80 86 91
5  90 99 96 95 90
6  80 95 96 95 90
7  81 74 75 89 77
8  90 88 90 88 92
9  80 82 78 75 85
10 70 80 86 82 76
11 90 96 92 99 90
12 90 87 88 92 96
13 90 92 80 95 90
14 95 96 88 92 90
15 88 88 98 90 90
16 99 95 98 98 92

I used the rpart package to fit a decision tree as follows:

# Classification Tree with rpart
library(rpart)
fit <- rpart(v5 ~ v1+v2+v3+v4,
              method="class", data=x)

printcp(fit) # display the results 

Classification tree:
rpart(formula = v5 ~ v1 + v2 + v3 + v4, data = x, method = "class")

Variables actually used in tree construction:
character(0)

Root node error: 9/16 = 0.5625

n= 16 

    CP nsplit rel error xerror xstd
1 0.01      0         1      0    0


> summary(fit) # detailed summary of splits

Call:
rpart(formula = v5 ~ v1 + v2 + v3 + v4, data = x, method = "class")
  n= 16 

    CP nsplit rel error xerror xstd
1 0.01      0         1      0    0

Node number 1: 16 observations
  predicted class=90  expected loss=0.5625  P(node) =1
    class counts:     1     1     1     7     1     2     1     1     1
   probabilities: 0.062 0.062 0.062 0.438 0.062 0.125 0.062 0.062 0.062 

plot tree

 # plot tree 
 plot(fit, uniform=TRUE, 
+      main="Classification Tr        

Answers


Ryan Caldwell February 2016

Use method="class" when you are building a classification tree and method="anova" when you are building a regression tree. Your response v5 looks continuous, so you should be building a regression tree (i.e., method="anova").
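As a rough sketch of that suggestion, the code below refits the model from the question as a regression tree. One extra point that is not in the answer above: rpart's default control settings use minsplit = 20, so with only 16 rows no split is attempted whichever method is chosen, and the tree stays at the root node. The relaxed values used here (minsplit = 4, minbucket = 2) are illustrative assumptions, not recommended settings, and a tree grown on 16 observations will likely overfit.

```r
library(rpart)

# Data from the question
x <- data.frame(
  v1 = c(97, 97, 85, 84, 90, 80, 81, 90, 80, 70, 90, 90, 90, 95, 88, 99),
  v2 = c(99, 91, 91, 83, 99, 95, 74, 88, 82, 80, 96, 87, 92, 96, 88, 95),
  v3 = c(89, 93, 87, 80, 96, 96, 75, 90, 78, 86, 92, 88, 80, 88, 98, 98),
  v4 = c(89, 97, 91, 86, 95, 95, 89, 88, 75, 82, 99, 92, 95, 92, 90, 98),
  v5 = c(99, 90, 93, 91, 90, 90, 77, 92, 85, 76, 90, 96, 90, 90, 90, 92))

# Regression tree with default controls: v5 is treated as numeric,
# but the default minsplit = 20 exceeds n = 16, so no split is attempted
fit_default <- rpart(v5 ~ v1 + v2 + v3 + v4, method = "anova", data = x)

# Same model with relaxed controls (assumed values, for illustration only)
fit_loose <- rpart(v5 ~ v1 + v2 + v3 + v4, method = "anova", data = x,
                   control = rpart.control(minsplit = 4, minbucket = 2))

printcp(fit_loose)   # complexity table for the relaxed fit
print(fit_loose)     # text listing of the nodes and split rules
```

Comparing nrow(fit_default$frame) with nrow(fit_loose$frame) shows how many nodes each fit ended up with; the frame has one row per node.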
