I'm trying out a machine learning task (binary classification) using caret and was wondering if there is a way to incorporate information about an "uncertain" class, or to weight the classes differently.
As an illustration, I've copied and pasted some of the code from the caret homepage that works with the Sonar dataset (placeholder code, it could be anything):
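library(mlbench)  # provides the Sonar dataset
testdat <- get(data(Sonar))
# Randomly assign each row a source; A, B and C are listed twice so they are drawn more often
testdat$Source <- as.factor(sample(c(LETTERS[1:6], LETTERS[1:3]), nrow(testdat), replace = TRUE))
summary(testdat$Source)  # counts per source; no seed is set, so these vary from run to run
 A  B  C  D  E  F
49 51 44 17 28 19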
After this, I would continue with a typical train, tune, and test routine once I decide on a model.
What I've added here is a factor column, Source, recording where the corresponding "Class" label came from. As an arbitrary example, say six different people assigned "Class" using slightly different methods, and I want to give A's classification method greater importance than B's but less than C's, and so forth.
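To make the intent concrete, here is a minimal sketch of what per-Source weighting might look like if it were expressed as case weights. The weight values are made up (only the ordering C > A > B comes from the description above), and I am assuming that the `weights` argument of `train` is actually honored by the chosen model; as far as I understand, it is only used by models that accept case weights, and `rpart` is picked here for that reason.

```r
library(caret)
library(mlbench)

data(Sonar)
set.seed(1)
testdat <- Sonar
testdat$Source <- factor(sample(c(LETTERS[1:6], LETTERS[1:3]),
                                nrow(testdat), replace = TRUE))

# Hypothetical importance per source. Only C > A > B is taken from the
# description; the values themselves and those for D, E, F are invented.
source_weight <- c(A = 2, B = 1, C = 3, D = 1, E = 1, F = 1)
case_wts <- unname(source_weight[as.character(testdat$Source)])

# Source is excluded from the predictors; it only drives the weights.
predictors <- setdiff(names(testdat), c("Class", "Source"))

fit <- train(x = testdat[, predictors], y = testdat$Class,
             method = "rpart",
             weights = case_wts,
             trControl = trainControl(method = "cv", number = 5))
```

Whether weighting rows like this is a sensible way to encode trust in a source is exactly the part I am unsure about.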
My actual data look something like this, with imbalances both among the classes (true/false, M/R, or whatever the labels are) and among the Sources. From the vignettes and examples I have found, I would address the former by using a metric such as ROC during tuning, but I'm not sure how to incorporate the latter at all.
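For reference, the train, tune, and test routine with ROC as the tuning metric would look roughly like the sketch below. The model (`rpart`), the resampling settings, and the 75/25 split are placeholder choices of mine, not anything the data require.

```r
library(caret)
library(mlbench)

data(Sonar)
set.seed(1)

# Stratified hold-out split on the class label.
in_train <- createDataPartition(Sonar$Class, p = 0.75, list = FALSE)
training <- Sonar[in_train, ]
testing  <- Sonar[-in_train, ]

# twoClassSummary reports ROC (AUC), sensitivity and specificity;
# it needs class probabilities, hence classProbs = TRUE.
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 2,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

fit <- train(Class ~ ., data = training,
             method = "rpart",
             metric = "ROC",   # choose the tuning parameter by AUC
             tuneLength = 5,
             trControl = ctrl)

# Evaluate the tuned model on the held-out rows.
pred <- predict(fit, newdata = testing)
confusionMatrix(pred, testing$Class)
```

This handles the M/R imbalance during tuning but does nothing about the imbalance among Sources.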
Two ideas I have considered:
- Separate the original data by Source and cycle through the factor levels one at a time, using the current level to build a model and the rest of the data to test it.
- Instead of classification, turn it into a hybrid classification/regression problem in which I model the ranks of the sources. If A is considered best, then an "A positive" would get a score of +6, an "A negative" a score of -6, and so on. Then fit a regression to these values, ignoring the Class column.
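Both ideas above might be sketched as follows. For the first, I am assuming that passing the row indices of each Source through the `index` argument of `trainControl` is a reasonable way to get "fit on one source, test on the rest"; when `indexOut` is not given, the hold-out set should default to the rows not in `index`. For the second, the rank order (A = 6 down to F = 1) and the choice of "M" as the positive class are assumptions made purely for illustration.

```r
library(caret)
library(mlbench)

data(Sonar)
set.seed(1)
testdat <- Sonar
testdat$Source <- factor(sample(c(LETTERS[1:6], LETTERS[1:3]),
                                nrow(testdat), replace = TRUE))
predictors <- setdiff(names(testdat), c("Class", "Source"))

# Idea 1: one resampling iteration per Source. Each iteration trains on
# the rows of a single source and is scored on all remaining rows.
src_index <- split(seq_len(nrow(testdat)), testdat$Source)
src_fit <- train(x = testdat[, predictors], y = testdat$Class,
                 method = "rpart",
                 trControl = trainControl(method = "cv", index = src_index))
src_fit$resample  # one row of performance measures per source

# Idea 2: signed score = (+1 for "M", -1 for "R") * rank of the source.
# The rank order and the positive class are assumed, not given.
source_rank <- c(A = 6, B = 5, C = 4, D = 3, E = 2, F = 1)
class_sign <- ifelse(testdat$Class == "M", 1, -1)
testdat$Score <- class_sign * unname(source_rank[as.character(testdat$Source)])

# A numeric outcome makes train() fit a regression; Class is ignored.
reg_fit <- train(x = testdat[, predictors], y = testdat$Score,
                 method = "rpart",
                 trControl = trainControl(method = "cv", number = 5))
```

In the first sketch the smaller sources (D, E, F) leave very few rows to train on, so I would expect those iterations to be noisy.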