jc1012 February 2016

Scikit-learn: Predicting a new raw, unscaled instance using models trained with scaled data

I am new to the scikit-learn library for Python. So far I have built several classifier models with it, and this has been smooth sailing. Because the features have different units (the data comes from different sensors, labeled by their corresponding categories), I opted to scale the features using the StandardScaler class.

The resulting accuracy scores of the different classifiers were fine. However, when I try to use a model to predict a raw (meaning unscaled) instance of sensor values, the model outputs the wrong classification.

Should this really be the case because of the scaling done to the training data? If so, is there an easy way to scale the raw values too? I would like to use model persistence for this using joblib, and it would be appreciated if there is a way to make this as modular as possible. That is, I would prefer not to record the mean and standard deviation of each feature every time the training data changes.

Thank you very much!

Answers


lejlot February 2016

Should this really be the case because of the scaling done to the training data?

Yes, this is expected behavior. You trained your model on scaled data, so it will only give meaningful predictions on data scaled the same way.

If so, is there an easy way to scale the raw values too?

Yes, just save your scaler.

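# Training
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learns per-feature mean/std from the training data
...
# train the classifier on the scaled X_train, then save both the classifier and the scaler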

then

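# Testing
# load the saved scaler first, then reuse its stored statistics:
# call transform(), not fit_transform(), so nothing is re-fitted on the new data
scaled_instances = scaler.transform(raw_instances)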
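Since the question mentions joblib, the two snippets above can be combined into one end-to-end sketch. The toy data, the choice of LogisticRegression, and the file names are assumptions made only for illustration; any classifier and any paths would do.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two sensor features on very different scales.
X_train = np.array([[1.0, 1000.0], [2.0, 1100.0], [8.0, 5000.0], [9.0, 5200.0]])
y_train = np.array([0, 0, 1, 1])

# Training: fit the scaler on the training data, then train on the scaled data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
clf = LogisticRegression().fit(X_train_scaled, y_train)

# Persist both objects (placeholder file names in a temporary directory).
model_dir = tempfile.mkdtemp()
scaler_path = os.path.join(model_dir, "scaler.joblib")
clf_path = os.path.join(model_dir, "classifier.joblib")
joblib.dump(scaler, scaler_path)
joblib.dump(clf, clf_path)

# Prediction: load both, scale the raw instances with the stored statistics,
# then classify. Note transform(), not fit_transform().
loaded_scaler = joblib.load(scaler_path)
loaded_clf = joblib.load(clf_path)
raw_instances = np.array([[1.5, 1050.0], [8.5, 5100.0]])
predictions = loaded_clf.predict(loaded_scaler.transform(raw_instances))
```

Whenever the training data changes, re-running the training part overwrites both files, so nothing has to be recorded by hand. If keeping two files in sync feels fragile, scikit-learn's Pipeline can wrap the scaler and the classifier into a single object that is saved and loaded as one.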

Meaning to say, not to record mean and standard deviation for each feature every time the training data changes.

This is exactly what you have to do, although not by hand, since the scaler computes and stores these values for you. Essentially, "under the hood" this is what happens: the means and standard deviations of each feature have to be stored.
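To make the "under the hood" part concrete, here is a rough standard-library sketch of the bookkeeping a fitted StandardScaler does. The function names and data are hypothetical; the real scaler keeps the equivalent values in its mean_ and scale_ attributes and, like this sketch, uses the population standard deviation.

```python
import math


def fit_standardizer(rows):
    """Return per-feature means and population standard deviations."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n)
        for col, m in zip(columns, means)
    ]
    # Constant features have zero spread; divide by 1 instead of 0.
    stds = [s if s > 0 else 1.0 for s in stds]
    return means, stds


def transform(rows, means, stds):
    """Scale rows using statistics learned from the training data."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical training data with two features on different scales.
X_train = [[1.0, 100.0], [3.0, 300.0], [5.0, 500.0]]
means, stds = fit_standardizer(X_train)

# A new raw instance is scaled with the stored training statistics.
scaled_raw = transform([[3.0, 500.0]], means, stds)
```

Saving the scaler object is just a convenient way of saving `means` and `stds` without handling them yourself.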
