bobo32 February 2016
### model.predictProbabilities() for LogisticRegression in Spark?

I'm running a multi-class **Logistic Regression (withLBFGS)** with Spark 1.6.

Given **x** and possible labels **{1.0, 2.0, 3.0}**, the final model will **only** output the single best prediction, say **2.0**.

If I'm interested in the second-best prediction, say **3.0**, how can I retrieve that information?

With NaiveBayes I would use the model.predictProbabilities() function, which for each sample outputs a vector with the probabilities of every possible outcome.

Daniel Darabos February 2016

There are two ways to do logistic regression in Spark: `spark.ml` and `spark.mllib`.

With DataFrames you can use `spark.ml`:

```
import org.apache.spark
import sqlContext.implicits._

// Build labeled points with two features each.
def p(label: Double, a: Double, b: Double) =
  new spark.mllib.regression.LabeledPoint(
    label, new spark.mllib.linalg.DenseVector(Array(a, b)))

val data = sc.parallelize(Seq(p(1.0, 0.0, 0.5), p(0.0, 0.5, 1.0)))
val df = data.toDF
val model = new spark.ml.classification.LogisticRegression().fit(df)
model.transform(df).show
```

You get the raw predictions and probabilities:

```
+-----+---------+--------------------+--------------------+----------+
|label| features| rawPrediction| probability|prediction|
+-----+---------+--------------------+--------------------+----------+
| 1.0|[0.0,0.5]|[-19.037302860930...|[5.39764620520461...| 1.0|
| 0.0|[0.5,1.0]|[18.9861466274786...|[0.99999999431904...| 0.0|
+-----+---------+--------------------+--------------------+----------+
```

With RDDs you can use `spark.mllib`:

```
val model = new spark.mllib.classification.LogisticRegressionWithLBFGS().run(data)
```

This model does not expose the raw predictions and probabilities. You can take a look at `predictPoint`. It multiplies the feature vector with the per-class weights and picks the class with the highest score. The weights are publicly accessible, so you could copy that algorithm and save all the scores instead of just returning the highest one.
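For the binary case, the score that `predictPoint` thresholds is just the dot product of the weights with the features, plus the intercept; a logistic function of that margin gives the probability. A pure-Scala sketch of that computation (the weights and features here are made up, not taken from a trained model):

```scala
// Binary logistic regression scoring, mirroring what spark.mllib's
// predictPoint computes internally. Values are illustrative only.
val weights = Array(1.5, -2.0)
val intercept = 0.3
val features = Array(0.0, 0.5)

// margin = w . x + b
val margin = weights.zip(features).map { case (w, x) => w * x }.sum + intercept
// probability of the positive class via the logistic function
val probability = 1.0 / (1.0 + math.exp(-margin))
// predictPoint itself would only return the thresholded class:
val prediction = if (probability > 0.5) 1.0 else 0.0
```

Keeping `probability` instead of only `prediction` is exactly the extra information the question asks for.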

bobo32 February 2016

Following the suggestions from @Daniel Darabos:

- I tried to use the `LogisticRegression` class from **ml** instead of **mllib**. Unfortunately it supports only binary logistic regression, not the multi-class case.
- I took a look at `predictPoint` and modified it so that it returns the probabilities for every class. Here is what it looks like:

```
import scala.collection.mutable.HashMap
import org.apache.spark.mllib.linalg.Vector

def predictPointForMulticlass(
    featurizedVector: Vector,
    weightsArray: Vector,
    intercept: Double,
    numClasses: Int,
    numFeatures: Int): Seq[(String, Double)] = {

  val weightsArraySize = weightsArray.size
  val dataWithBiasSize = weightsArraySize / (numClasses - 1)
  val withBias = false
  var bestClass = 0
  var maxMargin = 0.0
  val margins = new Array[Double](numClasses - 1)
  val temp_marginMap = new HashMap[Int, Double]()
  val res = new HashMap[Int, Double]()

  // Compute one margin per non-pivot class.
  (0 until numClasses - 1).foreach { i =>
    var margin = 0.0
    var index = 0
    featurizedVector.toArray.foreach { value =>
      if (value != 0.0) {
        margin += value * weightsArray((i * dataWithBiasSize) + index)
      }
      index += 1
    }
    // The intercept has to be added to the margin.
    if (withBias) {
      margin += weightsArray((i * dataWithBiasSize) + featurizedVector.size)
    }
    margins(i) = margin
    temp_marginMap += (i -> margin)
    if (margin > maxMargin) {
      maxMargin = margin
      bestClass = i + 1
    }
  }

  // Turn each margin into a probability instead of keeping only the best class.
  for ((k, v) <- temp_marginMap) {
    val calc = probCalc(maxMargin, v) // probCalc: helper (not shown in the post)
    res += (k -> calc)
  }
  res.toSeq.map { case (k, v) => (k.toString, v) }.sortBy(-_._2)
}
```
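The `probCalc` helper was not included in the post. One plausible definition, assuming it scores each margin relative to the best one (this is a guess at the author's intent, not their actual code):

```scala
// Hypothetical probCalc: a relative score computed from only the two
// values the loop passes in. Yields 1.0 for the best class and a
// smaller positive value for every other class.
// This is an assumption -- the original helper was never posted.
def probCalc(maxMargin: Double, margin: Double): Double =
  math.exp(margin - maxMargin)
```

Note these relative scores are not normalized to sum to 1; for true multinomial probabilities you would also divide by the sum over all classes.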

```
```

```
```

```
```

```
```

```
```

```
```#### Post Status

Asked in February 2016

Viewed 1,775 times

Voted 9

Answered 2 times
#### Search

## Leave an answer

```
```

```
```

```
```# Quote of the day: live life