Mayank Raghav February 2016

How costly is the unpersist operation on Spark RDDs?

I was wondering how costly the rdd.unpersist() operation on Spark RDDs can be, and whether the storage level setting affects its performance. Any benchmark (results or technique) would be very helpful.

Answers


Daniel Darabos February 2016

unpersist releases the RDD from cache (memory and disk) and deletes the shuffle files it depends on. For this it needs to send a message to the executors. It should be the cheapest operation you can do with an RDD — probably not worth benchmarking.

Note also that when an RDD is garbage collected, unpersist is automatically called on it. So you cannot avoid this cost anyway.
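
If you still want to measure it, a minimal sketch along these lines might do. It assumes PySpark 3.0 or later (for the blocking keyword on unpersist). The names time_unpersist and compare_storage_levels are made up for illustration and are not Spark APIs, and the RDD size and partition count are arbitrary.

```python
import time


def time_unpersist(rdd, blocking=True):
    """Return the wall-clock seconds spent inside rdd.unpersist()."""
    start = time.perf_counter()
    # blocking=True waits until the executors have dropped the blocks,
    # so the measurement covers the removal itself and not only the
    # message send. The keyword assumes PySpark 3.0 or later.
    rdd.unpersist(blocking=blocking)
    return time.perf_counter() - start


def compare_storage_levels(sc, n=1_000_000, partitions=8):
    """Time unpersist for a few storage levels (illustrative helper)."""
    # Imported here so time_unpersist itself has no Spark dependency.
    from pyspark import StorageLevel

    levels = {
        "MEMORY_ONLY": StorageLevel.MEMORY_ONLY,
        "MEMORY_AND_DISK": StorageLevel.MEMORY_AND_DISK,
        "DISK_ONLY": StorageLevel.DISK_ONLY,
    }
    timings = {}
    for name, level in levels.items():
        # Build a fresh RDD for each level instead of re-persisting one.
        rdd = sc.parallelize(range(n), partitions).persist(level)
        rdd.count()  # materialize the cached blocks before timing
        timings[name] = time_unpersist(rdd)
    return timings
```

To try it, create a context, for example sc = SparkContext(master="local[2]", appName="unpersist-timing"), call compare_storage_levels(sc), and repeat the run a few times, since a single wall-clock measurement of such a short operation is noisy.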

Post Status

Asked in February 2016
