I was wondering: how costly can an rdd.unpersist() operation on Spark RDDs be? And does the storage level setting impact the performance of this operation? Any benchmark (results/technique) would be very helpful.
unpersist releases the RDD from the cache (memory and disk) and deletes the shuffle files it depends on. To do this it only needs to send a message to the executors, so it should be one of the cheapest operations you can perform on an RDD; it is probably not worth benchmarking.
Note also that when an RDD is garbage collected, unpersist is automatically called on it. So you cannot avoid this cost anyway.
Asked in February 2016 · Viewed 1,180 times · 12 votes · 1 answer