Each partition is written to its own file, so an empty partition is written as an empty file.
To avoid writing the empty files, you can either coalesce or repartition your RDD into a smaller number of partitions before saving it.
If you didn't expect to have empty partitions, it may be worth investigating why you have them. Empty partitions can happen either due to a filtering step which removed all the elements from some partitions, or due to a bad hash function. If the hashCode() for your RDD's elements doesn't distribute the elements well, it's possible to end up with an unbalanced RDD that has empty partitions.
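To see how a poorly distributing hash leaves partitions empty, here is a minimal pure-Python sketch of hash partitioning, assuming each key is assigned to partition `hash(key) mod numPartitions` (the rule Spark's HashPartitioner applies to the key's hashCode). The helper `partition_sizes` and the sample keys are hypothetical names made up for illustration; this is a simulation, not Spark code.

```python
from collections import Counter


def partition_sizes(keys, num_partitions, hash_fn=hash):
    """Count how many keys land in each partition under hash partitioning.

    Hypothetical helper: mimics the 'hash modulo partition count' rule.
    """
    counts = Counter(hash_fn(k) % num_partitions for k in keys)
    return [counts.get(i, 0) for i in range(num_partitions)]


# Every key is a multiple of 4, and there are 4 partitions, so every
# key hashes to partition 0 and the other three partitions stay empty.
skewed_keys = [4 * i for i in range(100)]
skewed = partition_sizes(skewed_keys, 4)

# Consecutive integers spread evenly over the same 4 partitions.
balanced_keys = list(range(100))
balanced = partition_sizes(balanced_keys, 4)

# Number of partitions that would be written out as empty files.
empty = sum(1 for n in skewed if n == 0)

print("skewed:  ", skewed)
print("balanced:", balanced)
print("empty partitions in skewed case:", empty)
```

On a real RDD you can inspect the same thing by counting the elements in each partition (for example with `glom()` followed by a size count) before deciding whether to coalesce or to fix the key distribution.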
Asked in February 2016 · Viewed 2,106 times · Voted 7 · Answered 1 time