Yura P. February 2016

SQL queries in RDD

I need to work with an RDD both through Scala/Spark methods and through SQL queries.

Is it possible to run SQL queries directly against an RDD?

The suggested approaches (SchemaRDD or DataFrame) require extra memory.

After such a transformation I have two identical huge objects in memory.

Answers


eliasah February 2016

Yes, in a way, but you would need to build your own version of DataFrame.

DataFrame is an abstraction over RDDs. The features you find in Spark SQL (joins, filters, etc.) are optimized for DataFrames, but they were implemented on RDDs first.
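
For reference, here is a minimal sketch of the usual route: wrap the RDD in a DataFrame and register it as a temporary view so it can be queried with SQL. It assumes Spark 2.x or later (`SparkSession`, `createOrReplaceTempView`); on Spark 1.x the rough equivalents would be `SQLContext` and `registerTempTable`. The `Person` case class, the `people` view name, and the sample rows are hypothetical and only there for illustration. As far as I know, `toDF()` only builds a lazy plan on top of the existing RDD, so the data should not be held in memory twice unless both the RDD and the DataFrame are cached.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type, used only for illustration.
case class Person(name: String, age: Int)

object RddSqlSketch {
  // Returns the adult names found via SQL and via plain RDD methods.
  def run(): (Seq[String], Seq[String]) = {
    val spark = SparkSession.builder()
      .appName("rdd-sql-sketch")
      .master("local[*]") // assumption: local mode, for the example only
      .getOrCreate()
    import spark.implicits._

    // Sample data standing in for the "huge" RDD from the question.
    val rdd = spark.sparkContext.parallelize(
      Seq(Person("Ann", 31), Person("Bob", 17)))

    // Wrap the RDD in a DataFrame and expose it to SQL under a view name.
    val df = rdd.toDF()
    df.createOrReplaceTempView("people")

    // Query through SQL.
    val viaSql = spark.sql("SELECT name FROM people WHERE age >= 18")
      .collect()
      .map(_.getString(0))
      .toSeq

    // The same filter through ordinary RDD methods on the original RDD.
    val viaRdd = rdd.filter(_.age >= 18).map(_.name).collect().toSeq

    spark.stop()
    (viaSql, viaRdd)
  }

  def main(args: Array[String]): Unit = {
    val (viaSql, viaRdd) = run()
    println(s"SQL: $viaSql, RDD: $viaRdd")
  }
}
```

The original `rdd` stays usable for Scala/Spark methods while the view serves SQL queries, which is the mixed usage the question asks about.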

Post Status

Asked in February 2016
Viewed 2,343 times
Voted 12
Answered 1 time
