From what I understand, streaming text files from a directory requires a key of type LongWritable, a value of type Text, and a format of TextInputFormat. These are passed automatically in the
Is the key in that case the line number, with the value being the text on that line?
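To make the question concrete, here is a minimal sketch of the explicit form I have in mind, assuming the Scala API's fileStream method on a StreamingContext. The directory path, app name, master setting, and batch interval are placeholders I made up for illustration, not values from a real job.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TextDirStream {
  def main(args: Array[String]): Unit = {
    // Placeholder configuration: local mode with a 10-second batch interval.
    val conf = new SparkConf().setAppName("TextDirStream").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Key type, value type, and input format are given as type parameters.
    // "/tmp/incoming" is a hypothetical directory to monitor for new files.
    val pairs = ssc.fileStream[LongWritable, Text, TextInputFormat]("/tmp/incoming")

    // Each element of the stream is a (key, value) tuple. Copy the Hadoop
    // Writables into plain Scala values before doing anything else with them.
    pairs.map { case (key, value) => (key.get, value.toString) }.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The stream here has element type (LongWritable, Text), which is the shape I would like to understand for other input formats.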
What should the key and value types be for ParquetInputFormat? More generally, how can I work this out for myself for other file types?
Also, how do these types relate to the DStream that is returned by the method? If I pass a Parquet file whose rows have, say, 100 columns, how will Spark parse it into RDDs and DStreams?