RDDConversions Helper Object

RDDConversions is a Scala object that provides the productToRowRdd and rowToRowRdd methods.

=== [[productToRowRdd]] productToRowRdd Method

[source, scala]

productToRowRdd[A <: Product](data: RDD[A], outputTypes: Seq[DataType]): RDD[InternalRow]

productToRowRdd...FIXME

NOTE: productToRowRdd is used when...FIXME

=== [[rowToRowRdd]] Converting Scala Objects In Rows to Values Of Catalyst Types -- rowToRowRdd Method

[source, scala]

rowToRowRdd(data: RDD[Row], outputTypes: Seq[DataType]): RDD[InternalRow]

rowToRowRdd maps over the partitions of the input RDD[Row] using the RDD.mapPartitions operator, which creates a MapPartitionsRDD that applies a "map" function to every partition.

TIP: Use RDD.toDebugString to see the additional MapPartitionsRDD in an RDD lineage.
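
The tip above can be tried with a minimal sketch, assuming a local SparkSession. The object name LineageDemo, the application name, and the sample rows are hypothetical, and a plain RDD.mapPartitions call stands in for rowToRowRdd.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object LineageDemo {
  // Hypothetical helper: builds a tiny RDD[Row], maps over its partitions,
  // and returns the lineage description of the resulting RDD.
  def lineage(spark: SparkSession): String = {
    val rows = spark.sparkContext.parallelize(Seq(Row(1, "a"), Row(2, "b")))
    // Stand-in for rowToRowRdd: any mapPartitions call adds a MapPartitionsRDD
    val mapped = rows.mapPartitions(iter => iter.map(identity))
    mapped.toDebugString
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("lineage-demo")
      .getOrCreate()
    try println(lineage(spark)) finally spark.stop()
  }
}
```

The printed lineage should list a MapPartitionsRDD on top of the parent RDD (a ParallelCollectionRDD in this sketch).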

The "map" function takes a Scala Iterator of Row objects and does the following:

  1. Creates a GenericInternalRow (of the size that is the number of columns per the input Seq[DataType])

  2. Creates a converter function for every DataType in Seq[DataType]

  3. For every Row object in the partition (iterator), applies the converter function at each position and stores the result at the same position in the GenericInternalRow

  4. Returns the GenericInternalRow for every row (the instance created in step 1 is reused across rows)
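
The four steps above can be sketched roughly as follows. This is an illustration under assumptions rather than the actual Spark source: the object name RowToRowRddSketch and the method name convert are hypothetical, and the converters are assumed to come from CatalystTypeConverters.createToCatalystConverter.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
import org.apache.spark.sql.types.DataType

object RowToRowRddSketch {
  // Hypothetical helper that mirrors the steps described for rowToRowRdd.
  def convert(data: RDD[Row], outputTypes: Seq[DataType]): RDD[InternalRow] =
    data.mapPartitions { rows =>
      val numColumns = outputTypes.length
      // Step 1: one row buffer sized to the number of columns
      val buffer = new GenericInternalRow(numColumns)
      // Step 2: one Scala-to-Catalyst converter per DataType (assumed API)
      val converters =
        outputTypes.map(CatalystTypeConverters.createToCatalystConverter)
      rows.map { row =>
        // Step 3: convert each field and store it at the same position
        var i = 0
        while (i < numColumns) {
          buffer(i) = converters(i)(row(i))
          i += 1
        }
        // Step 4: hand back the (reused) buffer for this row
        buffer
      }
    }
}
```

Because the sketch returns the same mutable row on every iteration, a caller that buffers rows (for example before collect) would need to copy each one first, e.g. `RowToRowRddSketch.convert(rdd, types).map(_.copy())`.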

rowToRowRdd is used when the DataSourceStrategy execution planning strategy is executed (and requested to toCatalystRDD).


Last update: 2021-04-10