Skip to content

ShuffledRowRDD

ShuffledRowRDD is an RDD of internal binary rows (RDD[InternalRow]) for execution of CollectLimitExec, CustomShuffleReaderExec, ShuffleExchangeExec and TakeOrderedAndProjectExec physical operators.

Note

ShuffledRowRDD is similar to Spark Core's ShuffledRDD, with the difference of the type of the values to process, i.e. InternalRow and (K, C) key-value pairs, respectively.

Creating Instance

ShuffledRowRDD takes the following to be created:

When created, ShuffledRowRDD reads the spark.sql.adaptive.fetchShuffleBlocksInBatch configuration property, and when enabled, sets the __fetch_continuous_blocks_in_batch_enabled local property to true.

ShuffledRowRDD is created when CollectLimitExec, CustomShuffleReaderExec, ShuffleExchangeExec and TakeOrderedAndProjectExec physical operators are executed.

Optional Partition Specs

ShuffledRowRDD is given an Optional Partition Specs when created.

When not given, it is assumed to use as many CoalescedPartitionSpecs as the number of partitions of ShuffleDependency (based on the Partitioner).

RDD Dependencies

getDependencies: Seq[Dependency[_]]

A single-element collection with ShuffleDependency[Int, InternalRow, InternalRow].

getDependencies is part of Spark Core's RDD abstraction.

Partitioner

partitioner: Option[Partitioner]

CoalescedPartitioner (with the Partitioner of the dependency)

partitioner is part of Spark Core's RDD abstraction.

Partitions

getPartitions: Array[Partition]

getPartitions...FIXME

getPartitions is part of Spark Core's RDD abstraction.

Computing Partition

compute(
  split: Partition,
  context: TaskContext): Iterator[InternalRow]

compute...FIXME

compute is part of Spark Core's RDD abstraction.

Preferred Locations of Partition

getPreferredLocations(
  partition: Partition): Seq[String]

getPreferredLocations...FIXME

getPreferredLocations is part of Spark Core's RDD abstraction.

Clearing Dependencies

clearDependencies(): Unit

clearDependencies simply requests the parent RDD to clearDependencies followed by clear the given <> (i.e. set to null).

clearDependencies is part of Spark Core's RDD abstraction.


Last update: 2020-11-07
Back to top