AQEShuffleReadExec Unary Physical Operator

AQEShuffleReadExec is a unary physical operator in Adaptive Query Execution.

Creating Instance

AQEShuffleReadExec takes the following to be created:

AQEShuffleReadExec is created when the following adaptive physical optimizations are executed:

Performance Metrics

Key                     Name (in web UI)                   Description
numPartitions           number of partitions
partitionDataSize       partition data size
numSkewedPartitions     number of skewed partitions
numSkewedSplits         number of skewed partition splits
numCoalescedPartitions  number of coalesced partitions

Lazy Value

metrics is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.

Learn more in the Scala Language Specification.
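
The once-only initialization of a lazy value can be seen in a minimal plain-Scala sketch. The class LazyMetricsDemo and its initCount counter are hypothetical and exist only for this illustration; they are not Spark code.

```scala
// Hypothetical stand-in for an operator with a lazily-initialized field.
class LazyMetricsDemo {
  // Counts how many times the initializer body of `metrics` runs.
  var initCount = 0

  // The body runs on first access only; the computed value is cached afterwards.
  lazy val metrics: Map[String, Long] = {
    initCount += 1
    Map("numPartitions" -> 0L)
  }
}
```

Creating a LazyMetricsDemo leaves initCount at 0. Reading metrics any number of times leaves it at 1, and every read returns the same Map instance.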

metrics is part of the SparkPlan abstraction.

Child ShuffleQueryStageExec

shuffleStage: Option[ShuffleQueryStageExec]

AQEShuffleReadExec is given a child physical operator when created.

shuffleStage is the child physical operator (wrapped in Some) if the child is a ShuffleQueryStageExec, and None otherwise.
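
The type check on the child can be sketched as a pattern match. The types below are simplified stand-ins defined only for this example (they share names with Spark's classes but carry none of their behavior), and AQEShuffleReadSketch and OtherExec are hypothetical names.

```scala
object ShuffleStageSketch {
  // Simplified stand-ins, not Spark's real classes.
  trait SparkPlan
  final case class ShuffleQueryStageExec(id: Int) extends SparkPlan
  final case class OtherExec(name: String) extends SparkPlan

  final case class AQEShuffleReadSketch(child: SparkPlan) {
    // Some(stage) only when the child is a ShuffleQueryStageExec, None otherwise.
    def shuffleStage: Option[ShuffleQueryStageExec] = child match {
      case stage: ShuffleQueryStageExec => Some(stage)
      case _                            => None
    }
  }
}
```

Returning an Option lets callers handle a child of another type without a cast or a null check.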

shuffleStage is used when:

Shuffle RDD

shuffleRDD: RDD[_]

shuffleRDD updates the performance metrics, then requests the shuffleStage for its ShuffleExchangeLike, which is in turn requested for the shuffle RDD (with the ShufflePartitionSpecs).
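
The flow can be sketched with stand-in types. Everything in the block is an assumption made for illustration: FakeRDD, PartitionSpec, ShuffleStage and ShuffleRead are hypothetical, getShuffleRDD is modeled with a simplified signature, and the failure when there is no shuffle stage is this sketch's choice, since that case is not described here.

```scala
object ShuffleRDDSketch {
  // Simplified stand-ins, not Spark's real classes.
  final case class PartitionSpec(start: Int, end: Int)
  final case class FakeRDD(specs: Seq[PartitionSpec])

  trait ShuffleExchangeLike {
    def getShuffleRDD(specs: Seq[PartitionSpec]): FakeRDD
  }
  final case class ShuffleStage(shuffle: ShuffleExchangeLike)

  class ShuffleRead(stage: Option[ShuffleStage], specs: Seq[PartitionSpec]) {
    // Counts metric updates so the once-only behavior is observable.
    var metricUpdates = 0
    private def sendDriverMetrics(): Unit = metricUpdates += 1

    // Lazy: metrics are sent and the RDD is requested on first access only.
    lazy val shuffleRDD: FakeRDD = stage match {
      case Some(s) =>
        sendDriverMetrics()
        s.shuffle.getShuffleRDD(specs)
      case None =>
        // Assumption of this sketch: fail when there is no shuffle stage.
        throw new IllegalStateException("no shuffle stage")
    }

    // Both execution paths return the same lazily-created RDD.
    def doExecute(): FakeRDD = shuffleRDD
    def doExecuteColumnar(): FakeRDD = shuffleRDD
  }
}
```

Because shuffleRDD is a lazy value, calling doExecute and then doExecuteColumnar in this sketch sends the metrics once and yields the same RDD instance.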

Lazy Value

shuffleRDD is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.

Learn more in the Scala Language Specification.

shuffleRDD is used when:

Updating Performance Metrics

sendDriverMetrics(): Unit

sendDriverMetrics posts a SparkListenerDriverAccumUpdates (with the query execution id and performance metrics).

Partition Data Sizes

partitionDataSizes: Option[Seq[Long]]

partitionDataSizes...FIXME

Lazy Value

partitionDataSizes is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.

Learn more in the Scala Language Specification.

Executing Physical Operator

doExecute(): RDD[InternalRow]

doExecute returns the Shuffle RDD.

doExecute is part of the SparkPlan abstraction.

Columnar Execution

doExecuteColumnar(): RDD[ColumnarBatch]

doExecuteColumnar returns the Shuffle RDD.

doExecuteColumnar is part of the SparkPlan abstraction.

Node Arguments

stringArgs: Iterator[Any]

stringArgs is one of the following:

stringArgs is part of the TreeNode abstraction.

isLocalRead

isLocalRead: Boolean

isLocalRead is true when the partition specs include a PartialMapperPartitionSpec or a CoalescedMapperPartitionSpec.
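
A minimal sketch of this check follows. The spec types are simplified stand-ins whose fields are assumptions for this example, and isLocalRead is written as a standalone function over a sequence of specs rather than as a method of the operator.

```scala
object LocalReadSketch {
  // Simplified stand-ins; the field lists are assumptions of this sketch.
  sealed trait ShufflePartitionSpec
  final case class CoalescedPartitionSpec(
      startReducerIndex: Int, endReducerIndex: Int) extends ShufflePartitionSpec
  final case class PartialMapperPartitionSpec(
      mapIndex: Int, startReducerIndex: Int, endReducerIndex: Int) extends ShufflePartitionSpec
  final case class CoalescedMapperPartitionSpec(
      startMapIndex: Int, endMapIndex: Int, numReducers: Int) extends ShufflePartitionSpec

  // True if at least one spec is one of the two mapper-side kinds.
  def isLocalRead(specs: Seq[ShufflePartitionSpec]): Boolean =
    specs.exists {
      case _: PartialMapperPartitionSpec | _: CoalescedMapperPartitionSpec => true
      case _ => false
    }
}
```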

isLocalRead is used when:

isCoalescedRead

isCoalescedRead: Boolean

isCoalescedRead indicates a coalesced shuffle read. It is true when the partition specs are all CoalescedPartitionSpecs and, pair-wise, the endReducerIndex of one spec and the startReducerIndex of the next are adjacent.
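
The pair-wise check can be sketched with a sliding window of size 2. The types are simplified stand-ins (OtherSpec is hypothetical), and the sketch reads "adjacent" as the endReducerIndex of one spec being equal to the startReducerIndex of the next. That reading is an assumption, and the actual implementation may use a different comparison.

```scala
object CoalescedReadSketch {
  // Simplified stand-ins, not Spark's real classes.
  sealed trait ShufflePartitionSpec
  final case class CoalescedPartitionSpec(
      startReducerIndex: Int, endReducerIndex: Int) extends ShufflePartitionSpec
  final case class OtherSpec(id: Int) extends ShufflePartitionSpec

  def isCoalescedRead(specs: Seq[ShufflePartitionSpec]): Boolean =
    specs.sliding(2).forall {
      // A single coalesced spec forms a window of size 1.
      case Seq(_: CoalescedPartitionSpec) => true
      // Assumption: "adjacent" means one range ends where the next starts.
      case Seq(l: CoalescedPartitionSpec, r: CoalescedPartitionSpec) =>
        l.endReducerIndex == r.startReducerIndex
      // Any window that contains another kind of spec fails the check.
      case _ => false
    }
}
```

In this sketch, the specs (0, 3) and (3, 5) pass, while (0, 3) and (4, 5) do not because of the gap between them.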

isCoalescedRead is used when:
