ContinuousDataSourceRDD is a specialized
RDD[InternalRow]) that is used exclusively for the only input RDD (with the input rows) of
DataSourceV2ScanExec leaf physical operator with a <
ContinuousDataSourceRDD is <
DataSourceV2ScanExec leaf physical operator is requested for the input RDDs (which there is only one actually).
ContinuousDataSourceRDD uses spark.sql.streaming.continuous.executorQueueSize configuration property for the <
ContinuousDataSourceRDD uses spark.sql.streaming.continuous.executorPollIntervalMs configuration property for the <
ContinuousDataSourceRDD takes the following to be created:
- [[dataQueueSize]] Size of the data queue
InputPartition (of a
ContinuousDataSourceRDDPartition) for preferred host locations (where the input partition reader can run faster).
=== [[compute]] Computing Partition --
compute( split: Partition, context: TaskContext): Iterator[InternalRow]
compute is part of the RDD Contract to compute a given partition.
getPartitions is part of the
RDD Contract to specify the partitions to <