Skip to content

DataSourceV2Strategy Execution Planning Strategy

DataSourceV2Strategy is an execution planning strategy that SparkPlanner uses to <> (from the DataSource V2).

[[logical-operators]] .DataSourceV2Strategy's Execution Planning [cols="1,1",options="header",width="100%"] |=== | Logical Operator | Physical Operator

| <> | <>

| <> | <>

| <> | <>

| <> with <> | <>

| <> | WriteToContinuousDataSourceExec

| <> with a StreamingDataSourceV2Relation and a ContinuousReader | ContinuousCoalesceExec |===

[[logging]] [TIP] ==== Enable INFO logging level for org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy=INFO

Refer to spark-logging.md[Logging].

Applying DataSourceV2Strategy Strategy to Logical Plan

apply(
  plan: LogicalPlan): Seq[SparkPlan]

apply branches off per the given <>.

apply is part of GenericStrategy abstraction.

==== [[apply-DataSourceV2Relation]] DataSourceV2Relation Logical Operator

For a <> logical operator, apply...FIXME

apply then <> followed by <>.

apply prints out the following INFO message to the logs:

Pushing operators to [ClassName of DataSourceV2]
Pushed Filters: [pushedFilters]
Post-Scan Filters: [postScanFilters]
Output: [output]

apply uses the DataSourceV2Relation to create a <> physical operator.

If there are any postScanFilters, apply creates a <> physical operator with the DataSourceV2ScanExec physical operator as the child.

In the end, apply creates a <> physical operator with the FilterExec with the DataSourceV2ScanExec or directly with the DataSourceV2ScanExec physical operator.

==== [[apply-StreamingDataSourceV2Relation]] StreamingDataSourceV2Relation Logical Operator

For a StreamingDataSourceV2Relation logical operator, apply...FIXME

==== [[apply-WriteToDataSourceV2]] WriteToDataSourceV2 Logical Operator

For a <> logical operator, apply simply creates a <> physical operator.

==== [[apply-AppendData]] AppendData Logical Operator

For a <> logical operator with a <>, apply requests the <> to <> that is used to create a <> physical operator.

==== [[apply-WriteToContinuousDataSource]] WriteToContinuousDataSource Logical Operator

For a WriteToContinuousDataSource logical operator, apply...FIXME

==== [[apply-Repartition]] Repartition Logical Operator

For a Repartition logical operator, apply...FIXME

=== [[pushFilters]] pushFilters Internal Method

[source, scala]

pushFilters( reader: DataSourceReader, filters: Seq[Expression]): (Seq[Expression], Seq[Expression])


pushFilters...FIXME

In the end, pushFilters returns a pair of filters pushed and not.

NOTE: pushFilters is used exclusively when DataSourceV2Strategy execution planning strategy is <> (applied to a <> logical operator).

=== [[pruneColumns]] Column Pruning -- pruneColumns Internal Method

[source, scala]

pruneColumns( reader: DataSourceReader, relation: DataSourceV2Relation, exprs: Seq[Expression]): Seq[AttributeReference]


pruneColumns...FIXME

NOTE: pruneColumns is used when...FIXME


Last update: 2020-11-13