Dynamic Partition Pruning
New in 3.0.0
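Dynamic Partition Pruning (DPP) is an optimization of JOIN queries over partitioned tables whose partition columns are used in the join condition. The idea is to take the filter conditions on one side of the join (typically a small dimension table) and use them to prune partitions of the large fact table on the other side, so that fewer partitions, and hence fewer rows, have to be scanned. Unlike static partition pruning, the partition values to keep are not known at planning time; they are derived while the query runs.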
The best results are expected in JOIN queries between a large fact table and a much smaller dimension table (star-schema queries).
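The effect of the pruning can be illustrated with a toy, in-memory Python model (not Spark code). The fact table is modelled as a list of partitions keyed by a hypothetical `store_id` partition column and the dimension table as a list of dicts; `Partition` and `join_with_partition_pruning` are illustrative names only. The sketch shows how a filter on the small dimension side determines which fact partitions are read at all.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One partition of a toy fact table (hypothetical model, not Spark's)."""
    value: str                                # partition column value, e.g. a store id
    rows: list = field(default_factory=list)  # fact rows stored in this partition


def join_with_partition_pruning(fact, dim_rows, dim_filter, dim_key):
    """Toy equi-join on the fact table's partition column.

    The dimension filter is evaluated first; only fact partitions whose
    partition value survives that filter are read.
    Returns (joined_rows, number_of_fact_partitions_scanned).
    """
    # 1. Apply the filter to the small dimension table.
    selected = [d for d in dim_rows if dim_filter(d)]
    # 2. Group surviving dimension rows by join key; these keys are the only
    #    partition values the fact side can possibly match.
    by_key = {}
    for d in selected:
        by_key.setdefault(d[dim_key], []).append(d)
    # 3. Prune: skip fact partitions whose value was filtered out.
    scanned = [p for p in fact if p.value in by_key]
    # 4. Join the remaining fact rows with the matching dimension rows.
    joined = [(f, d) for p in scanned for f in p.rows for d in by_key[p.value]]
    return joined, len(scanned)


fact_sales = [
    Partition("s1", [{"amount": 10}, {"amount": 20}]),
    Partition("s2", [{"amount": 5}]),
    Partition("s3", [{"amount": 7}, {"amount": 1}]),
]
dim_stores = [
    {"store_id": "s1", "country": "PL"},
    {"store_id": "s2", "country": "DE"},
    {"store_id": "s3", "country": "PL"},
]

joined, scanned = join_with_partition_pruning(
    fact_sales, dim_stores, lambda d: d["country"] == "DE", "store_id"
)
print(f"scanned {scanned} of {len(fact_sales)} partitions")  # scanned 1 of 3 partitions
print(joined)
```

Only the `s2` partition is read; the other two are skipped without touching their rows. In Spark the set of matching partition values is computed during query execution rather than from a Python list, but the saving comes from the same place: partitions that cannot match are never scanned.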
Dynamic Partition Pruning is applied to a query in the logical optimization phase using the PartitionPruning and CleanupDynamicPruningFilters logical optimization rules.
Dynamic Partition Pruning is controlled by the spark.sql.optimizer.dynamicPartitionPruning.enabled configuration property (enabled by default).
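As a rough illustration of how the configuration property and the partition-column condition interact, here is a heavily simplified Python sketch of the kind of applicability check a rule like PartitionPruning performs. `Scan`, `Join` and `pruning_targets` are hypothetical names, the logical plan is reduced to a single equi-join of two scans, and a plain dict stands in for the session configuration. The actual rule applies further conditions (for example, on the join type and on whether pruning is expected to pay off), and CleanupDynamicPruningFilters is not modelled at all.

```python
from dataclasses import dataclass

DPP_ENABLED_KEY = "spark.sql.optimizer.dynamicPartitionPruning.enabled"


@dataclass
class Scan:
    """Hypothetical stand-in for a table scan in a logical plan."""
    table: str
    partition_columns: frozenset
    has_filter: bool = False


@dataclass
class Join:
    """Hypothetical equi-join node: left_key = right_key."""
    left: Scan
    right: Scan
    left_key: str
    right_key: str


def pruning_targets(join, conf):
    """Return the tables a toy DPP rule would add a pruning filter to.

    A side is a candidate only when its join key is one of its partition
    columns and the *other* side carries a filter that can restrict the
    matching partition values.
    """
    # The property is read as a string; "true" is assumed as the default,
    # matching Spark's default for this property.
    if conf.get(DPP_ENABLED_KEY, "true").lower() != "true":
        return []
    targets = []
    if join.left_key in join.left.partition_columns and join.right.has_filter:
        targets.append(join.left.table)
    if join.right_key in join.right.partition_columns and join.left.has_filter:
        targets.append(join.right.table)
    return targets


sales = Scan("sales", frozenset({"store_id"}))        # partitioned fact table
stores = Scan("stores", frozenset(), has_filter=True)  # filtered dimension table
query = Join(sales, stores, "store_id", "store_id")

print(pruning_targets(query, {}))                          # ['sales']
print(pruning_targets(query, {DPP_ENABLED_KEY: "false"}))  # []
```

Disabling the property turns the rule into a no-op, and a join whose dimension side has no filter yields no pruning target either, because there would be nothing to restrict the fact table's partitions with.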
Last update: 2020-11-07