Skip to content

EliminateResolvedHint Logical Optimization

EliminateResolvedHint is a default logical optimization.

Non-Excludable Rule

EliminateResolvedHint is a non-excludable rule.

Executing Rule

apply(
  plan: LogicalPlan): LogicalPlan

apply is part of the Rule abstraction.

apply transforms Join logical operators with no hints defined in the given LogicalPlan:

  1. Extracts hints from the left and right sides of the join (that gives new operators and JoinHints for either side)

  2. Creates a new JoinHint with the hints merged for the left and right sides

  3. Creates a new Join logical operator with the new left and right operators and the new JoinHint

In the end, apply finds ResolvedHints and, if found, requests the HintErrorHandler to joinNotFoundForJoinHint and ignores the hint (returns the child of the ResolvedHint).

HintErrorHandler

hintErrorHandler: HintErrorHandler

hintErrorHandler is the default HintErrorHandler.

Extracting Hints from Logical Plan

extractHintsFromPlan(
  plan: LogicalPlan): (LogicalPlan, Seq[HintInfo])

extractHintsFromPlan collects (extracts) HintInfos from the ResolvedHint unary logical operators in the given LogicalPlan and gives:

While collecting, extractHintsFromPlan removes the ResolvedHint unary logical operators.

Note

It is possible (yet still unclear) that some ResolvedHints won't get extracted.

extractHintsFromPlan is used when:

Merging Hints

mergeHints(
  hints: Seq[HintInfo]): Option[HintInfo]

mergeHints...FIXME

Demo

Create a logical plan using Catalyst DSL.

import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.plans.logical.{SHUFFLE_HASH, SHUFFLE_MERGE}
import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
val t1 = LocalRelation('id.long, 'name.string).hint(SHUFFLE_HASH.displayName)
val t2 = LocalRelation('id.long, 'age.int).hint(SHUFFLE_MERGE.displayName)
val logical = t1.join(t2)
scala> println(logical.numberedTreeString)
00 'Join Inner
01 :- 'UnresolvedHint shuffle_hash
02 :  +- LocalRelation <empty>, [id#0L, name#1]
03 +- 'UnresolvedHint merge
04    +- LocalRelation <empty>, [id#2L, age#3]

Analyze the plan.

val analyzed = logical.analyze
scala> println(analyzed.numberedTreeString)
00 Join Inner
01 :- ResolvedHint (strategy=shuffle_hash)
02 :  +- LocalRelation <empty>, [id#0L, name#1]
03 +- ResolvedHint (strategy=merge)
04    +- LocalRelation <empty>, [id#2L, age#3]

Optimize the plan (using EliminateResolvedHint only).

import org.apache.spark.sql.catalyst.optimizer.EliminateResolvedHint
val optimizedPlan = EliminateResolvedHint(analyzed)
scala> println(optimizedPlan.numberedTreeString)
00 Join Inner, leftHint=(strategy=shuffle_hash), rightHint=(strategy=merge)
01 :- LocalRelation <empty>, [id#0L, name#1]
02 +- LocalRelation <empty>, [id#2L, age#3]

Last update: 2021-05-11