Skip to content

HashPartitioning

[[Partitioning]] HashPartitioning is a Partitioning in which rows are distributed across partitions based on the <> of <> (modulo the <>).

[[creating-instance]] HashPartitioning takes the following to be created:

  • [[expressions]] Partitioning Expression.md[expressions]
  • [[numPartitions]] Number of partitions

[[Unevaluable]][[Expression]] HashPartitioning is an Unevaluable Expression that Expression.md#[cannot be evaluated] (and produce a value given an internal row).

HashPartitioning uses the spark-sql-Expression-Murmur3Hash.md[MurMur3 Hash] to compute the <> for data distribution (consistent for shuffling and bucketing that is crucial for joins of bucketed and regular tables).

[source, scala]

val nums = spark.range(5) val numParts = 200 // the default number of partitions val partExprs = Seq(nums("id"))

val partitionIdExpression = pmod(hash(partExprs: _*), lit(numParts)) scala> partitionIdExpression.explain(extended = true) pmod(hash(id#32L, 42), 200)

val q = nums.withColumn("partitionId", partitionIdExpression) scala> q.show +---+-----------+ | id|partitionId| +---+-----------+ | 0| 5| | 1| 69| | 2| 128| | 3| 107| | 4| 140| +---+-----------+


=== [[satisfies0]] satisfies0 Method

[source, scala]

satisfies0( required: Distribution): Boolean


satisfies0 is positive (true) when the following conditions all hold:

Otherwise, satisfies0 is negative (false).

satisfies0 is part of the Partitioning abstraction.

=== [[partitionIdExpression]] partitionIdExpression Method

[source, scala]

partitionIdExpression: Expression

partitionIdExpression creates (returns) a Pmod expression of a spark-sql-Expression-Murmur3Hash.md[Murmur3Hash] (with the <>) and a spark-sql-Expression-Literal.md[Literal] (with the <>).

[NOTE]

partitionIdExpression is used when:

  • BucketingUtils utility is used for getBucketIdFromValue (for bucketing support)

  • FileFormatWriter utility is used for write out a query result (for bucketing support)

* ShuffleExchangeExec utility is used to ShuffleExchangeExec.md#prepareShuffleDependency[prepare a ShuffleDependency]


Last update: 2020-11-07