[[Partitioning]] HashPartitioning is a Partitioning in which rows are distributed across partitions based on the Murmur3 hash of the partitioning expressions (modulo the number of partitions).

[[creating-instance]] HashPartitioning takes the following to be created:

  • [[expressions]] Partitioning expressions
  • [[numPartitions]] Number of partitions

[[Unevaluable]][[Expression]] HashPartitioning is an Unevaluable Expression, that is, it cannot be evaluated (and produce a value given an internal row).

HashPartitioning uses the Murmur3 hash to compute the partition ID for data distribution. The same hash is used for shuffling and bucketing, which is crucial for joins of bucketed and regular tables.

[source, scala]

// Already in scope in spark-shell; needed in a standalone application
import org.apache.spark.sql.functions.{hash, lit, pmod}

val nums = spark.range(5)
val numParts = 200 // the default number of partitions
val partExprs = Seq(nums("id"))

val partitionIdExpression = pmod(hash(partExprs: _*), lit(numParts))
scala> partitionIdExpression.explain(extended = true)
pmod(hash(id#32L, 42), 200)

val q = nums.withColumn("partitionId", partitionIdExpression)
scala> q.show
+---+-----------+
| id|partitionId|
+---+-----------+
|  0|          5|
|  1|         69|
|  2|        128|
|  3|        107|
|  4|        140|
+---+-----------+
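The example above uses pmod rather than a plain remainder. A likely reason is that a 32-bit hash can be negative, and a partition ID has to fall in the range from 0 up to (but not including) the number of partitions. The following is a minimal plain-Scala sketch of that difference. `PmodSketch`, `remainder` and `pmod` are hypothetical names used for illustration only; they are not Spark's implementation.

```scala
// Illustrative only: contrasts the plain remainder with a positive modulo.
object PmodSketch {
  // The % operator keeps the sign of the dividend, so the result can be negative.
  def remainder(hash: Int, numPartitions: Int): Int = hash % numPartitions

  // Positive modulo: the result is in [0, numPartitions) for a positive numPartitions.
  def pmod(hash: Int, numPartitions: Int): Int = {
    val r = hash % numPartitions
    if (r < 0) r + numPartitions else r
  }
}
```

With a negative hash such as -7 and 200 partitions, `remainder` gives -7 (not a valid partition ID) while `pmod` gives 193.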

=== [[satisfies0]] satisfies0 Method

[source, scala]

satisfies0(
  required: Distribution): Boolean

satisfies0 is positive (true) when the following conditions all hold:

Otherwise, satisfies0 is negative (false).

satisfies0 is part of the Partitioning abstraction.

=== [[partitionIdExpression]] partitionIdExpression Method

[source, scala]

partitionIdExpression: Expression

partitionIdExpression creates (returns) a Pmod expression of a Murmur3Hash (with the partitioning expressions) and a Literal (with the number of partitions).
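To make the shape of that expression concrete, here is a rough plain-Scala sketch of "hash, then positive modulo" for a single Long value. It applies the standard 32-bit Murmur3 mixing steps to the two 32-bit halves of the Long, which is assumed (not confirmed by this page) to mirror how Spark hashes long values. The seed 42 is taken from the `hash(id#32L, 42)` output shown earlier. `PartitionIdSketch`, `hashLong` and `partitionId` are hypothetical names, and `Math.floorMod` stands in for Pmod with a positive divisor.

```scala
// A sketch under stated assumptions, not Spark's code.
object PartitionIdSketch {
  // Standard Murmur3 32-bit constants.
  private val C1 = 0xcc9e2d51
  private val C2 = 0x1b873593

  // Scrambles one 32-bit block of input.
  private def mixK1(k: Int): Int =
    Integer.rotateLeft(k * C1, 15) * C2

  // Folds a scrambled block into the running hash state.
  private def mixH1(h: Int, k1: Int): Int =
    Integer.rotateLeft(h ^ k1, 13) * 5 + 0xe6546b64

  // Final avalanche step; length is the input size in bytes.
  private def fmix(h: Int, length: Int): Int = {
    var h1 = h ^ length
    h1 ^= h1 >>> 16
    h1 *= 0x85ebca6b
    h1 ^= h1 >>> 13
    h1 *= 0xc2b2ae35
    h1 ^= h1 >>> 16
    h1
  }

  // Hashes a Long as two 32-bit blocks (low half first), 8 bytes in total.
  def hashLong(input: Long, seed: Int): Int = {
    val low = input.toInt
    val high = (input >>> 32).toInt
    val h1 = mixH1(seed, mixK1(low))
    fmix(mixH1(h1, mixK1(high)), 8)
  }

  // Pmod(Murmur3Hash(value), Literal(numPartitions)) for one Long value.
  def partitionId(value: Long, numPartitions: Int, seed: Int = 42): Int =
    Math.floorMod(hashLong(value, seed), numPartitions)
}
```

Calling `PartitionIdSketch.partitionId(id, 200)` for each id always yields a value between 0 and 199, and the same input always maps to the same partition ID. Whether the values match Spark's own output is worth checking against the `hash` function before relying on this sketch.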


partitionIdExpression is used when:

  • BucketingUtils utility is used for getBucketIdFromValue (for bucketing support)

  • FileFormatWriter utility is used to write out a query result (for bucketing support)

  • ShuffleExchangeExec utility is used to prepare a ShuffleDependency

Last update: 2020-11-07