HashPartitioning¶
[[Partitioning]] HashPartitioning
is a Partitioning in which rows are distributed across partitions based on the <
[[creating-instance]] HashPartitioning
takes the following to be created:
- [[expressions]] Partitioning Expression.md[expressions]
- [[numPartitions]] Number of partitions
[[Unevaluable]][[Expression]] HashPartitioning
is an Unevaluable Expression that Expression.md#[cannot be evaluated] (and produce a value given an internal row).
HashPartitioning
uses the spark-sql-Expression-Murmur3Hash.md[MurMur3 Hash] to compute the <
[source, scala]¶
val nums = spark.range(5) val numParts = 200 // the default number of partitions val partExprs = Seq(nums("id"))
val partitionIdExpression = pmod(hash(partExprs: _*), lit(numParts)) scala> partitionIdExpression.explain(extended = true) pmod(hash(id#32L, 42), 200)
val q = nums.withColumn("partitionId", partitionIdExpression) scala> q.show +---+-----------+ | id|partitionId| +---+-----------+ | 0| 5| | 1| 69| | 2| 128| | 3| 107| | 4| 140| +---+-----------+
=== [[satisfies0]] satisfies0
Method
[source, scala]¶
satisfies0( required: Distribution): Boolean
satisfies0
is positive (true
) when the following conditions all hold:
-
The base satisfies0 holds
-
For an input HashClusteredDistribution, the number of the given <
> and the HashClusteredDistribution's are the same and semantically equal pair-wise -
For an input ClusteredDistribution, the given <
> are among the ClusteredDistribution's clustering expressions and they are semantically equal pair-wise
Otherwise, satisfies0
is negative (false
).
satisfies0
is part of the Partitioning abstraction.
=== [[partitionIdExpression]] partitionIdExpression
Method
[source, scala]¶
partitionIdExpression: Expression¶
partitionIdExpression
creates (returns) a Pmod
expression of a spark-sql-Expression-Murmur3Hash.md[Murmur3Hash] (with the <
[NOTE]¶
partitionIdExpression
is used when:
-
BucketingUtils
utility is used forgetBucketIdFromValue
(for bucketing support) -
FileFormatWriter
utility is used for write out a query result (for bucketing support)