Skip to content

Partitioning (Catalyst)

Partitioning is an abstraction of partitioning specifications that hint the Spark Physical Optimizer about the number of partitions and data distribution of the output of a physical operator.

Contract

Number of Partitions

numPartitions: Int

Used when:

Implementations

BroadcastPartitioning

BroadcastPartitioning(
  mode: BroadcastMode)

numPartitions: 1

satisfies:

Created when:

DataSourcePartitioning

An adapter for the Connector Partitioning to this Catalyst Partitioning

Created with:

numPartitions:

satisfies:

One of the following:

  1. satisfies0
  2. FIXME

Created when:

HashPartitioning

HashPartitioning

numPartitions: the given numPartitions

satisfies:

PartitioningCollection

PartitioningCollection(
  partitionings: Seq[Partitioning])

compatibleWith: Any Partitioning that is compatible with one of the input partitionings

guarantees: Any Partitioning that is guaranteed by any of the input partitionings

numPartitions: Number of partitions of the first Partitioning in the input partitionings

satisfies: Any Distribution that is satisfied by any of the input partitionings

RangePartitioning

RangePartitioning

compatibleWith: RangePartitioning when semantically equal (i.e. underlying expressions are deterministic and canonically equal)

guarantees: RangePartitioning when semantically equal (i.e. underlying expressions are deterministic and canonically equal)

numPartitions: the given numPartitions

satisfies:

RoundRobinPartitioning

RoundRobinPartitioning(
  numPartitions: Int)

compatibleWith: Always false

guarantees: Always false

numPartitions: the given numPartitions

satisfies: UnspecifiedDistribution

SinglePartition

compatibleWith: Any Partitioning with one partition

guarantees: Any Partitioning with one partition

numPartitions: 1

satisfies: Any Distribution except BroadcastDistribution

UnknownPartitioning

UnknownPartitioning(
  numPartitions: Int)

compatibleWith: Always false

guarantees: Always false

numPartitions: the given numPartitions

satisfies: UnspecifiedDistribution

Satisfying Distribution

satisfies(
  required: Distribution): Boolean

satisfies is true when all the following hold:

  1. The optional required number of partitions of the given Distribution is the number of partitions of this Partitioning
  2. satisfies0 holds
Final Method

satisfies is a Scala final method and may not be overridden in subclasses.

Learn more in the Scala Language Specification.

satisfies is used when:

satisfies0

satisfies0(
  required: Distribution): Boolean

satisfies0 is true when either holds:

Note

satisfies0 can be overriden by subclasses if needed (to influence the final satisfies).

Back to top