BucketSpec

BucketSpec is the bucketing specification of a table, i.e. the metadata that describes how the table is bucketed.

BucketSpec includes the following:

  • Number of buckets (numBuckets)
  • Bucket column names (bucketColumnNames): the names of the columns used for buckets (at least one)
  • Sort column names (sortColumnNames): the names of the columns used to sort data in buckets

The number of buckets has to be between 0 and 100000 exclusive (or an AnalysisException is thrown).
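The range check can be observed by constructing a BucketSpec with an out-of-range value. The following is a minimal sketch, assuming Spark SQL is on the classpath (for example in spark-shell); the value names `invalid` and `valid` are illustrative only.

```scala
import scala.util.Try
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.catalyst.catalog.BucketSpec

// 0 buckets is outside the accepted range, so the constructor is expected to throw
val invalid = Try(BucketSpec(numBuckets = 0, bucketColumnNames = Seq("col1"), sortColumnNames = Nil))
println(invalid.isFailure)
println(invalid.failed.map(_.isInstanceOf[AnalysisException]))

// 8 buckets is within the accepted range, so construction succeeds
val valid = Try(BucketSpec(numBuckets = 8, bucketColumnNames = Seq("col1"), sortColumnNames = Nil))
println(valid.isSuccess)
```

Wrapping the constructor call in `Try` keeps the sketch from aborting a REPL session when the exception is thrown.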

BucketSpec is created when:

  • DataFrameWriter is requested to saveAsTable (and does getBucketSpec)

  • HiveExternalCatalog is requested to getBucketSpecFromTableProperties and tableMetaToTableProps

  • HiveClientImpl is requested to [retrieve table metadata](hive/HiveClientImpl.md#getTableOption)

  • SparkSqlAstBuilder is requested to [visitBucketSpec](spark-sql-SparkSqlAstBuilder.md#visitBucketSpec) (for a CREATE TABLE SQL statement with CLUSTERED BY and INTO n BUCKETS clauses and an optional SORTED BY clause)
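Two of these paths can be exercised from user code. The following is a sketch only, assuming a local SparkSession with the default (non-Hive) catalog and the parquet data source; the table names `bucketed_by_writer` and `bucketed_by_sql` are hypothetical. It reads the resulting BucketSpec back from the session catalog.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier

// Assumption: a local session; in spark-shell the existing `spark` is returned instead
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("bucket-spec-demo")
  .getOrCreate()

// Hypothetical table names, dropped first so the sketch can be re-run in one session
spark.sql("DROP TABLE IF EXISTS bucketed_by_writer")
spark.sql("DROP TABLE IF EXISTS bucketed_by_sql")

// Path 1: DataFrameWriter.bucketBy and sortBy followed by saveAsTable
spark.range(100)
  .write
  .bucketBy(8, "id")
  .sortBy("id")
  .saveAsTable("bucketed_by_writer")

// Path 2: CREATE TABLE with CLUSTERED BY, SORTED BY and INTO n BUCKETS
spark.sql(
  """CREATE TABLE bucketed_by_sql (id BIGINT)
    |USING parquet
    |CLUSTERED BY (id) SORTED BY (id) INTO 8 BUCKETS""".stripMargin)

// The catalog keeps the BucketSpec as part of the table metadata (Option[BucketSpec])
val catalog = spark.sessionState.catalog
val fromWriter = catalog.getTableMetadata(TableIdentifier("bucketed_by_writer")).bucketSpec
val fromSql = catalog.getTableMetadata(TableIdentifier("bucketed_by_sql")).bucketSpec
println(fromWriter)
println(fromSql)

// Clean up the managed tables and their warehouse directories
spark.sql("DROP TABLE bucketed_by_writer")
spark.sql("DROP TABLE bucketed_by_sql")
```

Note that `spark.sessionState` is an unstable developer API, used here only to inspect the table metadata.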

BucketSpec uses the following text representation (i.e. toString):

[numBuckets] buckets, bucket columns: [[bucketColumnNames]], sort columns: [[sortColumnNames]]

```scala
import org.apache.spark.sql.catalyst.catalog.BucketSpec

val bucketSpec = BucketSpec(
  numBuckets = 8,
  bucketColumnNames = Seq("col1"),
  sortColumnNames = Seq("col2"))

println(bucketSpec)
// 8 buckets, bucket columns: [col1], sort columns: [col2]
```


=== Converting Bucketing Specification to LinkedHashMap: toLinkedHashMap Method

```scala
toLinkedHashMap: mutable.LinkedHashMap[String, String]
```

toLinkedHashMap converts the bucketing specification to a collection of pairs (LinkedHashMap[String, String]) with the following fields and their values:

  • Num Buckets with the number of buckets
  • Bucket Columns with the bucket column names
  • Sort Columns with the sort column names

toLinkedHashMap quotes the column names with backticks.

```scala
println(bucketSpec.toLinkedHashMap)
// Map(Num Buckets -> 8, Bucket Columns -> [`col1`], Sort Columns -> [`col2`])
```


toLinkedHashMap is used when:

  • CatalogTable is requested for toLinkedHashMap

  • DescribeTableCommand logical command is executed with a non-empty partitionSpec and the isExtended flag on (that uses describeFormattedDetailedPartitionInfo).
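Callers like these consume the map as an ordered sequence of name and value pairs. The following is a minimal sketch of that kind of consumption, assuming Spark SQL is on the classpath; the names `spec` and `fields` are illustrative, and the printing format is not the one Spark itself uses.

```scala
import org.apache.spark.sql.catalyst.catalog.BucketSpec

val spec = BucketSpec(
  numBuckets = 8,
  bucketColumnNames = Seq("col1"),
  sortColumnNames = Seq("col2"))

// A LinkedHashMap preserves insertion order, so the fields come out
// as Num Buckets, Bucket Columns, Sort Columns
val fields = spec.toLinkedHashMap

// Print one "name: value" line per field
fields.foreach { case (name, value) => println(s"$name: $value") }
```

The insertion-ordered map is what keeps the three bucketing fields in a stable order when they are appended to a longer table description.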


Last update: 2021-03-18