CatalogImpl is the Catalog in Spark SQL that...FIXME

.CatalogImpl uses SessionCatalog (through SparkSession)
image::images/spark-sql-CatalogImpl.png[align="center"]

NOTE: CatalogImpl is in org.apache.spark.sql.internal package.

=== [[createTable]] Creating Table -- createTable Method

[source, scala]
----
createTable(
  tableName: String,
  source: String,
  schema: StructType,
  options: Map[String, String]): DataFrame
----


createTable is part of Catalog Contract to...FIXME.

=== [[getTable]] getTable Method

[source, scala]
----
getTable(tableName: String): Table
getTable(dbName: String, tableName: String): Table
----

NOTE: getTable is part of Catalog Contract to...FIXME.


=== [[getFunction]] getFunction Method

[source, scala]
----
getFunction(functionName: String): Function
getFunction(dbName: String, functionName: String): Function
----

NOTE: getFunction is part of Catalog Contract to...FIXME.


=== [[functionExists]] functionExists Method

[source, scala]
----
functionExists(functionName: String): Boolean
functionExists(dbName: String, functionName: String): Boolean
----

NOTE: functionExists is part of Catalog Contract to...FIXME.


=== [[cacheTable]] Caching Table or View In-Memory -- cacheTable Method

[source, scala]
----
cacheTable(tableName: String): Unit
----

Internally, cacheTable first creates a DataFrame for the table and then requests CacheManager to cache it.

NOTE: cacheTable uses the session-scoped SharedState to access the CacheManager.

NOTE: cacheTable is part of Catalog contract.

=== [[clearCache]] Removing All Cached Tables From In-Memory Cache -- clearCache Method

[source, scala]
----
clearCache(): Unit
----

clearCache requests CacheManager to remove all cached tables from in-memory cache.

NOTE: clearCache is part of Catalog contract.
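The cacheTable/clearCache pair can be modeled with a small in-memory cache. The following is a hypothetical sketch for illustration only (ToyCacheManager and Row are made-up stand-ins, not Spark's CacheManager), showing how both operations delegate to one session-shared cache:

```scala
// Hypothetical sketch of the caching contract: CatalogImpl delegates both
// cacheTable and clearCache to a single session-shared CacheManager.
// ToyCacheManager and Row are illustrative stand-ins, not Spark classes.
object CacheSketch {
  type Row = Seq[Any]

  final class ToyCacheManager {
    private var cached = Map.empty[String, Seq[Row]]
    // Cache the materialized rows of a table's DataFrame.
    def cacheQuery(tableName: String, rows: Seq[Row]): Unit =
      cached += (tableName -> rows)
    // Drop every cached table at once (what clearCache does).
    def clearCache(): Unit =
      cached = Map.empty
    def isCached(tableName: String): Boolean =
      cached.contains(tableName)
  }

  // cacheTable: build the table's rows first, then hand them to the manager.
  def cacheTable(mgr: ToyCacheManager, tableName: String, rows: Seq[Row]): Unit =
    mgr.cacheQuery(tableName, rows)
}
```

Note that clearCache affects every cached table in the session, not just one, which is why it lives on the shared manager rather than on a per-table handle.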

=== [[createExternalTable]] Creating External Table From Path -- createExternalTable Method

[source, scala]
----
createExternalTable(tableName: String, path: String): DataFrame
createExternalTable(tableName: String, path: String, source: String): DataFrame
createExternalTable(
  tableName: String,
  source: String,
  options: Map[String, String]): DataFrame
createExternalTable(
  tableName: String,
  source: String,
  schema: StructType,
  options: Map[String, String]): DataFrame
----

createExternalTable creates an external table tableName from the given path and returns the corresponding DataFrame.

[source, scala]
----
import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...

val readmeTable = spark.catalog.createExternalTable("readme", "", "text")
readmeTable: org.apache.spark.sql.DataFrame = [value: string]

scala> spark.catalog.listTables.filter(_.name == "readme").show
+------+--------+-----------+---------+-----------+
|  name|database|description|tableType|isTemporary|
+------+--------+-----------+---------+-----------+
|readme| default|       null| EXTERNAL|      false|
+------+--------+-----------+---------+-----------+

scala> sql("select count(*) as count from readme").show(false)
+-----+
|count|
+-----+
|99   |
+-----+
----

The source input parameter is the name of the data source provider for the table, e.g. parquet, json, text. If not specified, createExternalTable uses the spark.sql.sources.default setting to determine the data source format.

NOTE: The source input parameter must not be hive as it leads to an AnalysisException.

createExternalTable sets the mandatory path option when specified explicitly in the input parameter list.

createExternalTable parses tableName into a TableIdentifier (using SparkSqlParser). It creates a CatalogTable and then executes (by toRDD) a CreateTable logical plan. The result DataFrame is a Dataset[Row] with the QueryExecution after executing a SubqueryAlias logical plan and a RowEncoder.

.CatalogImpl.createExternalTable
image::images/spark-sql-CatalogImpl-createExternalTable.png[align="center"]

NOTE: createExternalTable is part of Catalog contract.

=== [[listTables]] Listing Tables in Database (as Dataset) -- listTables Method

[source, scala]
----
listTables(): Dataset[Table]
listTables(dbName: String): Dataset[Table]
----

NOTE: listTables is part of Catalog Contract.

Internally, listTables requests SessionCatalog to list all tables in the specified dbName database and converts every TableIdentifier to a Table (using <<makeTable, makeTable>>).

In the end, listTables <<makeDataset, creates a Dataset>> with the tables.
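The two steps above (list identifiers, then convert each to a Table) can be sketched in plain Scala. TableIdentifier, Table and the in-memory sessionCatalog map below are simplified stand-ins for the Spark classes, not the real implementation:

```scala
// Illustrative sketch of listTables: ask the (here: in-memory) session
// catalog for the table identifiers of a database, then convert every
// identifier to a Table with a makeTable-like function.
object ListTablesSketch {
  final case class TableIdentifier(table: String, database: Option[String])
  final case class Table(name: String, database: String, tableType: String, isTemporary: Boolean)

  // Stand-in for SessionCatalog.listTables(dbName).
  private val sessionCatalog: Map[String, Seq[TableIdentifier]] =
    Map("default" -> Seq(TableIdentifier("readme", Some("default"))))

  // Stand-in for CatalogImpl.makeTable.
  def makeTable(ident: TableIdentifier): Table =
    Table(ident.table, ident.database.getOrElse("default"), "EXTERNAL", isTemporary = false)

  def listTables(dbName: String): Seq[Table] =
    sessionCatalog.getOrElse(dbName, Seq.empty).map(makeTable)
}
```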

=== [[listColumns]] Listing Columns of Table (as Dataset) -- listColumns Method

[source, scala]
----
listColumns(tableName: String): Dataset[Column]
listColumns(dbName: String, tableName: String): Dataset[Column]
----

NOTE: listColumns is part of Catalog Contract.

listColumns requests SessionCatalog for the table metadata.

listColumns takes the schema from the table metadata and creates a Column for every field (with the optional comment as the description).

In the end, listColumns <<makeDataset, creates a Dataset>> with the columns.
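The field-to-column conversion can be sketched as follows. StructField and Column here are simplified stand-ins for the Spark types, kept only to show how the optional field comment becomes the column description:

```scala
// Illustrative sketch: build one Column per schema field, using the optional
// field comment as the column description (null when absent).
object ListColumnsSketch {
  final case class StructField(name: String, dataType: String, nullable: Boolean, comment: Option[String])
  final case class Column(name: String, description: String, dataType: String, nullable: Boolean)

  def listColumns(schema: Seq[StructField]): Seq[Column] =
    schema.map { f =>
      Column(f.name, f.comment.orNull, f.dataType, f.nullable)
    }
}
```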

=== [[makeTable]] Converting TableIdentifier to Table -- makeTable Internal Method

[source, scala]
----
makeTable(tableIdent: TableIdentifier): Table
----

makeTable creates a Table using the input TableIdentifier and the table metadata (from the current SessionCatalog) if available.

NOTE: makeTable uses SparkSession to access SessionState that is then used to access SessionCatalog.

NOTE: makeTable is used when CatalogImpl is requested to <<listTables, listTables>> or <<getTable, getTable>>.

=== [[makeDataset]] Creating Dataset from DefinedByConstructorParams Data -- makeDataset Method

[source, scala]
----
makeDataset[T <: DefinedByConstructorParams](
  data: Seq[T],
  sparkSession: SparkSession): Dataset[T]
----

makeDataset creates an ExpressionEncoder (from DefinedByConstructorParams) and encodes elements of the input data to internal binary rows.

makeDataset then creates a LocalRelation logical operator. makeDataset requests SessionState to execute the plan and creates the result Dataset.

NOTE: makeDataset is used when CatalogImpl is requested to <<listDatabases, listDatabases>>, <<listTables, listTables>>, <<listFunctions, listFunctions>> and <<listColumns, listColumns>>.
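The encode-then-plan flow can be sketched without Spark. InternalRow, LocalRelation and the encoder functions below are simplified stand-ins for ExpressionEncoder and the logical-plan machinery, not the real implementation:

```scala
// Illustrative sketch of makeDataset: encode every element to an internal
// row, wrap the rows in a LocalRelation-like node, and keep a decoder so the
// resulting Dataset can hand the elements back on collect.
object MakeDatasetSketch {
  type InternalRow = Seq[Any]
  final case class LocalRelation(rows: Seq[InternalRow])
  final case class Dataset[T](plan: LocalRelation, fromRow: InternalRow => T) {
    def collect(): Seq[T] = plan.rows.map(fromRow)
  }

  // toRow/fromRow play the role of an ExpressionEncoder[T].
  def makeDataset[T](data: Seq[T], toRow: T => InternalRow, fromRow: InternalRow => T): Dataset[T] =
    Dataset(LocalRelation(data.map(toRow)), fromRow)
}
```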

=== [[refreshTable]] Refreshing Analyzed Logical Plan of Table Query and Re-Caching It -- refreshTable Method

[source, scala]
----
refreshTable(tableName: String): Unit
----

refreshTable requests SessionState for the SQL parser to parse a TableIdentifier given the table name.

NOTE: refreshTable uses SparkSession to access the SessionState.

refreshTable requests SessionCatalog for the table metadata.

refreshTable then creates a DataFrame for the table name.

For a temporary or persistent VIEW table, refreshTable requests the analyzed logical plan of the DataFrame (for the table) to refresh itself.

For other types of table, refreshTable requests SessionCatalog to refresh the table metadata (i.e. invalidate the table).

If the table is cached, refreshTable requests CacheManager to uncache and then cache the table DataFrame again.

NOTE: refreshTable uses SparkSession to access the SharedState that is used to access CacheManager.

refreshTable is part of the Catalog abstraction.
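The re-caching step above can be sketched as follows. ToyCacheManager, Row and recacheIfNeeded are illustrative stand-ins (not Spark classes), showing only the uncache-then-cache-again behavior for tables that were cached:

```scala
// Illustrative sketch of refreshTable's final step: when the table is
// cached, drop the stale entry and cache the freshly built rows again;
// when it is not cached, leave the cache untouched.
object RefreshSketch {
  type Row = Seq[Any]

  final class ToyCacheManager {
    private var cached = Map.empty[String, Seq[Row]]
    def isCached(t: String): Boolean = cached.contains(t)
    def uncacheQuery(t: String): Unit = cached -= t
    def cacheQuery(t: String, rows: Seq[Row]): Unit = cached += (t -> rows)
    def lookup(t: String): Option[Seq[Row]] = cached.get(t)
  }

  // freshRows is by-name so the table is only re-read when actually needed.
  def recacheIfNeeded(mgr: ToyCacheManager, tableName: String, freshRows: => Seq[Row]): Unit =
    if (mgr.isCached(tableName)) {
      mgr.uncacheQuery(tableName)
      mgr.cacheQuery(tableName, freshRows)
    }
}
```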

=== [[refreshByPath]] refreshByPath Method

[source, scala]
----
refreshByPath(resourcePath: String): Unit
----


refreshByPath is part of the Catalog abstraction.

=== [[dropGlobalTempView]] dropGlobalTempView Method

[source, scala]
----
dropGlobalTempView(viewName: String): Boolean
----


dropGlobalTempView is part of the Catalog abstraction.

=== [[listColumns-internal]] listColumns Internal Method

[source, scala]
----
listColumns(tableIdentifier: TableIdentifier): Dataset[Column]
----


NOTE: This listColumns is used exclusively when CatalogImpl is requested to <<listColumns, listColumns>>.

Last update: 2020-11-15