# CatalogImpl

`CatalogImpl` is the Catalog in Spark SQL that...FIXME

.CatalogImpl uses SessionCatalog (through SparkSession)
image::images/spark-sql-CatalogImpl.png[align="center"]

NOTE: `CatalogImpl` is in the org.apache.spark.sql.internal package.
=== [[createTable]] Creating Table -- `createTable` Method

[source, scala]
----
createTable(
  tableName: String,
  source: String,
  schema: StructType,
  options: Map[String, String]): DataFrame
----

`createTable`...FIXME

`createTable` is part of the Catalog contract to...FIXME.
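For a quick (illustrative) try-out in spark-shell, the following sketch registers a JSON-backed table with `createTable`; the `/tmp/people` directory, the `people` table name and the schema are made up for the example.

[source, scala]
----
import org.apache.spark.sql.types._

// Hypothetical directory with JSON files, e.g. {"name":"Jacek","age":42}
val path = "/tmp/people"
val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("age", IntegerType)))

// Registers a table in the current database backed by the JSON files
val people = spark.catalog.createTable("people", "json", schema, Map("path" -> path))
people.printSchema
----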
=== [[getTable]] getTable Method

[source, scala]
----
getTable(tableName: String): Table
getTable(dbName: String, tableName: String): Table
----

NOTE: `getTable` is part of the Catalog contract to...FIXME.

`getTable`...FIXME
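As an illustrative sketch (the `demo` view is made up), `getTable` gives the Table metadata of a view registered earlier in the session:

[source, scala]
----
spark.range(1).createOrReplaceTempView("demo")

val t = spark.catalog.getTable("demo")
// Prints the table name, its type and whether it is temporary
println(s"${t.name} ${t.tableType} ${t.isTemporary}")
----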
=== [[getFunction]] getFunction Method

[source, scala]
----
getFunction(functionName: String): Function
getFunction(dbName: String, functionName: String): Function
----

NOTE: `getFunction` is part of the Catalog contract to...FIXME.

`getFunction`...FIXME
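For illustration (a sketch, with a made-up `plusOne` function), registering a temporary UDF and looking it up with `getFunction`:

[source, scala]
----
// Register a session-scoped (temporary) function
spark.udf.register("plusOne", (i: Int) => i + 1)

val fn = spark.catalog.getFunction("plusOne")
println(s"${fn.name} isTemporary=${fn.isTemporary}")
----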
=== [[functionExists]] functionExists Method

[source, scala]
----
functionExists(functionName: String): Boolean
functionExists(dbName: String, functionName: String): Boolean
----

NOTE: `functionExists` is part of the Catalog contract to...FIXME.

`functionExists`...FIXME
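A minimal sketch (the function names are made up) showing `functionExists` for a registered temporary function and for an unknown name:

[source, scala]
----
spark.udf.register("plusOne", (i: Int) => i + 1)

assert(spark.catalog.functionExists("plusOne"))
assert(!spark.catalog.functionExists("noSuchFunction"))
----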
=== [[cacheTable]] Caching Table or View In-Memory -- `cacheTable` Method

[source, scala]
----
cacheTable(tableName: String): Unit
----

Internally, `cacheTable` first SparkSession.md#table[creates a DataFrame for the table] followed by requesting `CacheManager` to cache it.

NOTE: `cacheTable` uses the SparkSession.md#sharedState[session-scoped SharedState] to access the `CacheManager`.

NOTE: `cacheTable` is part of the Catalog contract.
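As an illustrative sketch (the `nums` view is made up), caching a temporary view and confirming it with `isCached`:

[source, scala]
----
spark.range(5).createOrReplaceTempView("nums")

spark.catalog.cacheTable("nums")
assert(spark.catalog.isCached("nums"))
----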
=== [[clearCache]] Removing All Cached Tables From In-Memory Cache -- `clearCache` Method

[source, scala]
----
clearCache(): Unit
----

`clearCache` requests `CacheManager` to remove all cached tables from the in-memory cache.

NOTE: `clearCache` is part of the Catalog contract.
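Continuing the sketch above (the `nums` view is made up), `clearCache` drops every cached table or view at once:

[source, scala]
----
spark.range(5).createOrReplaceTempView("nums")
spark.catalog.cacheTable("nums")

spark.catalog.clearCache()
assert(!spark.catalog.isCached("nums"))
----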
=== [[createExternalTable]] Creating External Table From Path -- `createExternalTable` Method

[source, scala]
----
createExternalTable(tableName: String, path: String): DataFrame
createExternalTable(tableName: String, path: String, source: String): DataFrame
createExternalTable(
  tableName: String,
  source: String,
  options: Map[String, String]): DataFrame
createExternalTable(
  tableName: String,
  source: String,
  schema: StructType,
  options: Map[String, String]): DataFrame
----
`createExternalTable` creates an external table `tableName` from the given `path` and returns the corresponding DataFrame.
[source, scala]
----
import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...

val readmeTable = spark.catalog.createExternalTable("readme", "README.md", "text")
readmeTable: org.apache.spark.sql.DataFrame = [value: string]

scala> spark.catalog.listTables.filter(_.name == "readme").show
+------+--------+-----------+---------+-----------+
|  name|database|description|tableType|isTemporary|
+------+--------+-----------+---------+-----------+
|readme| default|       null| EXTERNAL|      false|
+------+--------+-----------+---------+-----------+

scala> sql("select count(*) as count from readme").show(false)
+-----+
|count|
+-----+
|99   |
+-----+
----
The `source` input parameter is the name of the data source provider for the table, e.g. parquet, json, text. If not specified, `createExternalTable` uses the spark.sql.sources.default setting to know the data source format.

NOTE: The `source` input parameter must not be `hive` as it leads to an `AnalysisException`.
`createExternalTable` sets the mandatory `path` option when specified explicitly in the input parameter list.

`createExternalTable` parses `tableName` into a `TableIdentifier` (using sql/SparkSqlParser.md[SparkSqlParser]). It creates a CatalogTable and then SessionState.md#executePlan[executes] (by `toRDD`) a CreateTable.md[CreateTable] logical plan. The result DataFrame is a `Dataset[Row]` with the QueryExecution after executing a SubqueryAlias.md[SubqueryAlias] logical plan and RowEncoder.
.CatalogImpl.createExternalTable
image::images/spark-sql-CatalogImpl-createExternalTable.png[align="center"]

NOTE: `createExternalTable` is part of the Catalog contract.
=== [[listTables]] Listing Tables in Database (as Dataset) -- `listTables` Method

[source, scala]
----
listTables(): Dataset[Table]
listTables(dbName: String): Dataset[Table]
----

NOTE: `listTables` is part of the Catalog contract.

Internally, `listTables` requests `SessionCatalog` to list all tables in the `dbName` database and <<makeTable, converts them to Tables>>.

In the end, `listTables` <<makeDataset, creates a Dataset with the tables>>.
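As a quick sketch (the `demo` view is made up), listing the tables and views visible in the current database:

[source, scala]
----
spark.range(1).createOrReplaceTempView("demo")

spark.catalog.listTables().show(false)
----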
=== [[listColumns]] Listing Columns of Table (as Dataset) -- `listColumns` Method

[source, scala]
----
listColumns(tableName: String): Dataset[Column]
listColumns(dbName: String, tableName: String): Dataset[Column]
----

NOTE: `listColumns` is part of the Catalog contract.

`listColumns` requests `SessionCatalog` for the table metadata.

`listColumns` takes the schema from the table metadata and creates a `Column` for every field (with the optional comment as the description).

In the end, `listColumns` <<makeDataset, creates a Dataset with the columns>>.
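For illustration (the `people` view and its columns are made up), listing the columns of a registered view:

[source, scala]
----
import spark.implicits._

Seq(("Jacek", 42)).toDF("name", "age").createOrReplaceTempView("people")

spark.catalog.listColumns("people").show(false)
----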
=== [[makeTable]] Converting TableIdentifier to Table -- `makeTable` Internal Method

[source, scala]
----
makeTable(tableIdent: TableIdentifier): Table
----

`makeTable` creates a `Table` using the input `TableIdentifier` and the table metadata (from the current SessionCatalog) if available.

NOTE: `makeTable` uses SparkSession to access the SessionState that is then used to access the SessionCatalog.

NOTE: `makeTable` is used when `CatalogImpl` is requested to <<listTables, listTables>> and <<getTable, getTable>>.
=== [[makeDataset]] Creating Dataset from DefinedByConstructorParams Data -- `makeDataset` Method

[source, scala]
----
makeDataset[T <: DefinedByConstructorParams](
  data: Seq[T],
  sparkSession: SparkSession): Dataset[T]
----

`makeDataset` creates an ExpressionEncoder (from DefinedByConstructorParams) and encodes elements of the input `data` to internal binary rows.

`makeDataset` then creates a LocalRelation.md#creating-instance[LocalRelation] logical operator. `makeDataset` requests `SessionState` to SessionState.md#executePlan[execute the plan] and Dataset.md#creating-instance[creates] the result `Dataset`.

NOTE: `makeDataset` is used when `CatalogImpl` is requested to list databases, <<listTables, tables>>, functions and <<listColumns, columns>>.
=== [[refreshTable]] Refreshing Analyzed Logical Plan of Table Query and Re-Caching It -- `refreshTable` Method

[source, scala]
----
refreshTable(tableName: String): Unit
----

`refreshTable` requests `SessionState` for the SessionState.md#sqlParser[SQL parser] to spark-sql-ParserInterface.md#parseTableIdentifier[parse a TableIdentifier given the table name].

NOTE: `refreshTable` uses SparkSession to access the SessionState.

`refreshTable` requests `SessionCatalog` for the table metadata.

`refreshTable` then SparkSession.md#table[creates a DataFrame for the table name].

For a temporary or persistent `VIEW` table, `refreshTable` requests the analyzed logical plan of the DataFrame (for the table) to spark-sql-LogicalPlan.md#refresh[refresh] itself.

For other types of table, `refreshTable` requests `SessionCatalog` to refresh the table metadata (i.e. invalidate the table).

If the table <<cacheTable, has been cached>>, `refreshTable` requests `CacheManager` to uncache and then cache the table `DataFrame` again.

NOTE: `refreshTable` uses the SparkSession.md#sharedState[session-scoped SharedState] to access the `CacheManager`.

`refreshTable` is part of the Catalog abstraction.
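As an illustrative sketch (the `/tmp/nums` path and the `nums` table are made up), refreshing an external table whose files may have been rewritten outside Spark:

[source, scala]
----
// Hypothetical external table backed by files that can change outside Spark
spark.range(5).write.mode("overwrite").parquet("/tmp/nums")
spark.catalog.createTable("nums", "parquet", Map("path" -> "/tmp/nums"))
spark.catalog.cacheTable("nums")

// ...another job rewrites the files under /tmp/nums...

// Re-reads the table metadata and re-caches the table's data
spark.catalog.refreshTable("nums")
----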
=== [[refreshByPath]] refreshByPath Method

[source, scala]
----
refreshByPath(resourcePath: String): Unit
----

`refreshByPath`...FIXME

`refreshByPath` is part of the Catalog abstraction.
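A one-line sketch (the path is hypothetical): `refreshByPath` invalidates the cached data and metadata of Datasets that were read from the given path.

[source, scala]
----
// Invalidates and refreshes the cached data (and associated metadata)
// for Datasets that contain the given path
spark.catalog.refreshByPath("/tmp/nums")
----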
=== [[dropGlobalTempView]] dropGlobalTempView Method

[source, scala]
----
dropGlobalTempView(viewName: String): Boolean
----

`dropGlobalTempView`...FIXME

`dropGlobalTempView` is part of the Catalog abstraction.
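As a sketch (the `demo` view is made up), creating and then dropping a global temporary view:

[source, scala]
----
spark.range(1).createGlobalTempView("demo")

// Global temporary views are registered in the global_temp database
spark.table("global_temp.demo").show

// true as the view existed and has been dropped
assert(spark.catalog.dropGlobalTempView("demo"))
----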
=== [[listColumns-internal]] listColumns Internal Method

[source, scala]
----
listColumns(tableIdentifier: TableIdentifier): Dataset[Column]
----

`listColumns`...FIXME

NOTE: `listColumns` is used exclusively when `CatalogImpl` is requested to <<listColumns, listColumns>>.