Skip to content

HiveTableRelation Leaf Logical Operator -- Representing Hive Tables in Logical Plan

HiveTableRelation is a ../LeafNode.md[leaf logical operator] that represents a Hive table in a ../spark-sql-LogicalPlan.md[logical query plan].

HiveTableRelation is <> when FindDataSourceTable logical evaluation rule is requested to resolve UnresolvedCatalogRelations in a logical plan (for Hive tables).

NOTE: HiveTableRelation can be RelationConversions.md#convert[converted to a HadoopFsRelation] based on spark.sql.hive.convertMetastoreParquet and spark.sql.hive.convertMetastoreOrc properties (and "disappears" from a logical plan when enabled).

HiveTableRelation is <> when it has at least one <>.

[[MultiInstanceRelation]] HiveTableRelation is a MultiInstanceRelation.

HiveTableRelation is converted (resolved) to as follows:

  • HiveTableScanExec.md[HiveTableScanExec] physical operator in HiveTableScans.md[HiveTableScans] execution planning strategy

  • InsertIntoHiveTable.md[InsertIntoHiveTable] command in HiveAnalysis.md[HiveAnalysis] logical resolution rule

val tableName = "h1"

// Make the example reproducible
val db = spark.catalog.currentDatabase
import spark.sharedState.{externalCatalog => extCatalog}
extCatalog.dropTable(
  db, table = tableName, ignoreIfNotExists = true, purge = true)

// sql("CREATE TABLE h1 (id LONG) USING hive")
import org.apache.spark.sql.types.StructType
spark.catalog.createTable(
  tableName,
  source = "hive",
  schema = new StructType().add($"id".long),
  options = Map.empty[String, String])

val h1meta = extCatalog.getTable(db, tableName)
scala> println(h1meta.provider.get)
hive

// Looks like we've got the testing space ready for the experiment
val h1 = spark.table(tableName)

import org.apache.spark.sql.catalyst.dsl.plans._
val plan = table(tableName).insertInto("t2", overwrite = true)
scala> println(plan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- 'UnresolvedRelation `h1`

// ResolveRelations logical rule first to resolve UnresolvedRelations
import spark.sessionState.analyzer.ResolveRelations
val rrPlan = ResolveRelations(plan)
scala> println(rrPlan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- 'SubqueryAlias h1
02    +- 'UnresolvedCatalogRelation `default`.`h1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

// FindDataSourceTable logical rule next to resolve UnresolvedCatalogRelations
import org.apache.spark.sql.execution.datasources.FindDataSourceTable
val findTablesRule = new FindDataSourceTable(spark)
val planWithTables = findTablesRule(rrPlan)

// At long last...
// Note HiveTableRelation in the logical plan
scala> println(planWithTables.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- SubqueryAlias h1
02    +- HiveTableRelation `default`.`h1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#13L]

The metadata of a HiveTableRelation (in a catalog) has to meet the requirements:

[[output]] HiveTableRelation has the output attributes made up of <> followed by <> columns.

=== [[computeStats]] Computing Statistics -- computeStats Method

[source, scala]

computeStats(): Statistics

NOTE: computeStats is part of ../LeafNode.md#computeStats[LeafNode Contract] to compute statistics for ../spark-sql-cost-based-optimization.md[cost-based optimizer].

computeStats takes the table statistics from the <> if defined and ../spark-sql-CatalogStatistics.md#toPlanStats[converts them to Spark statistics] (with <>).

If the table statistics are not available, computeStats reports an IllegalStateException.

table stats must be specified.

Creating Instance

HiveTableRelation takes the following when created:

  • [[tableMeta]] Table metadata
  • [[dataCols]] Columns (as a collection of AttributeReferences)
  • [[partitionCols]] Partition columns (as a collection of AttributeReferences)

=== [[partition-columns]] Partition Columns

When created, HiveTableRelation is given the <>.

FindDataSourceTable.mdFindDataSourceTable logical evaluation rule creates a HiveTableRelation based on a table specification (from a catalog).

The <> are exactly partitions of the table specification.

=== [[isPartitioned]] isPartitioned Method

[source, scala]

isPartitioned: Boolean

isPartitioned is true when there is at least one <>.

[NOTE]

isPartitioned is used when:

  • HiveMetastoreCatalog is requested to HiveMetastoreCatalog.md#convertToLogicalRelation[convert a HiveTableRelation to a LogicalRelation over a HadoopFsRelation]

  • RelationConversions.md[RelationConversions] logical posthoc evaluation rule is executed (on a RelationConversions.md#apply-InsertIntoTable[InsertIntoTable])

* HiveTableScanExec physical operator is HiveTableScanExec.md#doExecute[executed]


Last update: 2021-03-11