HiveTableRelation Leaf Logical Operator -- Representing Hive Tables in Logical Plan

HiveTableRelation is a leaf logical operator that represents a Hive table in a logical query plan.

HiveTableRelation is created when the FindDataSourceTable logical evaluation rule is requested to resolve UnresolvedCatalogRelations in a logical plan (for Hive tables).
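The resolution step can be pictured as a dispatch on the table provider. The following is a minimal sketch under my reading of Spark 2.x, where a table counts as a data source table when its provider is defined and is not hive, and every other table is read as a Hive table. All types here (TableMeta, HiveTableRel, DataSourceRel, ResolveSketch) are hypothetical stand-ins, not Spark's CatalogTable, HiveTableRelation or LogicalRelation.

```scala
// Toy model of resolving a catalog table. Hypothetical types only;
// this is not Spark's FindDataSourceTable implementation.
object ResolveSketch {
  // Simplified table metadata: a name and an optional provider.
  final case class TableMeta(name: String, provider: Option[String])

  sealed trait Relation
  final case class HiveTableRel(meta: TableMeta) extends Relation
  final case class DataSourceRel(meta: TableMeta) extends Relation

  // Assumption: a data source table has a provider other than "hive".
  def isDataSourceTable(meta: TableMeta): Boolean =
    meta.provider.exists(p => !p.equalsIgnoreCase("hive"))

  // Everything that is not a data source table is read as a Hive table.
  def resolve(meta: TableMeta): Relation =
    if (isDataSourceTable(meta)) DataSourceRel(meta) else HiveTableRel(meta)
}
```

With this model, the h1 table from the example in this section (provider hive) resolves to HiveTableRel, while a parquet-provider table goes the data source route.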

NOTE: HiveTableRelation can be converted to a HadoopFsRelation based on the spark.sql.hive.convertMetastoreParquet and spark.sql.hive.convertMetastoreOrc properties (and "disappears" from a logical plan when enabled).
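A rough sketch of the convertibility check, assuming (as I understand the RelationConversions rule in Spark 2.x) that the decision looks at the table's SerDe class name: a name containing "parquet" is convertible when spark.sql.hive.convertMetastoreParquet is enabled, and one containing "orc" when spark.sql.hive.convertMetastoreOrc is enabled. The Conf case class and ConversionSketch object are hypothetical simplifications, not Spark's SQLConf.

```scala
import java.util.Locale

// Hypothetical approximation of the conversion check, not Spark's code.
object ConversionSketch {
  // Stand-ins for the two configuration properties.
  final case class Conf(convertMetastoreParquet: Boolean, convertMetastoreOrc: Boolean)

  // A table is convertible when its SerDe name mentions a supported format
  // and the matching property is enabled.
  def isConvertible(serde: Option[String], conf: Conf): Boolean = {
    val s = serde.getOrElse("").toLowerCase(Locale.ROOT)
    (s.contains("parquet") && conf.convertMetastoreParquet) ||
      (s.contains("orc") && conf.convertMetastoreOrc)
  }
}
```

Under this model, the LazySimpleSerDe table in the example in this section would never be converted, regardless of the two properties.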

HiveTableRelation is partitioned when it has at least one partition column.

[[MultiInstanceRelation]] HiveTableRelation is a MultiInstanceRelation.
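Being a MultiInstanceRelation means that a new instance of the relation gets output attributes with fresh expression IDs, so that, for example, both sides of a self-join can be told apart. The sketch below models that idea with hypothetical Attr and Rel types (not Spark's AttributeReference or HiveTableRelation) and a simple counter in place of Spark's expression ID generator.

```scala
// Toy model of a multi-instance relation. Hypothetical types only.
object MultiInstanceSketch {
  private val nextId = new java.util.concurrent.atomic.AtomicLong(0L)
  def freshId(): Long = nextId.getAndIncrement()

  final case class Attr(name: String, exprId: Long) {
    // Same column name, new expression ID.
    def newInstance(): Attr = copy(exprId = freshId())
  }

  final case class Rel(dataCols: Seq[Attr], partitionCols: Seq[Attr]) {
    def output: Seq[Attr] = dataCols ++ partitionCols

    // Every data and partition column is re-instantiated.
    def newInstance(): Rel =
      copy(
        dataCols = dataCols.map(_.newInstance()),
        partitionCols = partitionCols.map(_.newInstance()))
  }
}
```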

HiveTableRelation is converted (resolved) as follows:

  • HiveTableScanExec physical operator in the HiveTableScans execution planning strategy

  • InsertIntoHiveTable command in the HiveAnalysis logical resolution rule
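The two conversions above can be pictured with a toy plan ADT: an analysis rule rewrites an insert whose target is a Hive table relation into an insert-into-Hive command, and a planning strategy turns a bare Hive table relation into a Hive table scan (returning an empty result when it does not apply). All names below are hypothetical and only echo the roles of HiveAnalysis and HiveTableScans; they are not Spark's classes or signatures.

```scala
// Toy model of the two conversions. Hypothetical names only.
object HivePlanningSketch {
  sealed trait LogicalPlan
  final case class HiveTableRel(table: String) extends LogicalPlan
  final case class InsertInto(target: LogicalPlan, query: LogicalPlan, overwrite: Boolean)
    extends LogicalPlan
  final case class InsertIntoHiveTableCmd(table: String, query: LogicalPlan, overwrite: Boolean)
    extends LogicalPlan

  // Analysis-time rewrite, in the spirit of HiveAnalysis.
  def hiveAnalysis(plan: LogicalPlan): LogicalPlan = plan match {
    case InsertInto(HiveTableRel(t), query, overwrite) =>
      InsertIntoHiveTableCmd(t, query, overwrite)
    case other => other
  }

  sealed trait PhysicalPlan
  final case class HiveTableScan(table: String) extends PhysicalPlan

  // Planning-time strategy, in the spirit of HiveTableScans:
  // an empty result lets other strategies try.
  def hiveTableScans(plan: LogicalPlan): Seq[PhysicalPlan] = plan match {
    case HiveTableRel(t) => Seq(HiveTableScan(t))
    case _               => Nil
  }
}
```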

val tableName = "h1"

// Make the example reproducible
val db = spark.catalog.currentDatabase
import spark.sharedState.{externalCatalog => extCatalog}
extCatalog.dropTable(
  db, table = tableName, ignoreIfNotExists = true, purge = true)

// sql("CREATE TABLE h1 (id LONG) USING hive")
import org.apache.spark.sql.types.StructType
spark.catalog.createTable(
  tableName,
  source = "hive",
  schema = new StructType().add($"id".long),
  options = Map.empty[String, String])

val h1meta = extCatalog.getTable(db, tableName)
scala> println(h1meta.provider.get)

// Looks like we've got the testing space ready for the experiment
val h1 = spark.table(tableName)

import org.apache.spark.sql.catalyst.dsl.plans._
val plan = table(tableName).insertInto("t2", overwrite = true)
scala> println(plan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- 'UnresolvedRelation `h1`

// ResolveRelations logical rule first to resolve UnresolvedRelations
import spark.sessionState.analyzer.ResolveRelations
val rrPlan = ResolveRelations(plan)
scala> println(rrPlan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- 'SubqueryAlias h1
02    +- 'UnresolvedCatalogRelation `default`.`h1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

// FindDataSourceTable logical rule next to resolve UnresolvedCatalogRelations
import org.apache.spark.sql.execution.datasources.FindDataSourceTable
val findTablesRule = new FindDataSourceTable(spark)
val planWithTables = findTablesRule(rrPlan)

// At long last...
// Note HiveTableRelation in the logical plan
scala> println(planWithTables.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- SubqueryAlias h1
02    +- HiveTableRelation `default`.`h1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#13L]

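The metadata of a HiveTableRelation (in a catalog) has to meet the following requirements:

  • The database of the table identifier is defined

  • The partition schema is of the same type as the partition columns

  • The data schema is of the same type as the data columns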
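To my knowledge, Spark 2.4 checks three things when a HiveTableRelation is created: the table's database is defined, the partition schema matches the partition columns, and the data schema matches the data columns. The sketch below expresses those checks as a predicate over hypothetical Field and TableMeta types. It compares column names and type names while ignoring nullability, which is only a simplification of Spark's StructType comparison, and it returns a Boolean where Spark uses assertions.

```scala
// Toy version of the metadata requirements. Hypothetical types only.
object MetadataChecksSketch {
  final case class Field(name: String, dataType: String, nullable: Boolean = true)
  final case class TableMeta(
      database: Option[String],
      dataSchema: Seq[Field],
      partitionSchema: Seq[Field])

  // Compare names and types, ignoring nullability (a simplification).
  private def sameType(a: Seq[Field], b: Seq[Field]): Boolean =
    a.map(f => (f.name, f.dataType)) == b.map(f => (f.name, f.dataType))

  def isValid(meta: TableMeta, dataCols: Seq[Field], partitionCols: Seq[Field]): Boolean =
    meta.database.isDefined &&
      sameType(meta.partitionSchema, partitionCols) &&
      sameType(meta.dataSchema, dataCols)
}
```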

[[output]] HiveTableRelation has the output attributes made up of the data columns followed by the partition columns.
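The column order can be shown with a hypothetical Rel type standing in for HiveTableRelation. For a table with data columns id and name that is partitioned by year, the output lists the partition column last.

```scala
// Toy model of the output order. Hypothetical types only.
object OutputOrderSketch {
  final case class Attr(name: String)

  final case class Rel(dataCols: Seq[Attr], partitionCols: Seq[Attr]) {
    // Data columns first, partition columns last.
    val output: Seq[Attr] = dataCols ++ partitionCols
  }
}
```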

=== [[computeStats]] Computing Statistics -- computeStats Method

[source, scala]

computeStats(): Statistics

NOTE: computeStats is part of the LeafNode Contract to compute statistics for the cost-based optimizer.

computeStats takes the table statistics from the table metadata (if defined) and converts them to Spark statistics (with the output columns).

If the table statistics are not available, computeStats throws an IllegalStateException with the following message:

table stats must be specified.
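The "use the statistics if defined, otherwise fail" logic maps naturally onto Option.map and getOrElse. The sketch below uses hypothetical CatalogStats and PlanStats types that carry only a size in bytes; it models the size-only path and leaves out the row-count and column-statistics handling that the real conversion may apply when cost-based optimization is enabled.

```scala
// Toy version of computeStats. Hypothetical types, size-only path.
object ComputeStatsSketch {
  final case class CatalogStats(sizeInBytes: BigInt)
  final case class PlanStats(sizeInBytes: BigInt)

  // Convert catalog statistics when present; otherwise fail loudly.
  def computeStats(tableStats: Option[CatalogStats]): PlanStats =
    tableStats
      .map(s => PlanStats(s.sizeInBytes))
      .getOrElse(throw new IllegalStateException("table stats must be specified."))
}
```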

=== [[creating-instance]] Creating Instance

HiveTableRelation takes the following when created:

  • [[tableMeta]] Table metadata
  • [[dataCols]] Columns (as a collection of AttributeReferences)
  • [[partitionCols]] Partition columns (as a collection of AttributeReferences)

=== [[partition-columns]] Partition Columns

When created, HiveTableRelation is given the partition columns.

The FindDataSourceTable logical evaluation rule creates a HiveTableRelation based on a table specification (from a catalog).

The partition columns are exactly the partition columns (partition schema) of the table specification.
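A sketch of how data and partition columns could be derived from a table specification. It assumes, as I understand Spark's catalog table model, that partition columns are stored as the trailing columns of the table schema, and that the columns of a Hive table are marked nullable when read. Field, TableSpec and readHiveTable here are hypothetical stand-ins, not Spark's CatalogTable or FindDataSourceTable.

```scala
// Toy derivation of data and partition columns. Hypothetical types only.
object ReadHiveTableSketch {
  final case class Field(name: String, dataType: String, nullable: Boolean)

  final case class TableSpec(schema: Seq[Field], partitionColumnNames: Seq[String]) {
    // Assumption: partition columns are the trailing columns of the schema.
    def partitionSchema: Seq[Field] = {
      val parts = schema.takeRight(partitionColumnNames.length)
      require(parts.map(_.name) == partitionColumnNames, "partition columns must come last")
      parts
    }

    def dataSchema: Seq[Field] = schema.dropRight(partitionColumnNames.length)
  }

  // Returns (dataCols, partitionCols), with every column marked nullable.
  def readHiveTable(t: TableSpec): (Seq[Field], Seq[Field]) =
    (t.dataSchema.map(_.copy(nullable = true)),
     t.partitionSchema.map(_.copy(nullable = true)))
}
```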

=== [[isPartitioned]] isPartitioned Method

[source, scala]

isPartitioned: Boolean

isPartitioned is true when there is at least one partition column.
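The check boils down to a non-emptiness test on the partition columns. A minimal sketch with a hypothetical Rel type standing in for HiveTableRelation:

```scala
// Toy version of isPartitioned. Hypothetical types only.
object IsPartitionedSketch {
  final case class Attr(name: String)

  final case class Rel(dataCols: Seq[Attr], partitionCols: Seq[Attr]) {
    // Partitioned when there is at least one partition column.
    def isPartitioned: Boolean = partitionCols.nonEmpty
  }
}
```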


isPartitioned is used when:

  • HiveMetastoreCatalog is requested to convert a HiveTableRelation to a LogicalRelation over a HadoopFsRelation

  • RelationConversions logical posthoc evaluation rule is executed (on an InsertIntoTable)

  • HiveTableScanExec physical operator is executed

