

:hive-version: 2.3.6
:hadoop-version: 2.10.0
:url-hive-javadoc:{hive-version}/api
:url-hadoop-javadoc:{hadoop-version}/api

HiveClientImpl is a HiveClient that uses a <<client, Hive metastore client>> (for metadata/DDL operations using calls to a Hive metastore).

HiveClientImpl is <<creating-instance, created>> exclusively when IsolatedClientLoader is requested to create a new Hive client. When created, HiveClientImpl is given the location of the default database for the Hive metastore warehouse (i.e. <<warehouseDir, warehouseDir>>, which is the value of the hive.metastore.warehouse.dir Hive-specific Hadoop configuration property).

NOTE: The location of the default database for the Hive metastore warehouse is /user/hive/warehouse by default.

NOTE: The Hadoop configuration is what HiveExternalCatalog was given when created (i.e. the default Hadoop configuration from Spark Core's SparkContext.hadoopConfiguration, which includes the Spark properties with the spark.hadoop prefix).
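
As a rough illustration of how Spark properties with the spark.hadoop prefix can end up in a Hadoop configuration, the following sketch filters a plain map of properties and strips the prefix from the matching keys. The `hadoopEntries` helper and the sample property values are hypothetical (they are not part of Spark's API), and the sketch models the configuration as an immutable `Map` rather than Hadoop's `Configuration`.

```scala
object SparkHadoopPrefixSketch {
  // Hypothetical helper (not Spark's API): keeps only the entries whose key
  // starts with "spark.hadoop." and strips that prefix from the key.
  def hadoopEntries(sparkProps: Map[String, String]): Map[String, String] = {
    val prefix = "spark.hadoop."
    sparkProps.collect {
      case (key, value) if key.startsWith(prefix) => key.stripPrefix(prefix) -> value
    }
  }

  def main(args: Array[String]): Unit = {
    // Sample properties, made up for the example.
    val sparkProps = Map(
      "spark.app.name" -> "demo",
      "spark.hadoop.hive.metastore.warehouse.dir" -> "/tmp/warehouse")
    // Only the prefixed property survives, with the prefix removed.
    hadoopEntries(sparkProps).foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

Properties without the prefix (such as `spark.app.name` above) are ignored by the helper.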

[[logging]]
[TIP]
====
Enable ALL logging level for org.apache.spark.sql.hive.client.HiveClientImpl logger to see what happens inside.

Add the following line to conf/

Refer to Logging.
====

=== [[creating-instance]] Creating HiveClientImpl Instance

HiveClientImpl takes the following to be created:

  • [[version]] HiveVersion
  • [[warehouseDir]] Location of the default database for the Hive metastore warehouse if defined (aka warehouseDir)
  • [[sparkConf]] SparkConf
  • [[hadoopConf]] Hadoop configuration
  • [[extraConfig]] Extra configuration
  • [[initClassLoader]] Initial ClassLoader
  • [[clientLoader]] IsolatedClientLoader

HiveClientImpl initializes the <>.

=== [[client]] Hive Metastore Client -- client Internal Method

[source, scala]
----
client: Hive
----

client is a Hive {url-hive-javadoc}/org/apache/hadoop/hive/ql/metadata/Hive.html[metastore client] (for metadata/DDL operations using calls to the metastore).

=== [[getTableOption]] Retrieving Table Metadata From Hive Metastore -- getTableOption Method

[source, scala]
----
getTableOption(
  dbName: String,
  tableName: String): Option[CatalogTable]
----

NOTE: getTableOption is part of the HiveClient contract.

getTableOption prints out the following DEBUG message to the logs:

Looking up [dbName].[tableName]

getTableOption <<getRawTableOption, retrieves the raw Hive table metadata>> (if the table exists) and converts it to Spark's CatalogTable.
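
The lookup-then-convert flow of getTableOption can be sketched as follows. This is a minimal illustration, not the actual implementation: `RawHiveTable`, `SparkTable`, the in-memory `metastore` and `toSparkTable` are hypothetical stand-ins for Hive's Table, Spark's CatalogTable, the Hive metastore and the real conversion, and the sketch assumes that the raw lookup may yield `null` for a missing table.

```scala
object GetTableOptionSketch {
  // Hypothetical stand-ins for Hive's Table and Spark's CatalogTable.
  final case class RawHiveTable(db: String, name: String, params: Map[String, String])
  final case class SparkTable(qualifiedName: String, properties: Map[String, String])

  // Hypothetical in-memory "metastore" with a single table.
  private val metastore: Map[(String, String), RawHiveTable] = Map(
    ("default", "t1") -> RawHiveTable("default", "t1", Map("numRows" -> "3")))

  // Assumption: the raw lookup returns null when the table does not exist.
  private def lookupOrNull(db: String, table: String): RawHiveTable =
    metastore.get((db, table)).orNull

  // Wraps the possibly-null result in an Option.
  def getRawTableOption(db: String, table: String): Option[RawHiveTable] =
    Option(lookupOrNull(db, table))

  // Hypothetical conversion from the Hive-side to the Spark-side representation.
  private def toSparkTable(raw: RawHiveTable): SparkTable =
    SparkTable(s"${raw.db}.${raw.name}", raw.params)

  def getTableOption(db: String, table: String): Option[SparkTable] = {
    println(s"Looking up $db.$table") // mirrors the DEBUG message
    getRawTableOption(db, table).map(toSparkTable)
  }
}
```

With this shape, a missing table yields `None` without the conversion ever being applied.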

=== [[renamePartitions]] renamePartitions Method

[source, scala]
----
renamePartitions(
  db: String,
  table: String,
  specs: Seq[TablePartitionSpec],
  newSpecs: Seq[TablePartitionSpec]): Unit
----

NOTE: renamePartitions is part of the HiveClient contract to...FIXME.


=== [[alterPartitions]] alterPartitions Method

[source, scala]
----
alterPartitions(
  db: String,
  table: String,
  newParts: Seq[CatalogTablePartition]): Unit
----

NOTE: alterPartitions is part of the HiveClient contract to...FIXME.


=== [[getPartitions]] getPartitions Method

[source, scala]
----
getPartitions(
  table: CatalogTable,
  spec: Option[TablePartitionSpec]): Seq[CatalogTablePartition]
----

NOTE: getPartitions is part of the HiveClient contract to...FIXME.


=== [[getPartitionsByFilter]] getPartitionsByFilter Method

[source, scala]
----
getPartitionsByFilter(
  table: CatalogTable,
  predicates: Seq[Expression]): Seq[CatalogTablePartition]
----

NOTE: getPartitionsByFilter is part of the HiveClient contract to...FIXME.


=== [[getPartitionOption]] getPartitionOption Method

[source, scala]
----
getPartitionOption(
  table: CatalogTable,
  spec: TablePartitionSpec): Option[CatalogTablePartition]
----

NOTE: getPartitionOption is part of the HiveClient contract to...FIXME.


=== [[readHiveStats]] Creating Table Statistics from Hive's Table or Partition Parameters -- readHiveStats Internal Method

[source, scala]
----
readHiveStats(properties: Map[String, String]): Option[CatalogStatistics]
----

readHiveStats creates a CatalogStatistics from the input Hive table or partition parameters (using only the parameters that are available and greater than 0).

.Table Statistics and Hive Parameters
[cols="1,2",options="header",width="100%"]
|===
| Hive Parameter
| Table Statistics

| totalSize
| sizeInBytes

| rawDataSize
| sizeInBytes

| numRows
| rowCount
|===

NOTE: The totalSize Hive parameter takes precedence over rawDataSize for the sizeInBytes table statistic.

NOTE: readHiveStats is used when HiveClientImpl is requested for the metadata of a <<getTableOption, table>> or <<fromHivePartition, table partition>>.
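
The mapping from Hive parameters to table statistics, including the precedence of totalSize over rawDataSize, can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual implementation: `Stats` is a hypothetical stand-in for Spark's CatalogStatistics, and returning `None` when neither size parameter is positive is an assumption of this sketch.

```scala
object ReadHiveStatsSketch {
  // Hypothetical stand-in for Spark's CatalogStatistics.
  final case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt])

  // Parses a parameter as a number, ignoring missing or empty values.
  // A non-numeric value would throw NumberFormatException in this sketch.
  private def number(props: Map[String, String], key: String): Option[BigInt] =
    props.get(key).filter(_.nonEmpty).map(BigInt(_))

  def readStats(props: Map[String, String]): Option[Stats] = {
    // numRows is only used when greater than 0.
    val rowCount = number(props, "numRows").filter(_ > 0)
    // totalSize wins over rawDataSize; both must be greater than 0 to count.
    number(props, "totalSize").filter(_ > 0)
      .orElse(number(props, "rawDataSize").filter(_ > 0))
      .map(size => Stats(size, rowCount))
  }
}
```

For example, parameters with `totalSize` of 0 and a positive `rawDataSize` fall back to `rawDataSize` for the size statistic.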

=== [[fromHivePartition]] Retrieving Table Partition Metadata (Converting Table Partition Metadata from Hive Format to Spark SQL Format) -- fromHivePartition Method

[source, scala]
----
fromHivePartition(hp: HivePartition): CatalogTablePartition
----

fromHivePartition simply creates a CatalogTablePartition with the following:

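NOTE: fromHivePartition is used when HiveClientImpl is requested for <<getPartitionOption, getPartitionOption>>, <<getPartitions, getPartitions>> and <<getPartitionsByFilter, getPartitionsByFilter>>.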

=== [[toHiveTable]] Converting Native Table Metadata to Hive's Table

[source, scala]
----
toHiveTable(
  table: CatalogTable,
  userName: Option[String] = None): HiveTable
----

toHiveTable simply creates a new Hive Table and copies the properties from the input CatalogTable.

toHiveTable is used when:

  • HiveUtils is requested to inferSchema

  • HiveClientImpl is requested to <>, <>, <>, <>, <>, <> and <>

  • HiveTableScanExec physical operator is requested for the <>

  • InsertIntoHiveDirCommand and InsertIntoHiveTable logical commands are executed

=== [[getSparkSQLDataType]] getSparkSQLDataType Internal Utility

[source, scala]
----
getSparkSQLDataType(hc: FieldSchema): DataType
----


NOTE: getSparkSQLDataType is used when...FIXME

=== [[toHivePartition]] Converting CatalogTablePartition to Hive Partition -- toHivePartition Utility

[source, scala]
----
toHivePartition(
  p: CatalogTablePartition,
  ht: Table): Partition
----

toHivePartition creates a Hive org.apache.hadoop.hive.ql.metadata.Partition for the input CatalogTablePartition and the Hive org.apache.hadoop.hive.ql.metadata.Table.


toHivePartition is used when:

  • HiveClientImpl is requested to <> or <>

  • HiveTableScanExec physical operator is requested for the raw Hive partitions

=== [[newSession]] Creating New HiveClientImpl -- newSession Method

[source, scala]
----
newSession(): HiveClientImpl
----

NOTE: newSession is part of the HiveClient contract to...FIXME.


=== [[getRawTableOption]] getRawTableOption Internal Method

[source, scala]
----
getRawTableOption(
  dbName: String,
  tableName: String): Option[Table]
----

getRawTableOption requests the <<client, Hive metastore client>> for Hive's {url-hive-javadoc}/org/apache/hadoop/hive/ql/metadata/Table.html[metadata] of the input table.

NOTE: getRawTableOption is used when HiveClientImpl is requested to <> and <>.

Last update: 2020-10-07