InsertIntoHiveTable Logical Command

InsertIntoHiveTable is a logical command that writes the result of executing a structured query to a Hive table.

InsertIntoHiveTable is created when:

  • HiveAnalysis logical resolution rule is executed and resolves an InsertIntoTable logical operator with a Hive table

  • CreateHiveTableAsSelectCommand logical command is executed
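
One way to observe the first case is to ask Spark for the plan of an INSERT into a Hive table. The following is a minimal sketch, assuming a Spark build with Hive support (the spark-hive module) on the classpath. The table name demo_hive_table and the object name are hypothetical, and the table is stored as TEXTFILE on the assumption that this keeps the insert on the Hive write path.

```scala
import org.apache.spark.sql.SparkSession

object InsertIntoHiveTableExplainDemo {
  // Returns the text of the extended plan for an INSERT into a Hive table.
  def explainInsert(spark: SparkSession): String = {
    // demo_hive_table is a hypothetical table used only for illustration.
    spark.sql("CREATE TABLE IF NOT EXISTS demo_hive_table (id INT) STORED AS TEXTFILE")
    // EXPLAIN only plans the statement; it does not run the insert.
    spark
      .sql("EXPLAIN EXTENDED INSERT INTO demo_hive_table VALUES (1)")
      .head()
      .getString(0)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("InsertIntoHiveTableExplainDemo")
      .enableHiveSupport() // requires the spark-hive module
      .getOrCreate()
    try println(explainInsert(spark))
    finally spark.stop()
  }
}
```

If the assumptions hold, the printed plan should mention InsertIntoHiveTable. Note that running this with an embedded metastore creates local metastore files in the working directory.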

Creating Instance

InsertIntoHiveTable takes the following to be created:

  • [[table]] CatalogTable
  • [[partition]] Partition keys with optional values (Map[String, Option[String]])
  • [[query]] Structured query (as a LogicalPlan)
  • [[overwrite]] overwrite Flag
  • [[ifPartitionNotExists]] ifPartitionNotExists Flag
  • [[outputColumnNames]] Names of the output columns
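
The shape of the partition argument can be illustrated with plain Scala. This is a sketch only: the partition columns year and month and the object name are hypothetical, and it assumes that a key with a value denotes a static partition while a key without a value denotes a dynamic one.

```scala
object PartitionSpecDemo {
  // Hypothetical spec for: INSERT INTO t PARTITION (year = '2020', month) ...
  val partition: Map[String, Option[String]] =
    Map("year" -> Some("2020"), "month" -> None)

  // Keys with a value are treated as static partitions.
  val staticPartitions: Map[String, String] =
    partition.collect { case (key, Some(value)) => key -> value }

  // Keys without a value are treated as dynamic partitions.
  val dynamicPartitionKeys: Set[String] =
    partition.collect { case (key, None) => key }.toSet

  def main(args: Array[String]): Unit = {
    println(s"static: $staticPartitions")
    println(s"dynamic: $dynamicPartitionKeys")
  }
}
```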

=== [[run]] Executing Data-Writing Logical Command -- run Method

[source, scala]
----
run(
  sparkSession: SparkSession,
  child: SparkPlan): Seq[Row]
----

NOTE: run is part of the DataWritingCommand contract.

run requests the input SparkSession for the SharedState that is then requested for the ExternalCatalog.

run requests the SessionState for a new Hadoop Configuration.

run converts the CatalogTable metadata to Hive's.


run executes processInsert (and deleteExternalTmpPath).

run requests the input SparkSession for the Catalog that is requested to uncache the table.

run requests the input SparkSession for the SessionState, and then requests the SessionState for the SessionCatalog that is requested to invalidate the cache for the table.

In the end, run updates the table statistics.
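
The lookups and cache handling described in this section can be approximated with SparkSession's own API. This is a rough sketch, not the command's implementation: it assumes a Spark version in which sharedState and sessionState are publicly accessible (both are marked unstable), it maps "invalidate the cache" to SessionCatalog.refreshTable as an assumption, and the object, method and parameter names are hypothetical. The metadata conversion, the insert itself and the statistics update are left out.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier

object RunStepsSketch {
  // Walks through the lookups and cache handling for a table (or view)
  // already registered under tableName (a hypothetical parameter).
  def lookupsAndCacheHandling(spark: SparkSession, tableName: String): Unit = {
    // SparkSession -> SharedState -> ExternalCatalog
    val externalCatalog = spark.sharedState.externalCatalog
    // SessionState -> new Hadoop Configuration
    val hadoopConf = spark.sessionState.newHadoopConf()
    println(s"catalog: ${externalCatalog.getClass.getName}")
    println(s"hadoop conf: $hadoopConf")

    // (The metadata conversion and the insert itself would happen here.)

    // Catalog -> uncache the table
    spark.catalog.uncacheTable(tableName)
    // SessionState -> SessionCatalog -> invalidate the cache for the table
    // (assumed here to correspond to refreshTable)
    spark.sessionState.catalog.refreshTable(TableIdentifier(tableName))
  }
}
```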

=== [[processInsert]] processInsert Internal Method

[source, scala]
----
processInsert(
  sparkSession: SparkSession,
  externalCatalog: ExternalCatalog,
  hadoopConf: Configuration,
  tableDesc: TableDesc,
  tmpLocation: Path,
  child: SparkPlan): Unit
----


NOTE: processInsert is used when InsertIntoHiveTable logical command is executed.

Last update: 2020-11-07