

:spark-version: 2.4.5
:hive-version: 2.3.6
:hadoop-version: 2.10.0
:url-hive-javadoc: {hive-version}/api
:url-hadoop-docs: {hadoop-version}
:url-hadoop-javadoc: {url-hadoop-docs}/api

SaveAsHiveFile is an extension of the DataWritingCommand contract for <<implementations, commands>> that can saveAsHiveFile (and <<getExternalTmpPath, getExternalTmpPath>>).


SaveAsHiveFile supports the viewfs:// URI scheme for <<newVersionExternalTempPath, new Hive versions>>.

Read up on ViewFs in the {url-hadoop-docs}/hadoop-project-dist/hadoop-hdfs/ViewFs.html[Hadoop official documentation].

[[implementations]]
.SaveAsHiveFiles
[cols="30,70",options="header",width="100%"]
|===
| SaveAsHiveFile
| Description

| InsertIntoHiveDirCommand
| [[InsertIntoHiveDirCommand]]

| InsertIntoHiveTable
| [[InsertIntoHiveTable]]
|===



=== [[saveAsHiveFile]] saveAsHiveFile Method

[source, scala]
----
saveAsHiveFile(
  sparkSession: SparkSession,
  plan: SparkPlan,
  hadoopConf: Configuration,
  fileSinkConf: FileSinkDesc,
  outputLocation: String,
  customPartitionLocations: Map[TablePartitionSpec, String] = Map.empty,
  partitionAttributes: Seq[Attribute] = Nil): Set[String]
----

saveAsHiveFile sets Hadoop configuration properties when a compressed file output format is used (based on hive.exec.compress.output configuration property).
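
The compression step can be pictured with a small, dependency-free sketch. It models the Hadoop configuration as a mutable map. The helper name `applyCompression` is hypothetical, and the `mapreduce.output.fileoutputformat.compress` key is an assumption about which Hadoop property gets switched on. This is an illustration, not a copy of the Spark sources.

```scala
import scala.collection.mutable

object CompressionSketch {
  // Hypothetical helper: a mutable Map stands in for Hadoop's Configuration.
  // Returns true when compressed output was enabled.
  def applyCompression(hadoopConf: mutable.Map[String, String]): Boolean = {
    // hive.exec.compress.output is treated as "false" when it is not set
    val isCompressed =
      hadoopConf.getOrElse("hive.exec.compress.output", "false").toBoolean
    if (isCompressed) {
      // Assumed property name: the standard Hadoop switch for compressed job output
      hadoopConf("mapreduce.output.fileoutputformat.compress") = "true"
    }
    isCompressed
  }
}
```

With `hive.exec.compress.output` set to `true` the map gains the compression switch. Without it the map is left untouched.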

saveAsHiveFile uses FileCommitProtocol utility to instantiate a committer for the input outputLocation based on the spark.sql.sources.commitProtocolClass configuration property.
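
Instantiating a committer from a class name held in a configuration property comes down to reflection. The sketch below shows that general technique in plain Scala. `SketchCommitter`, `LoggingCommitter`, and `instantiate` are hypothetical stand-ins, and the `(jobId, outputPath)` constructor shape is an assumption. None of this is Spark's FileCommitProtocol API.

```scala
// Hypothetical stand-in for a commit protocol (not Spark's FileCommitProtocol)
abstract class SketchCommitter(val jobId: String, val outputPath: String)

// A concrete committer that a configuration property could name
class LoggingCommitter(jobId: String, outputPath: String)
  extends SketchCommitter(jobId, outputPath)

object CommitterSketch {
  // Loads the class by name and calls its (String, String) constructor
  def instantiate(
      className: String,
      jobId: String,
      outputPath: String): SketchCommitter = {
    val clazz = Class.forName(className)
    val ctor = clazz.getDeclaredConstructor(classOf[String], classOf[String])
    ctor.newInstance(jobId, outputPath).asInstanceOf[SketchCommitter]
  }
}
```

A caller would pass the configured class name, for example `CommitterSketch.instantiate(classOf[LoggingCommitter].getName, java.util.UUID.randomUUID().toString, "/tmp/out")`, where the output path is a made-up value.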

saveAsHiveFile uses FileFormatWriter utility to write out the result of executing the input physical operator (with a HiveFileFormat for the input FileSinkDesc, the new FileCommitProtocol committer, and the input arguments).

saveAsHiveFile is used when InsertIntoHiveDirCommand and InsertIntoHiveTable logical commands are executed.

=== [[getExternalTmpPath]] getExternalTmpPath Method

[source, scala]
----
getExternalTmpPath(
  sparkSession: SparkSession,
  hadoopConf: Configuration,
  path: Path): Path
----

getExternalTmpPath finds the Hive version used: it requests the input SparkSession for the ExternalCatalog (that is expected to be a HiveExternalCatalog), requests the catalog for the underlying HiveClient, and requests the client for the Hive version.

getExternalTmpPath divides the supported Hive versions into the old versions (0.12.0 to 1.0.0) that use the hive.exec.scratchdir directory and the new versions (1.1.0 to 2.3.3) that use the hive.exec.stagingdir directory.

getExternalTmpPath uses <<oldVersionExternalTempPath, oldVersionExternalTempPath>> for the old Hive versions and <<newVersionExternalTempPath, newVersionExternalTempPath>> for the new Hive versions.

getExternalTmpPath throws an IllegalStateException for unsupported Hive version:

Unsupported hive version: [hiveVersion]

NOTE: getExternalTmpPath is used when InsertIntoHiveDirCommand and InsertIntoHiveTable logical commands are executed.
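
The version-based dispatch described in this section can be sketched without Spark or Hive on the classpath. The object and method names are hypothetical, and comparing versions by range is a simplification chosen for this example. Only the version bounds, the two property names, and the error message come from the text above.

```scala
object TmpPathDispatchSketch {
  // Parses "major.minor.patch" into a comparable tuple
  private def parse(version: String): (Int, Int, Int) = {
    val Array(major, minor, patch) = version.split('.').map(_.toInt)
    (major, minor, patch)
  }

  // Inclusive range check on parsed versions (a simplification for this sketch)
  private def between(v: (Int, Int, Int), lo: String, hi: String): Boolean = {
    val ord = Ordering[(Int, Int, Int)]
    ord.gteq(v, parse(lo)) && ord.lteq(v, parse(hi))
  }

  // Returns the configuration property whose directory hosts the temporary path
  def tmpDirProperty(hiveVersion: String): String = {
    val v = parse(hiveVersion)
    if (between(v, "0.12.0", "1.0.0")) "hive.exec.scratchdir"
    else if (between(v, "1.1.0", "2.3.3")) "hive.exec.stagingdir"
    else throw new IllegalStateException("Unsupported hive version: " + hiveVersion)
  }
}
```

Calling `tmpDirProperty` with a version outside both ranges raises the IllegalStateException quoted above.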

=== [[deleteExternalTmpPath]] deleteExternalTmpPath Method

[source, scala]
----
deleteExternalTmpPath(
  hadoopConf: Configuration): Unit
----


NOTE: deleteExternalTmpPath is used when...FIXME

=== [[oldVersionExternalTempPath]] oldVersionExternalTempPath Internal Method

[source, scala]
----
oldVersionExternalTempPath(
  path: Path,
  hadoopConf: Configuration,
  scratchDir: String): Path
----

NOTE: oldVersionExternalTempPath is used when SaveAsHiveFile is requested to <<getExternalTmpPath, getExternalTmpPath>>.

=== [[newVersionExternalTempPath]] newVersionExternalTempPath Internal Method

[source, scala]
----
newVersionExternalTempPath(
  path: Path,
  hadoopConf: Configuration,
  stagingDir: String): Path
----

NOTE: newVersionExternalTempPath is used when SaveAsHiveFile is requested to <<getExternalTmpPath, getExternalTmpPath>>.

=== [[getExtTmpPathRelTo]] getExtTmpPathRelTo Internal Method

[source, scala]
----
getExtTmpPathRelTo(
  path: Path,
  hadoopConf: Configuration,
  stagingDir: String): Path
----

NOTE: getExtTmpPathRelTo is used when SaveAsHiveFile is requested to <<newVersionExternalTempPath, newVersionExternalTempPath>>.

=== [[getExternalScratchDir]] getExternalScratchDir Internal Method

[source, scala]
----
getExternalScratchDir(
  extURI: URI,
  hadoopConf: Configuration,
  stagingDir: String): Path
----

NOTE: getExternalScratchDir is used when SaveAsHiveFile is requested to <<newVersionExternalTempPath, newVersionExternalTempPath>>.

=== [[getStagingDir]] getStagingDir Internal Method

[source, scala]
----
getStagingDir(
  inputPath: Path,
  hadoopConf: Configuration,
  stagingDir: String): Path
----

NOTE: getStagingDir is used when SaveAsHiveFile is requested to <<getExtTmpPathRelTo, getExtTmpPathRelTo>> and <<getExternalScratchDir, getExternalScratchDir>>.

=== [[executionId]] executionId Internal Method

[source, scala]
----
executionId: String
----


NOTE: executionId is used when...FIXME

=== [[createdTempDir]] createdTempDir Internal Registry

[source, scala]
----
createdTempDir: Option[Path] = None
----

createdTempDir is a Hadoop {url-hadoop-javadoc}/org/apache/hadoop/fs/Path.html[Path] of a staging directory.

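createdTempDir is initialized when SaveAsHiveFile is requested to <<oldVersionExternalTempPath, oldVersionExternalTempPath>> and <<getStagingDir, getStagingDir>>.

createdTempDir is based on the hive.exec.stagingdir configuration property.

createdTempDir is deleted when SaveAsHiveFile is requested to <<deleteExternalTmpPath, deleteExternalTmpPath>> and at the normal termination of the JVM (since deleteOnExit is used).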
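
The lifecycle of such a registry (empty at first, set when a directory is created, cleaned up explicitly with deleteOnExit as a fallback) can be sketched on the local filesystem. This example uses `java.nio.file` and `java.io.File.deleteOnExit` in place of the Hadoop FileSystem API, and the names `createStagingDir` and `deleteTmpPath` are hypothetical.

```scala
import java.nio.file.{Files, Path}

object TempDirRegistrySketch {
  // Same shape as the registry above: empty until a staging directory is created
  var createdTempDir: Option[Path] = None

  // Hypothetical helper: creates a local directory and records it in the registry
  def createStagingDir(prefix: String): Path = {
    val dir = Files.createTempDirectory(prefix)
    // Fallback cleanup at normal JVM termination
    dir.toFile.deleteOnExit()
    createdTempDir = Some(dir)
    dir
  }

  // Explicit cleanup: deletes the recorded (empty) directory, if there is one
  def deleteTmpPath(): Unit = {
    createdTempDir.foreach(dir => Files.deleteIfExists(dir))
  }
}
```

Keeping the path in an `Option` lets the cleanup be a no-op when no directory was ever created.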

Last update: 2020-11-07