Skip to content

InsertIntoHadoopFsRelationCommand Logical Command

InsertIntoHadoopFsRelationCommand is a logical command that writes the result of executing a query to an output path in the given format.

Creating Instance

InsertIntoHadoopFsRelationCommand takes the following to be created:

InsertIntoHadoopFsRelationCommand is created when:

Static Partitions

type TablePartitionSpec = Map[String, String]
staticPartitions: TablePartitionSpec

InsertIntoHadoopFsRelationCommand is given a specification of a table partition (as a mapping of column names to column values) when created.

Partitions can only be given when created for DataSourceAnalysis posthoc logical resolution rule when executed for a InsertIntoStatement over a LogicalRelation with a HadoopFsRelation

There will be no partitions when created for the following:

Dynamic Partition Inserts and dynamicPartitionOverwrite Flag

dynamicPartitionOverwrite: Boolean

InsertIntoHadoopFsRelationCommand defines a dynamicPartitionOverwrite flag to indicate whether dynamic partition inserts is enabled or not.

dynamicPartitionOverwrite is based on the following (in the order of precedence):

dynamicPartitionOverwrite is used when:

  • DataSourceAnalysis logical resolution rule is executed (for dynamic partition overwrite)
  • InsertIntoHadoopFsRelationCommand is executed

Executing Command

  sparkSession: SparkSession,
  child: SparkPlan): Seq[Row]

run uses the spark.sql.hive.manageFilesourcePartitions configuration property to...FIXME

CAUTION: FIXME When is the catalogTable defined?

CAUTION: FIXME When is tracksPartitionsInCatalog of CatalogTable enabled?

run gets the partitionOverwriteMode option...FIXME

run uses FileCommitProtocol utility to instantiate a committer based on the spark.sql.sources.commitProtocolClass and the outputPath, the dynamicPartitionOverwrite, and random jobId.

For insertion, run simply uses the FileFormatWriter utility to write and then...FIXME (does some table-specific "tasks").

Otherwise (for non-insertion case), run simply prints out the following INFO message to the logs and finishes.

Skipping insertion into a relation that already exists.

run makes sure that there are no duplicates in the outputColumnNames.

run is part of the DataWritingCommand abstraction.

Last update: 2020-11-16