
SQLConf — Internal Configuration Store

SQLConf is an internal configuration store for configuration properties and hints used in Spark SQL.

Important

SQLConf is an internal part of Spark SQL and is not supposed to be used directly. Spark SQL configuration is available through the developer-facing RuntimeConfig.

SQLConf offers methods to get, set, unset, and clear values of configuration properties and hints, as well as to read their current values.

Accessing SQLConf

You can access a SQLConf using:

  • SQLConf.get (preferred) - the SQLConf of the current active SparkSession

  • SessionState - direct access through the SessionState of the SparkSession of your choice (which gives the flexibility to use a SparkSession different from the current active one)

import org.apache.spark.sql.internal.SQLConf

// Use type-safe access to configuration properties
// using SQLConf.get.getConf
val parallelFileListingInStatsComputation = SQLConf.get.getConf(SQLConf.PARALLEL_FILE_LISTING_IN_STATS_COMPUTATION)

// or even simpler
SQLConf.get.parallelFileListingInStatsComputation

scala> :type spark
org.apache.spark.sql.SparkSession

// Direct access to the session SQLConf
val sqlConf = spark.sessionState.conf
scala> :type sqlConf
org.apache.spark.sql.internal.SQLConf

scala> println(sqlConf.offHeapColumnVectorEnabled)
false

// Or simply import the conf value
import spark.sessionState.conf

// accessing properties through accessor methods
scala> conf.numShufflePartitions
res1: Int = 200

// Prefer SQLConf.get (over direct access)
import org.apache.spark.sql.internal.SQLConf
val cc = SQLConf.get
scala> cc == conf
res4: Boolean = true

// setting properties using aliases
import org.apache.spark.sql.internal.SQLConf.SHUFFLE_PARTITIONS
conf.setConf(SHUFFLE_PARTITIONS, 2)
scala> conf.numShufflePartitions
res2: Int = 2

// unset aka reset properties to the default value
conf.unsetConf(SHUFFLE_PARTITIONS)
scala> conf.numShufflePartitions
res3: Int = 200
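
The same session-scoped values are also reachable through the developer-facing RuntimeConfig (spark.conf) that the note above recommends. A minimal sketch (assuming spark-shell with an active SparkSession named spark):

```scala
// Public, string-based access via RuntimeConfig (spark.conf)
spark.conf.set("spark.sql.shuffle.partitions", 4)
assert(spark.conf.get("spark.sql.shuffle.partitions") == "4")

// SQLConf.get sees the very same session-scoped value
import org.apache.spark.sql.internal.SQLConf
assert(SQLConf.get.numShufflePartitions == 4)

// Reset back to the default
spark.conf.unset("spark.sql.shuffle.partitions")
```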

ADAPTIVE_EXECUTION_FORCE_APPLY

spark.sql.adaptive.forceApply configuration property

Used when InsertAdaptiveSparkPlan physical optimization is executed

adaptiveExecutionEnabled

The value of spark.sql.adaptive.enabled configuration property

Used when:

adaptiveExecutionLogLevel

The value of spark.sql.adaptive.logLevel configuration property

Used when AdaptiveSparkPlanExec physical operator is executed

ADVISORY_PARTITION_SIZE_IN_BYTES

spark.sql.adaptive.advisoryPartitionSizeInBytes configuration property

Used when CoalesceShufflePartitions and OptimizeSkewedJoin physical optimizations are executed

autoBroadcastJoinThreshold

The value of spark.sql.autoBroadcastJoinThreshold configuration property

Used in JoinSelection execution planning strategy
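
As an illustration (not part of the planning strategy itself), the threshold can be tuned or disabled at the session level through the public configuration:

```scala
// -1 disables automatic broadcast joins entirely
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)

// or raise the limit (here to 100 MB) for small dimension tables
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 100 * 1024 * 1024)
```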

autoSizeUpdateEnabled

The value of spark.sql.statistics.size.autoUpdate.enabled configuration property

Used when:

avroCompressionCodec

The value of spark.sql.avro.compression.codec configuration property

Used when AvroOptions is requested for the compression configuration property (and it was not set explicitly)

broadcastTimeout

The value of spark.sql.broadcastTimeout configuration property

Used in BroadcastExchangeExec (for broadcasting a table to executors)

bucketingEnabled

The value of spark.sql.sources.bucketing.enabled configuration property

Used when FileSourceScanExec physical operator is requested for the input RDD and to determine output partitioning and ordering

cacheVectorizedReaderEnabled

The value of spark.sql.inMemoryColumnarStorage.enableVectorizedReader configuration property

Used when InMemoryTableScanExec physical operator is requested for supportsBatch flag.

caseSensitiveAnalysis

The value of spark.sql.caseSensitive configuration property

cboEnabled

The value of spark.sql.cbo.enabled configuration property

Used in:

coalesceShufflePartitionsEnabled

The value of spark.sql.adaptive.coalescePartitions.enabled configuration property

Used when CoalesceShufflePartitions and EnsureRequirements physical optimizations are executed

columnBatchSize

The value of spark.sql.inMemoryColumnarStorage.batchSize configuration property

Used when:

constraintPropagationEnabled

The value of spark.sql.constraintPropagation.enabled configuration property

Used when:

CONVERT_METASTORE_ORC

The value of spark.sql.hive.convertMetastoreOrc configuration property

Used when RelationConversions logical post-hoc evaluation rule is executed (and requested to isConvertible)

CONVERT_METASTORE_PARQUET

The value of spark.sql.hive.convertMetastoreParquet configuration property

Used when RelationConversions logical post-hoc evaluation rule is executed (and requested to isConvertible)

dataFramePivotMaxValues

The value of spark.sql.pivotMaxValues configuration property

Used in pivot operator.

dataFrameRetainGroupColumns

The value of spark.sql.retainGroupColumns configuration property

Used in RelationalGroupedDataset when creating the result Dataset (after agg, count, mean, max, avg, min, and sum operators).

DEFAULT_CATALOG

The value of spark.sql.defaultCatalog configuration property

Used when CatalogManager is requested for the current CatalogPlugin

defaultDataSourceName

The value of spark.sql.sources.default configuration property

Used when:

  • FIXME

dynamicPartitionPruningEnabled

The value of spark.sql.optimizer.dynamicPartitionPruning.enabled configuration property

Used when:

dynamicPartitionPruningFallbackFilterRatio

The value of spark.sql.optimizer.dynamicPartitionPruning.fallbackFilterRatio configuration property

Used when PartitionPruning logical optimization rule is executed.

dynamicPartitionPruningUseStats

The value of spark.sql.optimizer.dynamicPartitionPruning.useStats configuration property

Used when PartitionPruning logical optimization rule is executed.

dynamicPartitionPruningReuseBroadcastOnly

The value of spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly configuration property

Used when PartitionPruning logical optimization is executed

defaultSizeInBytes

The value of spark.sql.defaultSizeInBytes configuration property

Used when:

exchangeReuseEnabled

The value of spark.sql.exchange.reuse configuration property

Used when:

enableRadixSort

spark.sql.sort.enableRadixSort

Used when SortExec physical operator is requested to create a sorter

fallBackToHdfsForStatsEnabled

spark.sql.statistics.fallBackToHdfs

Used when DetermineTableStats logical resolution rule is executed.

fetchShuffleBlocksInBatch

The value of spark.sql.adaptive.fetchShuffleBlocksInBatch configuration property

Used when ShuffledRowRDD is created

fileCommitProtocolClass

spark.sql.sources.commitProtocolClass

Used (to instantiate a FileCommitProtocol) when:

fileCompressionFactor

The value of spark.sql.sources.fileCompressionFactor configuration property

Used when:

filesMaxPartitionBytes

spark.sql.files.maxPartitionBytes

Used when FileSourceScanExec leaf physical operator is requested to create an RDD for a non-bucketed read

filesOpenCostInBytes

spark.sql.files.openCostInBytes

Used when FileSourceScanExec leaf physical operator is requested to create an RDD for a non-bucketed read

histogramEnabled

The value of spark.sql.statistics.histogram.enabled configuration property

Used when AnalyzeColumnCommand logical command is executed.

histogramNumBins

spark.sql.statistics.histogram.numBins

Used when AnalyzeColumnCommand logical command is executed with spark.sql.statistics.histogram.enabled turned on (to calculate percentiles).

hugeMethodLimit

spark.sql.codegen.hugeMethodLimit

Used when WholeStageCodegenExec unary physical operator is requested to execute (and generate an RDD[InternalRow]): when the size of the compiled function exceeds this threshold, whole-stage codegen is deactivated for that subtree of the query plan.

ignoreCorruptFiles

The value of spark.sql.files.ignoreCorruptFiles configuration property

Used when:

  • AvroUtils utility is requested to inferSchema
  • OrcFileFormat is requested to inferSchema and buildReader
  • FileScanRDD is created (and then to compute a partition)
  • SchemaMergeUtils utility is requested to mergeSchemasInParallel
  • OrcUtils utility is requested to readSchema
  • FilePartitionReader is requested to ignoreCorruptFiles
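
A short usage sketch (assuming spark-shell; the input path is hypothetical): with the flag enabled, corrupt files are skipped instead of failing the whole scan.

```scala
// Skip corrupt files instead of failing the entire read
spark.conf.set("spark.sql.files.ignoreCorruptFiles", true)

// hypothetical path for illustration only
val df = spark.read.parquet("/tmp/data")
```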

ignoreMissingFiles

The value of spark.sql.files.ignoreMissingFiles configuration property

Used when:

inMemoryPartitionPruning

spark.sql.inMemoryColumnarStorage.partitionPruning

Used when InMemoryTableScanExec physical operator is requested for filtered cached column batches (as a RDD[CachedBatch]).

isParquetBinaryAsString

spark.sql.parquet.binaryAsString

isParquetINT96AsTimestamp

spark.sql.parquet.int96AsTimestamp

isParquetINT96TimestampConversion

spark.sql.parquet.int96TimestampConversion

Used when ParquetFileFormat is requested to build a data reader with partition column values appended.

joinReorderEnabled

spark.sql.cbo.joinReorder.enabled

Used in CostBasedJoinReorder logical plan optimization

limitScaleUpFactor

spark.sql.limit.scaleUpFactor

Used when a physical operator is requested for the first n rows (as an array).

manageFilesourcePartitions

spark.sql.hive.manageFilesourcePartitions

Used when:

maxRecordsPerFile

The value of spark.sql.files.maxRecordsPerFile configuration property

Used when:

maxToStringFields

The value of spark.sql.debug.maxToStringFields configuration property

metastorePartitionPruning

spark.sql.hive.metastorePartitionPruning

Used when HiveTableScanExec physical operator is executed with a partitioned table (and requested for rawPartitions)

minNumPostShufflePartitions

spark.sql.adaptive.minNumPostShufflePartitions

Used when EnsureRequirements physical optimization is executed (for Adaptive Query Execution).

nestedSchemaPruningEnabled

The value of spark.sql.optimizer.nestedSchemaPruning.enabled configuration property

Used when SchemaPruning, ColumnPruning and V2ScanRelationPushDown logical optimizations are executed

nonEmptyPartitionRatioForBroadcastJoin

The value of spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin configuration property

Used when DemoteBroadcastHashJoin logical optimization is executed

numShufflePartitions

The value of spark.sql.shuffle.partitions configuration property
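
A sketch tying this accessor back to the type-safe alias demonstrated at the top of the page (assuming spark-shell):

```scala
import org.apache.spark.sql.internal.SQLConf

// SHUFFLE_PARTITIONS is the type-safe entry behind spark.sql.shuffle.partitions
spark.conf.set(SQLConf.SHUFFLE_PARTITIONS.key, 8)
assert(SQLConf.get.numShufflePartitions == 8)
```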

rangeExchangeSampleSizePerPartition

The value of spark.sql.execution.rangeExchange.sampleSizePerPartition configuration property

Used when ShuffleExchangeExec physical operator is executed

SKEW_JOIN_SKEWED_PARTITION_FACTOR

spark.sql.adaptive.skewJoin.skewedPartitionFactor configuration property

Used when OptimizeSkewedJoin physical optimization is executed

SKEW_JOIN_SKEWED_PARTITION_THRESHOLD

spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes configuration property

Used when OptimizeSkewedJoin physical optimization is executed

SKEW_JOIN_ENABLED

spark.sql.adaptive.skewJoin.enabled configuration property

Used when OptimizeSkewedJoin physical optimization is executed
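
The three skew-join properties above work together under Adaptive Query Execution. A hedged spark-shell sketch (property names as on this page; values illustrative):

```scala
// Skew-join handling requires Adaptive Query Execution
spark.conf.set("spark.sql.adaptive.enabled", true)
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", true)

// A partition is considered skewed when it is skewedPartitionFactor times
// larger than the median partition size AND larger than the byte threshold
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", 5)
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256MB")
```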

offHeapColumnVectorEnabled

spark.sql.columnVector.offheap.enabled

Used when:

optimizerExcludedRules

The value of spark.sql.optimizer.excludedRules configuration property

Used when Optimizer is requested for the batches

optimizerInSetConversionThreshold

spark.sql.optimizer.inSetConversionThreshold

Used when OptimizeIn logical query optimization is executed

ORC_IMPLEMENTATION

spark.sql.orc.impl configuration property

Supported values:

  • native for OrcFileFormat
  • hive for org.apache.spark.sql.hive.orc.OrcFileFormat
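
A minimal sketch of switching implementations at the session level (assuming spark-shell):

```scala
// Use the built-in native ORC reader/writer (OrcFileFormat) ...
spark.conf.set("spark.sql.orc.impl", "native")

// ... or fall back to the Hive-based implementation
spark.conf.set("spark.sql.orc.impl", "hive")
```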

parallelFileListingInStatsComputation

spark.sql.statistics.parallelFileListingInStatsComputation.enabled

Used when CommandUtils helper object is requested to calculate the total size of a table (with partitions) (for AnalyzeColumnCommand and AnalyzeTableCommand commands)

parquetFilterPushDown

spark.sql.parquet.filterPushdown

Used when ParquetFileFormat is requested to build a data reader with partition column values appended.

parquetFilterPushDownDate

spark.sql.parquet.filterPushdown.date

Used when ParquetFileFormat is requested to build a data reader with partition column values appended.

parquetRecordFilterEnabled

spark.sql.parquet.recordLevelFilter.enabled

Used when ParquetFileFormat is requested to build a data reader with partition column values appended.

parquetVectorizedReaderBatchSize

spark.sql.parquet.columnarReaderBatchSize

Used when ParquetFileFormat is requested for a data reader (and creates a VectorizedParquetRecordReader for Vectorized Parquet Decoding)

parquetVectorizedReaderEnabled

spark.sql.parquet.enableVectorizedReader

Used when:

partitionOverwriteMode

The value of spark.sql.sources.partitionOverwriteMode configuration property

Used when InsertIntoHadoopFsRelationCommand logical command is executed
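
A usage sketch (assuming spark-shell and a DataFrame df with a date column; the column name and output path are hypothetical): STATIC mode (the default) deletes all matching partitions up front, while DYNAMIC only overwrites partitions that receive new data.

```scala
// Overwrite only the partitions that are present in df
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

df.write
  .mode("overwrite")
  .partitionBy("date")    // hypothetical partition column
  .parquet("/tmp/events") // hypothetical path
```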

preferSortMergeJoin

spark.sql.join.preferSortMergeJoin

Used in JoinSelection execution planning strategy to prefer sort merge join over shuffle hash join.

replaceDatabricksSparkAvroEnabled

spark.sql.legacy.replaceDatabricksSparkAvro.enabled

replaceExceptWithFilter

spark.sql.optimizer.replaceExceptWithFilter

Used when ReplaceExceptWithFilter logical optimization is executed

runSQLonFile

spark.sql.runSQLOnFiles

Used when:

sessionLocalTimeZone

The value of spark.sql.session.timeZone configuration property
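
As a brief illustration (assuming spark-shell), the session time zone governs how timestamps are parsed and rendered:

```scala
// Timestamps are interpreted and displayed in the session time zone
spark.conf.set("spark.sql.session.timeZone", "UTC")
spark.sql("SELECT timestamp '2020-12-29 12:00:00'").show(false)
```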

sortBeforeRepartition

The value of spark.sql.execution.sortBeforeRepartition configuration property

Used when ShuffleExchangeExec physical operator is executed

starSchemaDetection

spark.sql.cbo.starSchemaDetection

Used in ReorderJoin logical optimization (and indirectly in StarSchemaDetection)

stringRedactionPattern

spark.sql.redaction.string.regex

Used when:

subexpressionEliminationEnabled

spark.sql.subexpressionElimination.enabled

Used when SparkPlan is requested for subexpressionEliminationEnabled flag.

supportQuotedRegexColumnName

spark.sql.parser.quotedRegexColumnNames

Used when:

targetPostShuffleInputSize

spark.sql.adaptive.shuffle.targetPostShuffleInputSize

Used when EnsureRequirements physical optimization is executed (for Adaptive Query Execution)

truncateTableIgnorePermissionAcl

spark.sql.truncateTable.ignorePermissionAcl.enabled

Used when TruncateTableCommand logical command is executed

useCompression

The value of spark.sql.inMemoryColumnarStorage.compressed configuration property

Used when CacheManager is requested to cache a structured query

useObjectHashAggregation

spark.sql.execution.useObjectHashAggregateExec

Used when Aggregation execution planning strategy is executed (and uses AggUtils to create an aggregation physical operator).

wholeStageEnabled

spark.sql.codegen.wholeStage

Used in:

wholeStageFallback

spark.sql.codegen.fallback

Used when WholeStageCodegenExec physical operator is executed.

wholeStageMaxNumFields

spark.sql.codegen.maxFields

Used in:

wholeStageSplitConsumeFuncByOperator

spark.sql.codegen.splitConsumeFuncByOperator

Used when CodegenSupport is requested to consume

wholeStageUseIdInClassName

spark.sql.codegen.useIdInClassName

Used when WholeStageCodegenExec is requested to generate the Java source code for the child physical plan subtree (when created)

windowExecBufferInMemoryThreshold

spark.sql.windowExec.buffer.in.memory.threshold

Used when WindowExec unary physical operator is executed

windowExecBufferSpillThreshold

spark.sql.windowExec.buffer.spill.threshold

Used when WindowExec unary physical operator is executed


Last update: 2020-12-29