
SQLConf — Internal Configuration Store

SQLConf is an internal configuration store for configuration properties and hints used in Spark SQL.

Important

SQLConf is an internal part of Spark SQL and is not supposed to be used directly. Spark SQL configuration is available through the developer-facing RuntimeConfig.

SQLConf offers methods to get, set, unset or clear the values of configuration properties and hints, as well as accessor methods that read their current values.

Accessing SQLConf

You can access a SQLConf using:

  • SQLConf.get (preferred) - the SQLConf of the current active SparkSession

  • SessionState - direct access through the SessionState of a SparkSession of your choice (more flexible, since that SparkSession can be different from the current active one)

import org.apache.spark.sql.internal.SQLConf

// Use type-safe access to configuration properties
// using SQLConf.get.getConf
val parallelFileListingInStatsComputation = SQLConf.get.getConf(SQLConf.PARALLEL_FILE_LISTING_IN_STATS_COMPUTATION)

// or even simpler
SQLConf.get.parallelFileListingInStatsComputation

scala> :type spark
org.apache.spark.sql.SparkSession

// Direct access to the session SQLConf
val sqlConf = spark.sessionState.conf
scala> :type sqlConf
org.apache.spark.sql.internal.SQLConf

scala> println(sqlConf.offHeapColumnVectorEnabled)
false

// Or simply import the conf value
import spark.sessionState.conf

// accessing properties through accessor methods
scala> conf.numShufflePartitions
res1: Int = 200

// Prefer SQLConf.get (over direct access)
import org.apache.spark.sql.internal.SQLConf
val cc = SQLConf.get
scala> cc == conf
res2: Boolean = true

// setting properties using typed ConfigEntry constants
import org.apache.spark.sql.internal.SQLConf.SHUFFLE_PARTITIONS
conf.setConf(SHUFFLE_PARTITIONS, 2)
scala> conf.numShufflePartitions
res3: Int = 2

// unset aka reset properties to their default values
conf.unsetConf(SHUFFLE_PARTITIONS)
scala> conf.numShufflePartitions
res4: Int = 200
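Since SQLConf is internal, the same set, read and unset round trip can be done through the developer-facing RuntimeConfig (spark.conf). The following is a minimal sketch, assuming spark-sql is on the classpath; the object name RuntimeConfigExample, the roundTrip helper and the local[1] master are illustrative choices, not part of Spark.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; only SparkSession and spark.conf are Spark APIs
object RuntimeConfigExample {

  /** Sets, reads back and unsets spark.sql.shuffle.partitions; returns (afterSet, afterUnset). */
  def roundTrip(spark: SparkSession): (String, String) = {
    // RuntimeConfig works with string keys; values are read back as strings
    spark.conf.set("spark.sql.shuffle.partitions", 2L)
    val afterSet = spark.conf.get("spark.sql.shuffle.partitions")

    // unset removes the session-level value, so the default is reported again
    spark.conf.unset("spark.sql.shuffle.partitions")
    val afterUnset = spark.conf.get("spark.sql.shuffle.partitions")

    (afterSet, afterUnset)
  }

  def main(args: Array[String]): Unit = {
    // local[1] is an arbitrary choice for a self-contained demo
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("runtime-config-demo")
      .getOrCreate()
    try println(roundTrip(spark))
    finally spark.stop()
  }
}
```

In a fresh session with no overrides in spark-defaults, the round trip should report "2" and then the default "200".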

ADAPTIVE_AUTO_BROADCASTJOIN_THRESHOLD

spark.sql.adaptive.autoBroadcastJoinThreshold

Used when:

ADAPTIVE_EXECUTION_FORCE_APPLY

spark.sql.adaptive.forceApply configuration property

Used when:

adaptiveExecutionEnabled

The value of spark.sql.adaptive.enabled configuration property

Used when:

adaptiveExecutionLogLevel

The value of spark.sql.adaptive.logLevel configuration property

Used when AdaptiveSparkPlanExec physical operator is executed

ADAPTIVE_MAX_SHUFFLE_HASH_JOIN_LOCAL_MAP_THRESHOLD

spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold configuration property

Used when:

ADAPTIVE_OPTIMIZER_EXCLUDED_RULES

The value of spark.sql.adaptive.optimizer.excludedRules configuration property

Used when:

ADVISORY_PARTITION_SIZE_IN_BYTES

spark.sql.adaptive.advisoryPartitionSizeInBytes configuration property

Used when:

autoBroadcastJoinThreshold

The value of spark.sql.autoBroadcastJoinThreshold configuration property
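As a rough illustration of how a size threshold like this is applied, the sketch below checks whether a relation's estimated size allows a broadcast. It is modeled on our understanding of Spark's size-based broadcast check (the estimated size must be non-negative and not above the threshold, so a threshold of -1 disables broadcasting). The object and method names are hypothetical, and sizes are plain Longs here while Spark's statistics use BigInt.

```scala
// Hypothetical sketch, not Spark's JoinSelection code
object BroadcastThresholdSketch {
  // Default of spark.sql.autoBroadcastJoinThreshold (10MB)
  val DefaultThreshold: Long = 10L * 1024 * 1024

  /** May a relation with this estimated size be broadcast? A threshold of -1 always yields false. */
  def canBroadcastBySize(sizeInBytes: Long, threshold: Long): Boolean =
    sizeInBytes >= 0 && sizeInBytes <= threshold
}
```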

Used when:

autoBucketedScanEnabled

The value of spark.sql.sources.bucketing.autoBucketedScan.enabled configuration property

Used when:

allowStarWithSingleTableIdentifierInCount

spark.sql.legacy.allowStarWithSingleTableIdentifierInCount

Used when:

  • ResolveReferences logical resolution rule is executed

arrowPySparkSelfDestructEnabled

spark.sql.execution.arrow.pyspark.selfDestruct.enabled

Used when:

  • PandasConversionMixin is requested to toPandas

allowAutoGeneratedAliasForView

spark.sql.legacy.allowAutoGeneratedAliasForView

Used when:

  • ViewHelper utility is used to verifyAutoGeneratedAliasesNotExists

allowNonEmptyLocationInCTAS

spark.sql.legacy.allowNonEmptyLocationInCTAS

Used when:

ADAPTIVE_OPTIMIZE_SKEWS_IN_REBALANCE_PARTITIONS_ENABLED

spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled

Used when:

  • OptimizeSkewInRebalancePartitions physical optimization is executed

ADAPTIVE_CUSTOM_COST_EVALUATOR_CLASS

spark.sql.adaptive.customCostEvaluatorClass

Used when:

  • AdaptiveSparkPlanExec leaf physical operator is requested for the costEvaluator

autoSizeUpdateEnabled

The value of spark.sql.statistics.size.autoUpdate.enabled configuration property

Used when:

avroCompressionCodec

The value of spark.sql.avro.compression.codec configuration property

Used when AvroOptions is requested for the compression configuration property (and it was not set explicitly)

broadcastTimeout

The value of spark.sql.broadcastTimeout configuration property

Used in BroadcastExchangeExec (for broadcasting a table to executors)

bucketingEnabled

The value of spark.sql.sources.bucketing.enabled configuration property

Used when FileSourceScanExec physical operator is requested for the input RDD and to determine output partitioning and ordering

cacheVectorizedReaderEnabled

The value of spark.sql.inMemoryColumnarStorage.enableVectorizedReader configuration property

Used when InMemoryTableScanExec physical operator is requested for the supportsBatch flag.

CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING

spark.sql.optimizer.canChangeCachedPlanOutputPartitioning

Used when:

caseSensitiveAnalysis

The value of spark.sql.caseSensitive configuration property

cboEnabled

The value of spark.sql.cbo.enabled configuration property

Used in:

cliPrintHeader

spark.sql.cli.print.header

Used when:

  • SparkSQLCLIDriver is requested to processCmd

coalesceBucketsInJoinEnabled

The value of spark.sql.bucketing.coalesceBucketsInJoin.enabled configuration property

Used when:

COALESCE_PARTITIONS_MIN_PARTITION_SIZE

spark.sql.adaptive.coalescePartitions.minPartitionSize configuration property

Used when:

COALESCE_PARTITIONS_PARALLELISM_FIRST

spark.sql.adaptive.coalescePartitions.parallelismFirst configuration property

Used when:

coalesceShufflePartitionsEnabled

The value of spark.sql.adaptive.coalescePartitions.enabled configuration property
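The idea behind coalescing shuffle partitions can be sketched as greedily merging contiguous partitions until an advisory target size would be exceeded. This is a simplified, hypothetical model (CoalesceSketch is not a Spark class); Spark's actual implementation also honours settings such as spark.sql.adaptive.coalescePartitions.minPartitionSize and parallelismFirst, which are ignored here.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of coalescing contiguous shuffle partitions
object CoalesceSketch {

  /**
   * Packs contiguous partitions so that each group's total size stays at or
   * below targetSize (a stand-in for the advisory partition size), unless a
   * single partition is already larger. Returns the start index of every group.
   */
  def coalesce(sizes: Seq[Long], targetSize: Long): Seq[Int] = {
    require(targetSize > 0, "targetSize must be positive")
    val starts = ArrayBuffer.empty[Int]
    var current = 0L
    sizes.zipWithIndex.foreach { case (size, i) =>
      if (i == 0 || current + size > targetSize) {
        // start a new coalesced partition
        starts += i
        current = size
      } else {
        // merge into the current coalesced partition
        current += size
      }
    }
    starts.toSeq
  }
}
```

For example, sizes 10, 10, 10, 50, 5, 5 with a target of 30 become three coalesced partitions starting at indices 0, 3 and 4.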

Used when:

columnBatchSize

The value of spark.sql.inMemoryColumnarStorage.batchSize configuration property

Used when:

  • CacheManager is requested to cache a structured query
  • RowToColumnarExec physical operator is requested to doExecuteColumnar

constraintPropagationEnabled

The value of spark.sql.constraintPropagation.enabled configuration property

Used when:

CONVERT_METASTORE_ORC

The value of spark.sql.hive.convertMetastoreOrc configuration property

Used when RelationConversions logical post-hoc evaluation rule is executed (and requested to isConvertible)

CONVERT_METASTORE_PARQUET

The value of spark.sql.hive.convertMetastoreParquet configuration property

Used when RelationConversions logical post-hoc evaluation rule is executed (and requested to isConvertible)

csvExpressionOptimization

spark.sql.optimizer.enableCsvExpressionOptimization

Used when:

  • OptimizeCsvJsonExprs logical optimization is executed

dataFramePivotMaxValues

The value of spark.sql.pivotMaxValues configuration property

Used in the pivot operator.

dataFrameRetainGroupColumns

The value of spark.sql.retainGroupColumns configuration property

Used in RelationalGroupedDataset when creating the result Dataset (after agg, count, mean, max, avg, min, and sum operators).

decorrelateInnerQueryEnabled

spark.sql.optimizer.decorrelateInnerQuery.enabled

Used when:

DEFAULT_CATALOG

The value of spark.sql.defaultCatalog configuration property

Used when CatalogManager is requested for the current CatalogPlugin

defaultDataSourceName

The value of spark.sql.sources.default configuration property

Used when:

  • FIXME

dynamicPartitionPruningEnabled

The value of spark.sql.optimizer.dynamicPartitionPruning.enabled configuration property

Used when:

dynamicPartitionPruningFallbackFilterRatio

The value of spark.sql.optimizer.dynamicPartitionPruning.fallbackFilterRatio configuration property

Used when:

dynamicPartitionPruningPruningSideExtraFilterRatio

The value of spark.sql.optimizer.dynamicPartitionPruning.pruningSideExtraFilterRatio configuration property

Used when:

dynamicPartitionPruningUseStats

The value of spark.sql.optimizer.dynamicPartitionPruning.useStats configuration property

Used when PartitionPruning logical optimization rule is executed.

dynamicPartitionPruningReuseBroadcastOnly

The value of spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly configuration property

Used when PartitionPruning logical optimization is executed

defaultSizeInBytes

The value of spark.sql.defaultSizeInBytes configuration property

Used when:

exchangeReuseEnabled

spark.sql.exchange.reuse

Used when:

enableRadixSort

spark.sql.sort.enableRadixSort

Used when:

fallBackToHdfsForStatsEnabled

spark.sql.statistics.fallBackToHdfs

Used when DetermineTableStats logical resolution rule is executed.

fetchShuffleBlocksInBatch

The value of spark.sql.adaptive.fetchShuffleBlocksInBatch configuration property

Used when ShuffledRowRDD is created

fileCommitProtocolClass

spark.sql.sources.commitProtocolClass

Used (to instantiate a FileCommitProtocol) when:

fileCompressionFactor

The value of spark.sql.sources.fileCompressionFactor configuration property

Used when:

filesMaxPartitionBytes

spark.sql.files.maxPartitionBytes

Used when <> leaf physical operator is requested to <>

filesOpenCostInBytes

spark.sql.files.openCostInBytes

Used when <> leaf physical operator is requested to <>
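filesMaxPartitionBytes and filesOpenCostInBytes feed into the maximum split size used to pack files into scan partitions. The sketch below restates that computation as we understand it from Spark 3.x (FilePartition.maxSplitBytes); the object name, the explicit minPartitionNum parameter and the plain Seq of file sizes are simplifications for illustration.

```scala
// Hypothetical re-statement of the split-size formula, not Spark's code
object MaxSplitBytesSketch {

  /**
   * @param fileSizes         lengths (in bytes) of the files selected for the scan
   * @param maxPartitionBytes spark.sql.files.maxPartitionBytes (default 128MB)
   * @param openCostInBytes   spark.sql.files.openCostInBytes (default 4MB)
   * @param minPartitionNum   assumed minimum number of partitions (e.g. the default parallelism)
   */
  def maxSplitBytes(
      fileSizes: Seq[Long],
      maxPartitionBytes: Long,
      openCostInBytes: Long,
      minPartitionNum: Int): Long = {
    require(minPartitionNum > 0, "minPartitionNum must be positive")
    // Each file is charged its length plus the estimated cost of opening it
    val totalBytes = fileSizes.map(_ + openCostInBytes).sum
    val bytesPerCore = totalBytes / minPartitionNum
    // Never above maxPartitionBytes, never below openCostInBytes
    math.min(maxPartitionBytes, math.max(openCostInBytes, bytesPerCore))
  }
}
```

With the defaults, a single 1GB file over 8 cores is capped at 128MB per split, while a handful of tiny files never yields a split smaller than the 4MB open cost.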

histogramEnabled

The value of spark.sql.statistics.histogram.enabled configuration property

Used when AnalyzeColumnCommand logical command is executed.

histogramNumBins

spark.sql.statistics.histogram.numBins

Used when AnalyzeColumnCommand is executed with spark.sql.statistics.histogram.enabled turned on (and calculates percentiles).

HIVE_TABLE_PROPERTY_LENGTH_THRESHOLD

spark.sql.hive.tablePropertyLengthThreshold

Used when:

hugeMethodLimit

spark.sql.codegen.hugeMethodLimit

Used when WholeStageCodegenExec unary physical operator is requested to <> (and generate an RDD[InternalRow]). When the compiled function exceeds this threshold, whole-stage codegen is deactivated for this subtree of the query plan.

ignoreCorruptFiles

The value of spark.sql.files.ignoreCorruptFiles configuration property

Used when:

  • AvroUtils utility is requested to inferSchema
  • OrcFileFormat is requested to inferSchema and buildReader
  • FileScanRDD is created (and then to compute a partition)
  • SchemaMergeUtils utility is requested to mergeSchemasInParallel
  • OrcUtils utility is requested to readSchema
  • FilePartitionReader is requested to ignoreCorruptFiles

ignoreMissingFiles

The value of spark.sql.files.ignoreMissingFiles configuration property

Used when:

inMemoryPartitionPruning

spark.sql.inMemoryColumnarStorage.partitionPruning

Used when InMemoryTableScanExec physical operator is requested for filtered cached column batches (as an RDD[CachedBatch]).

isParquetBinaryAsString

spark.sql.parquet.binaryAsString

isParquetINT96AsTimestamp

spark.sql.parquet.int96AsTimestamp

isParquetINT96TimestampConversion

spark.sql.parquet.int96TimestampConversion

Used when ParquetFileFormat is requested to build a data reader with partition column values appended.

joinReorderEnabled

spark.sql.cbo.joinReorder.enabled

Used in CostBasedJoinReorder logical plan optimization

legacyIntervalEnabled

spark.sql.legacy.interval.enabled

Used when:

limitScaleUpFactor

spark.sql.limit.scaleUpFactor

Used when a physical operator is requested for the first n rows as an array.
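limitScaleUpFactor controls how quickly a take(n)-style loop widens the number of partitions it scans when earlier rounds did not collect enough rows. Below is a hypothetical sketch of that growth rule, modeled on our reading of SparkPlan.executeTake and assuming the first round scans a single partition (newer Spark versions make that initial value configurable). The object and parameter names are illustrative.

```scala
// Hypothetical sketch of the partition scale-up rule for take(n)
object LimitScaleUpSketch {

  /** Number of partitions to scan in the next round. */
  def nextNumPartsToTry(
      n: Int,
      rowsCollected: Int,
      partsScanned: Int,
      limitScaleUpFactor: Int): Long = {
    // The factor is never allowed to be smaller than 2
    val factor = math.max(limitScaleUpFactor, 2)
    if (partsScanned == 0) {
      1L // assumption: the first round tries a single partition
    } else if (rowsCollected == 0) {
      // Nothing found yet: grow aggressively
      partsScanned.toLong * factor
    } else {
      // Extrapolate from the rows found so far (with 50% headroom), capped by the factor
      val left = n - rowsCollected
      val estimate = math.ceil(1.5 * left * partsScanned / rowsCollected).toLong
      math.min(estimate, partsScanned.toLong * factor)
    }
  }
}
```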

LOCAL_SHUFFLE_READER_ENABLED

spark.sql.adaptive.localShuffleReader.enabled

Used when:

manageFilesourcePartitions

spark.sql.hive.manageFilesourcePartitions

Used when:

maxConcurrentOutputFileWriters

The value of spark.sql.maxConcurrentOutputFileWriters configuration property

Used when:

maxMetadataStringLength

The value of spark.sql.maxMetadataStringLength configuration property

Used when:

maxRecordsPerFile

The value of spark.sql.files.maxRecordsPerFile configuration property
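maxRecordsPerFile caps how many records a write task puts into a single output file (0 or a negative value means no limit). The following hypothetical helper only does the arithmetic for one task writing one partition, to show how the cap translates into a file count; it does not model Spark's writer classes.

```scala
// Hypothetical arithmetic helper, not part of Spark
object MaxRecordsPerFileSketch {

  /** Number of files one task produces for numRecords rows (assumes at least one record). */
  def numFiles(numRecords: Long, maxRecordsPerFile: Long): Long = {
    require(numRecords >= 1, "numRecords must be positive")
    if (maxRecordsPerFile <= 0) 1L // no limit: everything goes into one file
    else (numRecords + maxRecordsPerFile - 1) / maxRecordsPerFile // ceiling division
  }
}
```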

Used when:

maxToStringFields

The value of spark.sql.debug.maxToStringFields configuration property

metastorePartitionPruning

spark.sql.hive.metastorePartitionPruning

Used when HiveTableScanExec physical operator is executed with a partitioned table (and requested for rawPartitions)

methodSplitThreshold

spark.sql.codegen.methodSplitThreshold

Used when:

minNumPostShufflePartitions

spark.sql.adaptive.minNumPostShufflePartitions

Used when EnsureRequirements physical optimization is executed (for Adaptive Query Execution).

nestedSchemaPruningEnabled

The value of spark.sql.optimizer.nestedSchemaPruning.enabled configuration property

Used when SchemaPruning, ColumnPruning and V2ScanRelationPushDown logical optimizations are executed

nonEmptyPartitionRatioForBroadcastJoin

The value of spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin configuration property

Used when:

numShufflePartitions

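The value of spark.sql.adaptive.coalescePartitions.initialPartitionNum configuration property when Adaptive Query Execution (spark.sql.adaptive.enabled) and partition coalescing (spark.sql.adaptive.coalescePartitions.enabled) are both enabled and the initial partition number is set; otherwise, the value of spark.sql.shuffle.partitions configuration property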
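The effective number of shuffle partitions depends on Adaptive Query Execution settings. A minimal sketch of the resolution, assuming Spark 3.x semantics where spark.sql.adaptive.coalescePartitions.initialPartitionNum (when set) takes precedence over spark.sql.shuffle.partitions only if both spark.sql.adaptive.enabled and spark.sql.adaptive.coalescePartitions.enabled are on. The function is a hypothetical stand-in for the SQLConf accessor.

```scala
// Hypothetical stand-in for SQLConf.numShufflePartitions (Spark 3.x semantics assumed)
object ShufflePartitionsSketch {
  def numShufflePartitions(
      adaptiveEnabled: Boolean,
      coalescePartitionsEnabled: Boolean,
      initialPartitionNum: Option[Int], // spark.sql.adaptive.coalescePartitions.initialPartitionNum
      shufflePartitions: Int            // spark.sql.shuffle.partitions
  ): Int =
    if (adaptiveEnabled && coalescePartitionsEnabled) initialPartitionNum.getOrElse(shufflePartitions)
    else shufflePartitions
}
```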

rangeExchangeSampleSizePerPartition

The value of spark.sql.execution.rangeExchange.sampleSizePerPartition configuration property

Used when:

REMOVE_REDUNDANT_SORTS_ENABLED

The value of spark.sql.execution.removeRedundantSorts configuration property

Used when:

SKEW_JOIN_SKEWED_PARTITION_FACTOR

spark.sql.adaptive.skewJoin.skewedPartitionFactor configuration property

Used when:

SKEW_JOIN_SKEWED_PARTITION_THRESHOLD

spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes configuration property
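SKEW_JOIN_SKEWED_PARTITION_FACTOR and SKEW_JOIN_SKEWED_PARTITION_THRESHOLD work together: a partition is considered skewed only when it is larger than the median partition size times the factor and also larger than the byte threshold (defaults 5 and 256MB). A hypothetical sketch of that test, modeled on our understanding of the OptimizeSkewedJoin physical optimization; the object name and the median helper are illustrative.

```scala
// Hypothetical sketch of the skew test, not Spark's code
object SkewJoinSketch {

  /** Median partition size (average of the two middle values for even counts), at least 1 byte. */
  def medianSize(sizes: Seq[Long]): Long = {
    require(sizes.nonEmpty, "sizes must not be empty")
    val sorted = sizes.sorted
    val n = sorted.length
    val median =
      if (n % 2 == 0) (sorted(n / 2) + sorted(n / 2 - 1)) / 2
      else sorted(n / 2)
    math.max(median, 1L)
  }

  /** Skewed only if big relative to the median AND big in absolute terms. */
  def isSkewed(size: Long, medianSize: Long, factor: Double, thresholdInBytes: Long): Boolean =
    size > medianSize * factor && size > thresholdInBytes
}
```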

Used when:

SKEW_JOIN_ENABLED

spark.sql.adaptive.skewJoin.enabled configuration property

Used when:

offHeapColumnVectorEnabled

spark.sql.columnVector.offheap.enabled

Used when:

OPTIMIZE_ONE_ROW_RELATION_SUBQUERY

spark.sql.optimizer.optimizeOneRowRelationSubquery

Used when:

  • OptimizeOneRowRelationSubquery logical optimization is executed

optimizeNullAwareAntiJoin

spark.sql.optimizeNullAwareAntiJoin configuration property

Used when:

optimizerExcludedRules

The value of spark.sql.optimizer.excludedRules configuration property

Used when Optimizer is requested for the batches
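spark.sql.optimizer.excludedRules takes a comma-separated list of fully-qualified rule names, and the Optimizer drops those rules from its batches, except for rules it treats as non-excludable. A hypothetical sketch of that filtering with rule names as plain strings; the parsing helper and the nonExcludable set are simplified stand-ins, not Spark's API.

```scala
// Hypothetical sketch of excluded-rule filtering
object ExcludedRulesSketch {

  /** Splits a comma-separated property value into trimmed, non-empty rule names. */
  def parseRuleNames(value: String): Seq[String] =
    value.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  /** Keeps every rule unless it is excluded and not protected as non-excludable. */
  def effectiveRules(
      batchRules: Seq[String],
      excluded: Seq[String],
      nonExcludable: Set[String]): Seq[String] =
    batchRules.filterNot(rule => excluded.contains(rule) && !nonExcludable.contains(rule))
}
```

A value such as org.apache.spark.sql.catalyst.optimizer.ConstantFolding would remove that single rule, provided it is not protected.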

optimizerInSetConversionThreshold

spark.sql.optimizer.inSetConversionThreshold

Used when OptimizeIn logical query optimization is executed
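optimizerInSetConversionThreshold decides when an IN predicate over literal values is rewritten to InSet (a hash-set lookup). The sketch below is a simplified, hypothetical model of that decision (default threshold 10), assuming all list elements are literals; the Predicate ADT is illustrative and not Catalyst's expression hierarchy.

```scala
// Hypothetical model of the OptimizeIn decision, using Ints as literal values
object InSetConversionSketch {
  sealed trait Predicate
  final case class EqualTo(value: Int) extends Predicate
  final case class In(values: Seq[Int]) extends Predicate
  final case class InSet(values: Set[Int]) extends Predicate

  def optimizeIn(list: Seq[Int], threshold: Int): Predicate = {
    // Duplicate literals are removed first
    val deduped = list.distinct
    if (deduped.length == 1) EqualTo(deduped.head)         // single value: plain equality
    else if (deduped.length > threshold) InSet(deduped.toSet) // large list: hash-set lookup
    else In(deduped)                                          // small list: keep IN
  }
}
```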

orcVectorizedReaderNestedColumnEnabled

spark.sql.orc.enableNestedColumnVectorizedReader

Used when:

ORC_IMPLEMENTATION

The value of spark.sql.orc.impl configuration property

Supported values:

  • native for OrcFileFormat
  • hive for org.apache.spark.sql.hive.orc.OrcFileFormat

parallelFileListingInStatsComputation

spark.sql.statistics.parallelFileListingInStatsComputation.enabled

Used when CommandUtils helper object is requested to calculate the total size of a table (with partitions) (for AnalyzeColumnCommand and AnalyzeTableCommand commands)

parquetFilterPushDown

spark.sql.parquet.filterPushdown

Used when ParquetFileFormat is requested to build a data reader with partition column values appended.

parquetFilterPushDownDate

spark.sql.parquet.filterPushdown.date

Used when ParquetFileFormat is requested to build a data reader with partition column values appended.

parquetRecordFilterEnabled

spark.sql.parquet.recordLevelFilter.enabled

Used when ParquetFileFormat is requested to build a data reader with partition column values appended.

parquetVectorizedReaderBatchSize

spark.sql.parquet.columnarReaderBatchSize

Used when ParquetFileFormat is requested for a data reader (and creates a VectorizedParquetRecordReader for Vectorized Parquet Decoding)

parquetVectorizedReaderEnabled

spark.sql.parquet.enableVectorizedReader

Used when:

partitionOverwriteMode

The value of spark.sql.sources.partitionOverwriteMode configuration property

Used when InsertIntoHadoopFsRelationCommand logical command is executed
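partitionOverwriteMode (STATIC by default, or DYNAMIC) changes which existing partitions an INSERT OVERWRITE removes. A hypothetical model for an INSERT OVERWRITE without a static partition spec, where partitions are plain strings such as p=a; it does not touch any Spark classes.

```scala
// Hypothetical model of which existing partitions get replaced
object OverwriteModeSketch {
  sealed trait Mode
  case object Static extends Mode
  case object Dynamic extends Mode

  /** Existing partitions removed by the overwrite, given the partitions the new data writes to. */
  def partitionsDeleted(existing: Set[String], written: Set[String], mode: Mode): Set[String] =
    mode match {
      case Static  => existing                   // no static spec: every existing partition goes
      case Dynamic => existing intersect written // only partitions that receive new data
    }
}
```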

planChangeLogLevel

The value of spark.sql.planChangeLog.level configuration property

Used when:

planChangeBatches

The value of spark.sql.planChangeLog.batches configuration property

Used when:

  • PlanChangeLogger is requested to logBatch

planChangeRules

The value of spark.sql.planChangeLog.rules configuration property

Used when:

  • PlanChangeLogger is requested to logRule

preferSortMergeJoin

spark.sql.join.preferSortMergeJoin

Used in JoinSelection execution planning strategy to prefer sort merge join over shuffle hash join.

LEAF_NODE_DEFAULT_PARALLELISM

spark.sql.leafNodeDefaultParallelism

Used when:

LEGACY_CTE_PRECEDENCE_POLICY

spark.sql.legacy.ctePrecedencePolicy

replaceDatabricksSparkAvroEnabled

spark.sql.legacy.replaceDatabricksSparkAvro.enabled

replaceExceptWithFilter

spark.sql.optimizer.replaceExceptWithFilter

Used when ReplaceExceptWithFilter logical optimization is executed

runSQLonFile

spark.sql.runSQLOnFiles

Used when:

sessionLocalTimeZone

spark.sql.session.timeZone

sessionWindowBufferInMemoryThreshold

spark.sql.sessionWindow.buffer.in.memory.threshold

Used when:

  • UpdatingSessionsExec unary physical operator is executed

sessionWindowBufferSpillThreshold

spark.sql.sessionWindow.buffer.spill.threshold

Used when:

  • UpdatingSessionsExec unary physical operator is executed

sortBeforeRepartition

The value of spark.sql.execution.sortBeforeRepartition configuration property

Used when ShuffleExchangeExec physical operator is executed

starSchemaDetection

spark.sql.cbo.starSchemaDetection

Used in ReorderJoin logical optimization (and indirectly in StarSchemaDetection)

stringRedactionPattern

spark.sql.redaction.string.regex
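stringRedactionPattern is a regular expression; text matching it is replaced with a redaction marker when Spark SQL renders potentially sensitive strings. A minimal sketch, assuming the replacement text is *********(redacted) as in Spark's Utils.redact; RedactionSketch itself is hypothetical.

```scala
import scala.util.matching.Regex

// Hypothetical sketch of regex-based redaction
object RedactionSketch {
  // Assumed to match the marker Spark uses
  val ReplacementText = "*********(redacted)"

  /** Replaces every match of the (optional) pattern; no pattern means no redaction. */
  def redact(pattern: Option[Regex], text: String): String =
    pattern.fold(text)(_.replaceAllIn(text, ReplacementText))
}
```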

Used when:

subexpressionEliminationEnabled

spark.sql.subexpressionElimination.enabled

Used when SparkPlan is requested for the subexpressionEliminationEnabled flag.

subqueryReuseEnabled

spark.sql.execution.reuseSubquery

Used when:

supportQuotedRegexColumnName

spark.sql.parser.quotedRegexColumnNames

Used when:

targetPostShuffleInputSize

spark.sql.adaptive.shuffle.targetPostShuffleInputSize

Used when EnsureRequirements physical optimization is executed (for Adaptive Query Execution)

THRIFTSERVER_FORCE_CANCEL

spark.sql.thriftServer.interruptOnCancel

Used when:

  • SparkExecuteStatementOperation is created (forceCancel)

truncateTableIgnorePermissionAcl

spark.sql.truncateTable.ignorePermissionAcl.enabled

Used when TruncateTableCommand logical command is executed

useCompression

The value of spark.sql.inMemoryColumnarStorage.compressed configuration property

Used when CacheManager is requested to cache a structured query

useObjectHashAggregation

spark.sql.execution.useObjectHashAggregateExec

Used when Aggregation execution planning strategy is executed (and uses AggUtils to create an aggregation physical operator).

wholeStageEnabled

spark.sql.codegen.wholeStage

Used in:

wholeStageFallback

spark.sql.codegen.fallback

Used when WholeStageCodegenExec physical operator is executed.

wholeStageMaxNumFields

spark.sql.codegen.maxFields

Used in:

wholeStageSplitConsumeFuncByOperator

spark.sql.codegen.splitConsumeFuncByOperator

Used when CodegenSupport is requested to consume

wholeStageUseIdInClassName

spark.sql.codegen.useIdInClassName

Used when WholeStageCodegenExec is requested to generate the Java source code for the child physical plan subtree (when created)

windowExecBufferInMemoryThreshold

spark.sql.windowExec.buffer.in.memory.threshold

Used when WindowExec unary physical operator is executed

windowExecBufferSpillThreshold

spark.sql.windowExec.buffer.spill.threshold

Used when WindowExec unary physical operator is executed
