
Configuration Properties

Configuration properties (aka settings) allow you to fine-tune a Spark Structured Streaming application.

The Internals of Spark SQL

Learn more about Configuration Properties in The Internals of Spark SQL.
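As a rough illustration of how such settings could be passed to an application, the sketch below renders a dictionary of properties as `--conf key=value` arguments for `spark-submit`. The helper name `to_conf_args` and the chosen values are hypothetical and only serve the example.

```python
# Hypothetical helper (not part of Spark): renders settings as
# `--conf key=value` arguments for spark-submit.
def to_conf_args(settings):
    args = []
    for key, value in settings.items():
        # Booleans are rendered in lowercase, matching the defaults shown on this page.
        rendered = str(value).lower() if isinstance(value, bool) else str(value)
        args += ["--conf", f"{key}={rendered}"]
    return args


# Example values only; both property names come from this page.
settings = {
    "spark.sql.streaming.metricsEnabled": True,
    "spark.sql.streaming.minBatchesToRetain": 100,
}
print(" ".join(["spark-submit"] + to_conf_args(settings)))
```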

spark.sql.streaming.commitProtocolClass

(internal) FileCommitProtocol to use for writing out micro-batches in FileStreamSink.

Default: org.apache.spark.sql.execution.streaming.ManifestFileCommitProtocol

Use SQLConf.streamingFileCommitProtocolClass to access the current value.

The Internals of Apache Spark

Learn more about FileCommitProtocol in The Internals of Apache Spark.

spark.sql.streaming.metricsEnabled

Enables reporting of Dropwizard/Codahale metrics for active streaming queries

Default: false

Use SQLConf.streamingMetricsEnabled to access the current value.

spark.sql.streaming.fileSink.log.cleanupDelay

(internal) How long (in millis) a file is guaranteed to be visible for all readers.

Default: 10 minutes

Use SQLConf.fileSinkLogCleanupDelay to access the current value.

spark.sql.streaming.fileSink.log.deletion

(internal) Whether to delete the expired log files in file stream sink

Default: true

Use SQLConf.fileSinkLogDeletion to access the current value.

spark.sql.streaming.fileSink.log.compactInterval

(internal) Number of log files after which all the previous files are compacted into the next log file

Default: 10

Use SQLConf.fileSinkLogCompactInterval to access the current value.
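A minimal sketch of how a compact interval can determine which batches are compaction batches, assuming batch IDs start at 0 and every interval-th batch compacts all previous log files. The function name `is_compaction_batch` is illustrative, not a confirmed API.

```python
def is_compaction_batch(batch_id, compact_interval):
    """Return True if the log file of batch_id should hold the compacted log.

    Assumption: batch IDs start at 0, so with an interval of 10 the
    compaction batches are 9, 19, 29, ...
    """
    if compact_interval <= 0:
        raise ValueError("compact_interval must be a positive value")
    return (batch_id + 1) % compact_interval == 0


# With the default interval of 10, list the compaction batches among the first 30.
print([b for b in range(30) if is_compaction_batch(b, 10)])
```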

spark.sql.streaming.minBatchesToRetain

(internal) Minimum number of batches that must be retained and made recoverable

Stream execution engines discard (purge) offsets from the offsets metadata log when the current batch ID (in MicroBatchExecution) or the epoch committed (in ContinuousExecution) is above the threshold.

Default: 100

Use SQLConf.minBatchesToRetain to access the current value.
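The purge rule described above can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the metadata log is modelled as a plain dictionary keyed by batch ID, and the exact boundary conditions are assumed rather than taken from this page.

```python
def purge_threshold(current_batch_id, min_batches_to_retain):
    """Return the batch ID below which log entries may be discarded, or None.

    Assumption: nothing is purged until the current batch ID exceeds
    the number of batches to retain.
    """
    if current_batch_id > min_batches_to_retain:
        return current_batch_id - min_batches_to_retain
    return None


def purge(log, threshold):
    """Keep only entries whose batch ID is at or above the threshold."""
    return {batch_id: entry for batch_id, entry in log.items() if batch_id >= threshold}


# A toy offsets log with batches 0..105 and the default of 100 batches to retain.
offsets_log = {batch_id: f"offsets-{batch_id}" for batch_id in range(106)}
threshold = purge_threshold(current_batch_id=105, min_batches_to_retain=100)
if threshold is not None:
    offsets_log = purge(offsets_log, threshold)
print(min(offsets_log), max(offsets_log))
```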

spark.sql.streaming.aggregation.stateFormatVersion

(internal) Version of the state format

Default: 2

Supported values:

  • 1
  • 2

Used when the StatefulAggregationStrategy execution planning strategy is executed and plans a streaming query with an aggregate. This boils down to creating a StateStoreRestoreExec with the implementation of StreamingAggregationStateManager that matches the state format version.

Among the checkpointed properties that are not supposed to be overridden after a streaming query has been started once (and could later recover from a checkpoint after being restarted)

spark.sql.streaming.checkpointFileManagerClass

(internal) CheckpointFileManager to use to write checkpoint files atomically

Default: FileContextBasedCheckpointFileManager (falls back to FileSystemBasedCheckpointFileManager when the file system used for storing metadata files is not supported)

spark.sql.streaming.checkpointLocation

Default checkpoint directory for storing checkpoint data

Default: (empty)

spark.sql.streaming.continuous.executorQueueSize

(internal) The size (measured in number of rows) of the queue used in continuous execution to buffer the results of a ContinuousDataReader.

Default: 1024

spark.sql.streaming.continuous.executorPollIntervalMs

(internal) The interval (in millis) at which continuous execution readers will poll to check whether the epoch has advanced on the driver.

Default: 100 (ms)

spark.sql.streaming.disabledV2MicroBatchReaders

(internal) A comma-separated list of fully-qualified class names of data source providers for which MicroBatchStream is disabled. Reads from these sources will fall back to the V1 Sources.

Default: (empty)

Use SQLConf.disabledV2StreamingMicroBatchReaders to get the current value.
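A small sketch of how a comma-separated list like this one can be checked against a provider class name. The function name and the `com.example` class names are hypothetical, and trimming whitespace around entries is an assumption of this sketch.

```python
def is_micro_batch_stream_disabled(provider_class, disabled_readers):
    """Return True if provider_class appears in the comma-separated setting value."""
    # Assumption: surrounding whitespace and empty entries are ignored.
    disabled = {name.strip() for name in disabled_readers.split(",") if name.strip()}
    return provider_class in disabled


# Hypothetical provider class names, for illustration only.
setting = "com.example.FirstSourceProvider, com.example.SecondSourceProvider"
print(is_micro_batch_stream_disabled("com.example.FirstSourceProvider", setting))
print(is_micro_batch_stream_disabled("com.example.OtherProvider", setting))
```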

spark.sql.streaming.fileSource.log.cleanupDelay

(internal) How long (in millis) a file is guaranteed to be visible for all readers.

Default: 10 (minutes)

Use SQLConf.fileSourceLogCleanupDelay to get the current value.

spark.sql.streaming.fileSource.log.compactInterval

(internal) Number of log files after which all the previous files are compacted into the next log file.

Default: 10

Must be a positive value (greater than 0)

Use SQLConf.fileSourceLogCompactInterval to get the current value.

spark.sql.streaming.fileSource.log.deletion

(internal) Whether to delete the expired log files in file stream source

Default: true

Use SQLConf.fileSourceLogDeletion to get the current value.

spark.sql.streaming.flatMapGroupsWithState.stateFormatVersion

(internal) State format version used to create a StateManager for FlatMapGroupsWithStateExec physical operator

Default: 2

Supported values:

  • 1
  • 2

Among the checkpointed properties that are not supposed to be overridden after a streaming query has been started once (and could later recover from a checkpoint after being restarted)
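One way to picture a checkpointed property is that, on restart, the value recorded in the checkpoint wins over whatever the session currently specifies. The sketch below illustrates that reading only; the function name, the dictionary-based model, and the precedence rule itself are assumptions of this example.

```python
# The two state format properties that this page marks as checkpointed.
CHECKPOINTED_KEYS = (
    "spark.sql.streaming.aggregation.stateFormatVersion",
    "spark.sql.streaming.flatMapGroupsWithState.stateFormatVersion",
)


def effective_conf(session_conf, checkpointed_conf):
    """Merge configs so that checkpointed values take precedence (assumed rule)."""
    merged = dict(session_conf)
    for key in CHECKPOINTED_KEYS:
        if key in checkpointed_conf:
            merged[key] = checkpointed_conf[key]
    return merged


# The session asks for version 2, but the query was first started with version 1.
session = {
    "spark.sql.streaming.aggregation.stateFormatVersion": "2",
    "spark.sql.streaming.pollingDelay": "10",
}
checkpoint = {"spark.sql.streaming.aggregation.stateFormatVersion": "1"}
print(effective_conf(session, checkpoint))
```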

spark.sql.streaming.maxBatchesToRetainInMemory

(internal) The maximum number of batches which will be retained in memory to avoid loading from files.

Default: 2

Maximum count of versions a State Store implementation should retain in memory.

The value adjusts the trade-off between memory usage and cache misses:

  • 2 covers both the success and direct failure cases
  • 1 covers only the success case
  • 0 or a negative value disables the cache to maximize the memory available to executors

Used when HDFSBackedStateStoreProvider is requested to initialize.
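The retention behaviour can be sketched with a small in-memory cache that keeps only the most recent versions. The class `VersionCache` is hypothetical and simplified; evicting the lowest version number first is an assumption of this sketch, not a description of the actual provider.

```python
class VersionCache:
    """Keeps at most max_versions state versions in memory (illustrative only)."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self._maps = {}

    def put(self, version, state):
        if self.max_versions <= 0:
            return  # 0 or a negative value disables the cache
        self._maps[version] = state
        while len(self._maps) > self.max_versions:
            del self._maps[min(self._maps)]  # evict the oldest version

    def get(self, version):
        # None signals a cache miss, i.e. the state would be loaded from files.
        return self._maps.get(version)


cache = VersionCache(max_versions=2)
for version in (1, 2, 3):
    cache.put(version, {"count": version})
print(cache.get(1), cache.get(2), cache.get(3))
```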

spark.sql.streaming.multipleWatermarkPolicy

Policy for calculating the global watermark value when there are multiple watermark operators in a streaming query

Default: min

Supported values:

  • min - chooses the minimum watermark reported across multiple operators
  • max - chooses the maximum watermark reported across multiple operators

Cannot be changed between query restarts from the same checkpoint location.
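The two policies can be illustrated with a short sketch. The function name `global_watermark` and the sample watermark values (in milliseconds) are made up for the example.

```python
def global_watermark(operator_watermarks, policy="min"):
    """Pick the global watermark from the watermarks reported by each operator."""
    choosers = {"min": min, "max": max}
    if policy not in choosers:
        raise ValueError(f"Unsupported policy: {policy}")
    return choosers[policy](operator_watermarks)


# Watermarks (in millis) reported by three hypothetical watermark operators.
reported = [1000, 5000, 3000]
print(global_watermark(reported, "min"))
print(global_watermark(reported, "max"))
```

With `min`, the query waits for the slowest stream before dropping late data; with `max`, it advances with the fastest one.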

spark.sql.streaming.noDataMicroBatches.enabled

Flag to control whether the streaming micro-batch engine should execute batches with no data to process for eager state management for stateful streaming queries (true) or not (false).

Default: true

Use SQLConf.streamingNoDataMicroBatchesEnabled to get the current value.

spark.sql.streaming.noDataProgressEventInterval

(internal) How long to wait (in millis) between two progress events when there is no data. Used when ProgressReporter is requested to finish a trigger.

Default: 10000 (milliseconds)

Use SQLConf.streamingNoDataProgressEventInterval to get the current value.
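A sketch of how such an interval could throttle progress events while a query has no data. The class name, the explicit `now_ms` clock argument, and the reset-on-new-data behaviour are assumptions made for illustration.

```python
class NoDataProgressThrottle:
    """Decides whether a progress event should be reported for a trigger."""

    def __init__(self, interval_ms=10000):
        self.interval_ms = interval_ms
        self._last_no_data_event_ms = None

    def should_report(self, has_new_data, now_ms):
        if has_new_data:
            # Triggers with data always report and reset the no-data timer.
            self._last_no_data_event_ms = None
            return True
        last = self._last_no_data_event_ms
        if last is None or now_ms - last >= self.interval_ms:
            self._last_no_data_event_ms = now_ms
            return True
        return False


throttle = NoDataProgressThrottle(interval_ms=10000)
# Three idle triggers at 0 ms, 5 s and 10 s: the middle one is suppressed.
print([throttle.should_report(False, t) for t in (0, 5000, 10000)])
```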

spark.sql.streaming.numRecentProgressUpdates

Number of StreamingQueryProgress instances to retain in the progressBuffer internal registry when ProgressReporter is requested to update the progress of a streaming query

Default: 100

Use SQLConf.streamingProgressRetention to get the current value.
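The retention behaviour resembles a bounded buffer that drops the oldest entry once it is full. A minimal sketch, assuming progress updates are modelled as plain dictionaries and using a hypothetical `make_progress_buffer` helper:

```python
from collections import deque


def make_progress_buffer(retention=100):
    """Create a buffer that keeps only the most recent `retention` updates."""
    return deque(maxlen=retention)


progress_buffer = make_progress_buffer(retention=100)
for batch_id in range(250):
    # Once full, appending drops the oldest progress update automatically.
    progress_buffer.append({"batchId": batch_id})

print(len(progress_buffer), progress_buffer[0]["batchId"], progress_buffer[-1]["batchId"])
```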

spark.sql.streaming.pollingDelay

(internal) How long (in millis) StreamExecution waits before polling for new data when no data was available in a batch

Default: 10 (milliseconds)

spark.sql.streaming.stateStore.maintenanceInterval

The initial delay and how often to execute StateStore's maintenance task.

Default: 60s

spark.sql.streaming.stateStore.minDeltasForSnapshot

(internal) Minimum number of state store delta files that need to be generated before HDFSBackedStateStore will consider generating a snapshot (consolidate the deltas into a snapshot)

Default: 10

Use SQLConf.stateStoreMinDeltasForSnapshot to get the current value.

spark.sql.streaming.stateStore.providerClass

(internal) The fully-qualified class name of the StateStoreProvider implementation that manages state data in stateful streaming queries. This class must have a zero-arg constructor.

Default: HDFSBackedStateStoreProvider

Use SQLConf.stateStoreProviderClass to get the current value.

spark.sql.streaming.unsupportedOperationCheck

(internal) When enabled (true), StreamingQueryManager makes sure that the logical plan of a streaming query uses supported operations only

Default: true


Last update: 2021-02-07