SessionState — State Separation Layer Between SparkSessions¶
SessionState is a state separation layer between Spark SQL sessions, including SQL configuration, tables, functions, UDFs, SQL parser, and everything else that depends on a SQLConf.
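For a quick (spark-shell) sketch of this separation: temporary views registered in one session are not visible in another session created with SparkSession.newSession.

```scala
// Temporary views live in the session's SessionCatalog (part of SessionState),
// so they are not shared between sessions that share the same SparkContext.
spark.range(3).createOrReplaceTempView("numbers")

val anotherSession = spark.newSession() // gets its own SessionState
assert(spark.catalog.tableExists("numbers"))
assert(!anotherSession.catalog.tableExists("numbers"))
```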
Attributes¶
ColumnarRules¶
columnarRules: Seq[ColumnarRule]
ExecutionListenerManager¶
listenerManager: ExecutionListenerManager
ExperimentalMethods¶
experimentalMethods: ExperimentalMethods
FunctionRegistry¶
functionRegistry: FunctionRegistry
Logical Analyzer¶
analyzer: Analyzer
Initialized lazily (only when requested the first time) using the analyzerBuilder factory function.
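A quick spark-shell sketch that shows the analyzer indirectly, through the analyzed logical plan of a Dataset:

```scala
// The resolved (analyzed) logical plan is produced by sessionState.analyzer
val df = spark.range(5).selectExpr("id + 1 AS next")
println(df.queryExecution.analyzed.numberedTreeString)
```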
Logical Optimizer¶
optimizer: Optimizer
Logical Optimizer that is created using the optimizerBuilder function (and cached for later usage).

Used when:

- QueryExecution is requested to create an optimized logical plan
- (Structured Streaming) IncrementalExecution is requested to create an optimized logical plan
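A quick spark-shell sketch of the optimized logical plan that QueryExecution requests from this Optimizer:

```scala
// optimizedPlan is the result of running sessionState.optimizer on the analyzed plan
val qe = spark.range(10).filter("id > 5").queryExecution
println(qe.optimizedPlan.numberedTreeString)
```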
ParserInterface¶
sqlParser: ParserInterface
SessionCatalog¶
catalog: SessionCatalog
SessionCatalog that is created using the catalogBuilder function (and cached for later usage).
SessionResourceLoader¶
resourceLoader: SessionResourceLoader
Spark Query Planner¶
planner: SparkPlanner
SQLConf¶
conf: SQLConf
StreamingQueryManager¶
streamingQueryManager: StreamingQueryManager
UDFRegistration¶
udfRegistration: UDFRegistration
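Many of these attributes can be inspected from spark-shell; a quick sketch (assuming the default spark session):

```scala
// A few SessionState attributes in action
spark.sessionState.conf.numShufflePartitions          // SQLConf
spark.sessionState.catalog.listDatabases()            // SessionCatalog
spark.sessionState.sqlParser.parsePlan("SELECT 1")    // ParserInterface
spark.udf.register("plusOne", (i: Int) => i + 1)      // UDFRegistration (exposed as spark.udf)
```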
Creating Instance¶
SessionState takes the following to be created:
- SQLConf
- ExperimentalMethods
- FunctionRegistry
- UDFRegistration
- Function to build a SessionCatalog (() => SessionCatalog)
- ParserInterface
- Function to build an Analyzer (() => Analyzer)
- Function to build a Logical Optimizer (() => Optimizer)
- SparkPlanner
- Function to build a StreamingQueryManager (() => StreamingQueryManager)
- ExecutionListenerManager
- Function to build a SessionResourceLoader (() => SessionResourceLoader)
- Function to build a QueryExecution (LogicalPlan => QueryExecution)
- SessionState Clone Function ((SparkSession, SessionState) => SessionState)
- ColumnarRules
SessionState is created when SparkSession is requested to instantiateSessionState (when requested for the SessionState per the spark.sql.catalogImplementation configuration property).
Note

When requested for the SessionState, SparkSession uses the spark.sql.catalogImplementation configuration property to load and create a BaseSessionStateBuilder that is then requested to create a SessionState instance.

There are two BaseSessionStateBuilders available:

- (default) SessionStateBuilder for the in-memory catalog
- HiveSessionStateBuilder for the hive catalog

The hive catalog is used when the SparkSession was created with Hive support enabled (using Builder.enableHiveSupport).
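A sketch of creating a Hive-enabled SparkSession (so HiveSessionStateBuilder builds the SessionState); the local master URL is only for illustration, and Hive classes are assumed to be on the classpath:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .enableHiveSupport() // sets spark.sql.catalogImplementation to hive
  .getOrCreate()

// spark.sql.catalogImplementation should now report "hive"
spark.conf.get("spark.sql.catalogImplementation")
```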
Creating QueryExecution For LogicalPlan¶
executePlan(plan: LogicalPlan): QueryExecution

executePlan uses the createQueryExecution function to create a QueryExecution for the given LogicalPlan.
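A spark-shell sketch that reuses the logical plan of an existing Dataset to create a new QueryExecution:

```scala
// executePlan gives a full QueryExecution (analyzed, optimized and physical plans)
// for an arbitrary LogicalPlan
val plan = spark.range(5).queryExecution.logical
val qe = spark.sessionState.executePlan(plan)
println(qe.executedPlan.numberedTreeString)
```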
Creating New Hadoop Configuration¶
newHadoopConf(): Configuration
newHadoopConf returns a new Hadoop Configuration (with the SparkContext.hadoopConfiguration and all the configuration properties of the SQLConf).
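A spark-shell sketch showing that SQL properties that have been set explicitly end up in the new Hadoop Configuration:

```scala
spark.conf.set("spark.sql.shuffle.partitions", "8")

val hadoopConf = spark.sessionState.newHadoopConf()
assert(hadoopConf.get("spark.sql.shuffle.partitions") == "8")
```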
Creating New Hadoop Configuration With Extra Options¶
newHadoopConfWithOptions(options: Map[String, String]): Configuration

newHadoopConfWithOptions creates a new Hadoop Configuration with the input options set (except the path and paths options, which are skipped).
newHadoopConfWithOptions is used when:

- TextBasedFileFormat is requested to say whether it is splitable or not
- FileSourceScanExec physical operator is requested for the input RDD
- InsertIntoHadoopFsRelationCommand logical command is executed
- PartitioningAwareFileIndex is requested for the Hadoop Configuration
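A sketch showing options being copied into the Configuration while path (and paths) are skipped; the Hadoop property below is just an illustrative choice:

```scala
val hadoopConf = spark.sessionState.newHadoopConfWithOptions(Map(
  "mapreduce.input.fileinputformat.split.maxsize" -> "134217728", // copied in
  "path" -> "/tmp/some/dir"))                                     // skipped

assert(hadoopConf.get("mapreduce.input.fileinputformat.split.maxsize") == "134217728")
assert(hadoopConf.get("path") == null)
```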
Accessing SessionState¶
SessionState is available as SparkSession.sessionState.
import org.apache.spark.sql.SparkSession
assert(spark.isInstanceOf[SparkSession])
// object SessionState in package org.apache.spark.sql.internal cannot be accessed directly
scala> :type spark.sessionState
org.apache.spark.sql.internal.SessionState
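Since every SparkSession has its own SessionState, configuration changes in one session do not leak into another. A quick sketch:

```scala
val another = spark.newSession() // separate SessionState, same SparkContext

spark.conf.set("spark.sql.shuffle.partitions", "200")
another.conf.set("spark.sql.shuffle.partitions", "4")

assert(spark.sessionState ne another.sessionState)
assert(spark.conf.get("spark.sql.shuffle.partitions") == "200")
assert(another.conf.get("spark.sql.shuffle.partitions") == "4")
```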