
= JdbcUtils Utility

JdbcUtils is a utility to support JDBCRDD, JDBCRelation and JdbcRelationProvider.

[[methods]]
.JdbcUtils API
[cols="1,2",options="header",width="100%"]
|===
| Name
| Description

| <>
a| Used when:

* <>
* <>
* <>

| <<getCustomSchema, getCustomSchema>>
a| Replaces data types in a table schema

Used exclusively when JDBCRelation is datasources/jdbc/JDBCRelation.md#schema[created] (and JDBCOptions.md#customSchema[customSchema] JDBC option was defined)

<>

| <>
| Used when JDBCRDD is requested to resolveTable

| <>
| Used when JdbcRelationProvider is requested to write the rows of a structured query (a DataFrame) to a table

| <>
| Used when...FIXME

| <>
a| Used when JDBCRDD is requested to compute a partition

* <>
* <>

| <>
| Used when JdbcRelationProvider is requested to write the rows of a structured query (a DataFrame) to a table

| <>
| Used when...FIXME
|===

=== [[createConnectionFactory]] createConnectionFactory Method

[source, scala]
----
createConnectionFactory(options: JDBCOptions): () => Connection
----

createConnectionFactory...FIXME

NOTE: createConnectionFactory is used when...FIXME
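A connection factory is simply a no-argument function that opens a java.sql.Connection on demand. The following is a minimal sketch only (assuming plain DriverManager and made-up parameters; the real createConnectionFactory derives the driver, URL and connection properties from JDBCOptions):

[source, scala]
----
import java.sql.{Connection, DriverManager}

// Minimal sketch of a () => Connection factory, assuming plain DriverManager.
// The real createConnectionFactory reads these values from JDBCOptions and
// registers the JDBC driver class first.
def connectionFactorySketch(
    url: String,
    user: String,
    password: String): () => Connection = {
  () => DriverManager.getConnection(url, user, password)
}
----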

=== [[getCommonJDBCType]] getCommonJDBCType Method

[source, scala]
----
getCommonJDBCType(dt: DataType): Option[JdbcType]
----

getCommonJDBCType...FIXME

NOTE: getCommonJDBCType is used when...FIXME

=== [[getCatalystType]] getCatalystType Internal Method

[source, scala]
----
getCatalystType(
  sqlType: Int,
  precision: Int,
  scale: Int,
  signed: Boolean): DataType
----


getCatalystType...FIXME

NOTE: getCatalystType is used when...FIXME

=== [[getSchemaOption]] getSchemaOption Method

[source, scala]
----
getSchemaOption(conn: Connection, options: JDBCOptions): Option[StructType]
----

getSchemaOption...FIXME

NOTE: getSchemaOption is used when...FIXME

=== [[getSchema]] getSchema Method

[source, scala]
----
getSchema(
  resultSet: ResultSet,
  dialect: JdbcDialect,
  alwaysNullable: Boolean = false): StructType
----


getSchema...FIXME

NOTE: getSchema is used when...FIXME

=== [[resultSetToRows]] resultSetToRows Method

[source, scala]
----
resultSetToRows(resultSet: ResultSet, schema: StructType): Iterator[Row]
----

resultSetToRows...FIXME

NOTE: resultSetToRows is used when...FIXME

=== [[resultSetToSparkInternalRows]] resultSetToSparkInternalRows Method

[source, scala]
----
resultSetToSparkInternalRows(
  resultSet: ResultSet,
  schema: StructType,
  inputMetrics: InputMetrics): Iterator[InternalRow]
----


resultSetToSparkInternalRows...FIXME

NOTE: resultSetToSparkInternalRows is used when...FIXME

=== [[schemaString]] schemaString Method

[source, scala]
----
schemaString(
  df: DataFrame,
  url: String,
  createTableColumnTypes: Option[String] = None): String
----


schemaString...FIXME

NOTE: schemaString is used exclusively when JdbcUtils is requested to <<createTable, createTable>>.

=== [[parseUserSpecifiedCreateTableColumnTypes]] parseUserSpecifiedCreateTableColumnTypes Internal Method

[source, scala]
----
parseUserSpecifiedCreateTableColumnTypes(
  df: DataFrame,
  createTableColumnTypes: String): Map[String, String]
----


parseUserSpecifiedCreateTableColumnTypes...FIXME

NOTE: parseUserSpecifiedCreateTableColumnTypes is used exclusively when JdbcUtils is requested to <<schemaString, schemaString>>.

=== [[saveTable]] saveTable Method

[source, scala]
----
saveTable(
  df: DataFrame,
  tableSchema: Option[StructType],
  isCaseSensitive: Boolean,
  options: JDBCOptions): Unit
----


saveTable takes the url, table, batchSize and isolationLevel options and <<createConnectionFactory, creates a connection factory>>.

saveTable <<getInsertStatement, builds the insert statement>> for the table and the schema of the input DataFrame.

saveTable takes the numPartitions option and applies the coalesce operator to the input DataFrame if the numPartitions option is less than the number of partitions of the DataFrame's RDD.

In the end, saveTable requests the possibly-repartitioned DataFrame for its RDD (it may have changed after the coalesce operator) and executes <<savePartition, savePartition>> for every partition (using RDD.foreachPartition).

saveTable is used when JdbcRelationProvider is requested to write the rows of a structured query (a DataFrame) to a table.
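The flow above can be sketched as follows (a simplified, hypothetical outline; saveTableSketch, numPartitions and writePartition are illustrative stand-ins for the real method, the numPartitions JDBC option and <<savePartition, savePartition>>, respectively):

[source, scala]
----
import org.apache.spark.sql.{DataFrame, Row}

// Simplified, hypothetical outline of the saveTable flow described above.
// `numPartitions` stands for the numPartitions JDBC option (if defined) and
// `writePartition` for the per-partition write (savePartition in JdbcUtils).
def saveTableSketch(
    df: DataFrame,
    numPartitions: Option[Int])(
    writePartition: Iterator[Row] => Unit): Unit = {
  // coalesce only when fewer partitions were requested than the DataFrame's RDD has
  val repartitioned = numPartitions match {
    case Some(n) if n < df.rdd.getNumPartitions => df.coalesce(n)
    case _ => df
  }
  // write every partition of the (possibly repartitioned) DataFrame
  repartitioned.rdd.foreachPartition(writePartition)
}
----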

=== [[getCustomSchema]] Replacing Data Types In Table Schema -- getCustomSchema Method

[source, scala]
----
getCustomSchema(
  tableSchema: StructType,
  customSchema: String,
  nameEquality: Resolver): StructType
----


getCustomSchema replaces the data type of the fields in the input tableSchema schema that are included in the input customSchema (if defined).

Internally, getCustomSchema branches off based on the input customSchema.

If the input customSchema is undefined or empty, getCustomSchema simply returns the input tableSchema unchanged.

Otherwise, getCustomSchema requests CatalystSqlParser to spark-sql-AbstractSqlParser.md#parseTableSchema[parse it] (i.e. create a new StructType for the given customSchema canonical schema representation).

getCustomSchema then uses SchemaUtils to spark-sql-SchemaUtils.md#checkColumnNameDuplication[check for duplicates] among the column names of the user-defined customSchema schema (with the input nameEquality).

In the end, getCustomSchema replaces the data type of the fields in the input tableSchema that are included in the user-defined customSchema.

NOTE: getCustomSchema is used exclusively when JDBCRelation is datasources/jdbc/JDBCRelation.md#schema[created] (and JDBCOptions.md#customSchema[customSchema] JDBC option was defined).
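The effect of getCustomSchema can be observed through the customSchema JDBC option, for example (the connection details and table below are made up; spark is an active SparkSession):

[source, scala]
----
// Made-up connection details; spark is an active SparkSession.
// Columns listed in customSchema get the requested Catalyst types,
// all other columns keep the types resolved from the table metadata.
val people = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/demo")
  .option("dbtable", "people")
  .option("customSchema", "id DECIMAL(38, 0), name STRING")
  .load()
----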

=== [[dropTable]] dropTable Method

[source, scala]
----
dropTable(conn: Connection, table: String): Unit
----

dropTable...FIXME

NOTE: dropTable is used when...FIXME

=== [[createTable]] Creating Table Using JDBC -- createTable Method

[source, scala]
----
createTable(
  conn: Connection,
  df: DataFrame,
  options: JDBCOptions): Unit
----


createTable <<schemaString, builds the table schema>> (given the input DataFrame and the url and createTableColumnTypes options).

createTable uses the table and createTableOptions options.

In the end, createTable concatenates all the above into a CREATE TABLE [table] ([strSchema]) [createTableOptions] SQL DDL statement and executes it (using the input JDBC Connection).
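As an illustration, the statement assembly and execution could look as follows (a sketch only; createTableSketch, strSchema and createTableOptions are illustrative names, with strSchema standing for the output of <<schemaString, schemaString>>):

[source, scala]
----
import java.sql.Connection

// Hypothetical sketch of the DDL assembly and execution described above.
// `strSchema` stands for the output of schemaString and `createTableOptions`
// for the value of the createTableOptions JDBC option.
def createTableSketch(
    conn: Connection,
    table: String,
    strSchema: String,
    createTableOptions: String): Unit = {
  val sql = s"CREATE TABLE $table ($strSchema) $createTableOptions"
  val statement = conn.createStatement()
  try {
    statement.executeUpdate(sql)  // run the CREATE TABLE statement over JDBC
  } finally {
    statement.close()
  }
}
----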

createTable is used when JdbcRelationProvider is requested to write the rows of a structured query (a DataFrame) to a table.

=== [[getInsertStatement]] getInsertStatement Method

[source, scala]
----
getInsertStatement(
  table: String,
  rddSchema: StructType,
  tableSchema: Option[StructType],
  isCaseSensitive: Boolean,
  dialect: JdbcDialect): String
----


getInsertStatement...FIXME

NOTE: getInsertStatement is used when...FIXME

=== [[getJdbcType]] getJdbcType Internal Method

[source, scala]
----
getJdbcType(dt: DataType, dialect: JdbcDialect): JdbcType
----

getJdbcType...FIXME

NOTE: getJdbcType is used when...FIXME

=== [[tableExists]] tableExists Method

[source, scala]
----
tableExists(conn: Connection, options: JDBCOptions): Boolean
----

tableExists...FIXME

tableExists is used when JdbcRelationProvider is requested to write the rows of a structured query (a DataFrame) to a table.

=== [[truncateTable]] truncateTable Method

[source, scala]
----
truncateTable(conn: Connection, options: JDBCOptions): Unit
----

truncateTable...FIXME

truncateTable is used when JdbcRelationProvider is requested to write the rows of a structured query (a DataFrame) to a table.

=== [[savePartition]] Saving Rows (Per Partition) to Table -- savePartition Method

[source, scala]
----
savePartition(
  getConnection: () => Connection,
  table: String,
  iterator: Iterator[Row],
  rddSchema: StructType,
  insertStmt: String,
  batchSize: Int,
  dialect: JdbcDialect,
  isolationLevel: Int): Iterator[Byte]
----


savePartition creates a JDBC Connection using the input getConnection function.

savePartition tries to set the input isolationLevel if it is different from TRANSACTION_NONE and the database supports transactions.

savePartition then writes rows (from the input Iterator[Row]) in batches that are submitted after every batchSize rows added.
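The batching described above follows the standard JDBC pattern, sketched below (a simplified, hypothetical version; the real savePartition also converts row values per column type and handles transactions and errors):

[source, scala]
----
import java.sql.{Connection, PreparedStatement}
import org.apache.spark.sql.Row

// Simplified, hypothetical sketch of the batching loop described above:
// rows are added to a JDBC batch that is submitted every `batchSize` rows.
// `setRow` stands for binding a row's values to the statement placeholders.
def writeBatchesSketch(
    conn: Connection,
    insertStmt: String,
    rows: Iterator[Row],
    batchSize: Int)(
    setRow: (PreparedStatement, Row) => Unit): Unit = {
  val stmt = conn.prepareStatement(insertStmt)
  try {
    var rowCount = 0
    while (rows.hasNext) {
      setRow(stmt, rows.next())
      stmt.addBatch()
      rowCount += 1
      if (rowCount % batchSize == 0) {
        stmt.executeBatch()  // submit a complete batch
      }
    }
    if (rowCount % batchSize != 0) {
      stmt.executeBatch()    // submit the remaining rows
    }
  } finally {
    stmt.close()
  }
}
----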

NOTE: savePartition is used exclusively when JdbcUtils is requested to <<saveTable, saveTable>>.
