Data Types¶
DataType
abstract class is the base type of all built-in data types in Spark SQL, e.g. strings, longs.
DataType
has two main type families:
- <
> as an internal type to represent types that are not null
, UDTs, arrays, structs, and maps - [[NumericType]] Numeric Types with <
> and < > types
[[standard-types]] .Standard Data Types [align="center",width="100%",options="header",cols="1,1,1"] |=== .| Type Family | Data Type | Scala Types
.5+.| Atomic Types
(except <BinaryType
|
[[BooleanType]] BooleanType |
---|
[[DateType]] DateType |
---|
[[StringType]] StringType |
---|
| [[TimestampType]] TimestampType
| java.sql.Timestamp
.3+.| [[FractionalType]] Fractional Types
(concrete <DecimalType
|
[[DoubleType]] DoubleType |
---|
[[FloatType]] FloatType |
---|
.4+.| [[IntegralType]] Integral Types
(concrete <ByteType
|
[[IntegerType]] IntegerType |
---|
[[LongType]] LongType |
---|
[[ShortType]] ShortType |
---|
.8+.| | [[ArrayType]] ArrayType
|
[[CalendarIntervalType]] CalendarIntervalType |
---|
[[MapType]] MapType |
---|
[[NullType]] NullType |
---|
[[ObjectType]] ObjectType |
---|
[[StructType]] StructType |
---|
[[UserDefinedType]] UserDefinedType |
---|
| [[AnyDataType]] AnyDataType
| Matches any concrete data type |===
CAUTION: FIXME What about AbstractDataType?
You can extend the type system and create your own <
The <
NOTE: DataType
(and the concrete Spark SQL types) live in org.apache.spark.sql.types
package.
[source, scala]¶
import org.apache.spark.sql.types.StringType
scala> StringType.json res0: String = "string"
scala> StringType.sql res1: String = STRING
scala> StringType.catalogString res2: String = string
You should use <
[source, scala]¶
import org.apache.spark.sql.types.DataTypes
scala> val arrayType = DataTypes.createArrayType(BooleanType) arrayType: org.apache.spark.sql.types.ArrayType = ArrayType(BooleanType,true)
scala> val mapType = DataTypes.createMapType(StringType, LongType) mapType: org.apache.spark.sql.types.MapType = MapType(StringType,LongType,true)
DataType
has support for Scala's pattern matching using unapply
method.
[source, scala]¶
???¶
=== [[contract]] DataType Contract
Any type in Spark SQL follows the DataType
contract which means that the types define the following methods:
- [[json]]
json
andprettyJson
to build JSON representations of a data type defaultSize
to know the default size of values of a typesimpleString
andcatalogString
to build user-friendly string representations (with the latter for external catalogs)sql
to build SQL representation
[source, scala]¶
import org.apache.spark.sql.types.DataTypes._
val maps = StructType( StructField("longs2strings", createMapType(LongType, StringType), false) :: Nil)
scala> maps.prettyJson res0: String = { "type" : "struct", "fields" : [ { "name" : "longs2strings", "type" : { "type" : "map", "keyType" : "long", "valueType" : "string", "valueContainsNull" : true }, "nullable" : false, "metadata" : { } } ] }
scala> maps.defaultSize res1: Int = 2800
scala> maps.simpleString res2: String = struct
scala> maps.catalogString res3: String = struct
scala> maps.sql res4: String = STRUCT<longs2strings
: MAP
=== [[DataTypes]] DataTypes -- Factory Methods for Data Types
DataTypes
is a Java class with methods to access simple or create complex DataType
types in Spark SQL, i.e. arrays and maps.
TIP: It is recommended to use DataTypes
class to define DataType
types in a schema.
DataTypes
lives in org.apache.spark.sql.types
package.
[source, scala]¶
import org.apache.spark.sql.types.DataTypes
scala> val arrayType = DataTypes.createArrayType(BooleanType) arrayType: org.apache.spark.sql.types.ArrayType = ArrayType(BooleanType,true)
scala> val mapType = DataTypes.createMapType(StringType, LongType) mapType: org.apache.spark.sql.types.MapType = MapType(StringType,LongType,true)
[NOTE]¶
Simple DataType
types themselves, i.e. StringType
or CalendarIntervalType
, come with their own Scala's case object
s alongside their definitions.
You may also import the types
package and have access to the types.
[source, scala]¶
import org.apache.spark.sql.types._¶
====
=== [[user-defined-types]] UDTs -- User-Defined Types
CAUTION: FIXME
=== [[fromDDL]] Converting DDL into DataType -- fromDDL
Utility
[source, scala]¶
fromDDL(ddl: String): DataType¶
fromDDL
...FIXME
NOTE: fromDDL
is used when...FIXME