[[shortName]]
TextFileFormat is a TextBasedFileFormat for the text format.

[source, scala]
----
spark.read.format("text").load("text-datasets")

// or the same as above using a shortcut
spark.read.text("text-datasets")
----

TextFileFormat uses <<options, text options>> while loading a dataset.

[[options]]
[[TextOptions]]
.TextFileFormat's Options
[cols="1,1,3",options="header",width="100%"]
|===
| Option
| Default Value
| Description

| [[compression]] compression
|
a| Compression codec that can be either one of the known aliases or a fully-qualified class name.

| [[wholetext]] wholetext
| false
| Enables loading a file as a single row (i.e. not splitting by "\n")
|===
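As a rough illustration of the wholetext option, the following sketch writes a tiny text dataset to a scratch directory and loads it back with and without the option. The local SparkSession, the temporary directory, and the names `session`, `demoDir`, `byLine` and `wholeFile` are assumptions made for this example only; it is meant to be pasted into spark-shell or run as a Scala script with Spark on the classpath.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{Encoders, SparkSession}

// Assumption: a local session is good enough for the demo
// (in spark-shell, getOrCreate returns the already running session).
val session = SparkSession.builder()
  .master("local[*]")
  .appName("text-options-demo")
  .getOrCreate()

// Hypothetical scratch location; the "lines" subdirectory does not exist yet
val demoDir = Files.createTempDirectory("text-datasets").resolve("lines").toString

// Write two lines into a single part file
session.createDataset(Seq("first line", "second line"))(Encoders.STRING)
  .coalesce(1)
  .write
  .text(demoDir)

// Default (wholetext = false): one row per line
val byLine = session.read.text(demoDir)

// wholetext = true: one row per file
val wholeFile = session.read.option("wholetext", true).text(demoDir)

println(s"rows by line: ${byLine.count()}, rows with wholetext: ${wholeFile.count()}")
```

Because the data is coalesced into one part file before writing, the two reads differ only in how that single file is split into rows.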

=== [[prepareWrite]] prepareWrite Method

[source, scala]
----
prepareWrite(
  sparkSession: SparkSession,
  job: Job,
  options: Map[String, String],
  dataSchema: StructType): OutputWriterFactory
----


prepareWrite is part of the FileFormat abstraction.
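Writing a dataset in the text format is the user-facing operation that exercises the write path. The sketch below is a minimal example of such a write, using the compression option with the gzip alias and then reading the files back. The local SparkSession, the scratch directory, and the names `writeSession`, `outDir`, `dataFiles` and `roundTrip` are assumptions for illustration only.

```scala
import java.io.File
import java.nio.file.Files
import org.apache.spark.sql.{Encoders, SparkSession}

// Assumption: a local session (getOrCreate reuses one if it already exists)
val writeSession = SparkSession.builder()
  .master("local[*]")
  .appName("text-write-demo")
  .getOrCreate()

// Hypothetical scratch location; the "gz" subdirectory does not exist yet
val outDir = Files.createTempDirectory("text-write").resolve("gz").toString

// The text format writes a single string column, so a Dataset[String] fits
val lines = writeSession.createDataset(Seq("alpha", "beta", "gamma"))(Encoders.STRING)

// Request gzip-compressed output through the compression option
lines.coalesce(1).write.option("compression", "gzip").text(outDir)

// List the data files, skipping _SUCCESS and hidden checksum files;
// with gzip they are expected to carry a .gz suffix
val dataFiles = new File(outDir).listFiles()
  .map(_.getName)
  .filter(name => !name.startsWith("_") && !name.startsWith("."))
dataFiles.foreach(println)

// Compressed files are read back like any other text dataset
val roundTrip = writeSession.read.text(outDir).collect().map(_.getString(0)).sorted
```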

=== [[buildReader]] Building Partitioned Data Reader -- buildReader Method

[source, scala]
----
buildReader(
  sparkSession: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): (PartitionedFile) => Iterator[InternalRow]
----


buildReader is part of the FileFormat abstraction.

=== [[readToUnsafeMem]] readToUnsafeMem Internal Method

[source, scala]
----
readToUnsafeMem(
  conf: Broadcast[SerializableConfiguration],
  requiredSchema: StructType,
  wholeTextMode: Boolean): (PartitionedFile) => Iterator[UnsafeRow]
----


readToUnsafeMem is used when TextFileFormat is requested to build a partitioned data reader (buildReader).
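The shape of readToUnsafeMem, a method that takes a wholeTextMode flag and returns a function from a file to an iterator of rows, can be modelled in plain Scala. The following is a simplified sketch only, not Spark's implementation: `textReader` is a hypothetical name, `java.nio.file.Path` stands in for PartitionedFile, and `String` stands in for UnsafeRow so that the example runs without Spark (as a Scala script or in the REPL).

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical stand-in for readToUnsafeMem: the flag is fixed up front
// and the result is a reusable per-file reader function.
def textReader(wholeTextMode: Boolean): Path => Iterator[String] =
  (file: Path) => {
    val content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8)
    if (wholeTextMode) Iterator.single(content) // one row for the entire file
    else content.split("\n").iterator           // one row per line
  }

// A small sample file to read in both modes
val sample = Files.createTempFile("sample", ".txt")
Files.write(sample, "a\nb\nc".getBytes(StandardCharsets.UTF_8))

val perLine = textReader(wholeTextMode = false)(sample).toList // List("a", "b", "c")
val whole = textReader(wholeTextMode = true)(sample).toList    // List("a\nb\nc")
```

Returning a function rather than the rows themselves lets the caller decide once how files are to be read and then apply that decision to each file in turn.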

Last update: 2020-11-16