PartitioningAwareFileIndex

PartitioningAwareFileIndex is an extension of the FileIndex abstraction for indices that are aware of partitioned tables.

Contract

leafDirToChildrenFiles

leafDirToChildrenFiles: Map[Path, Array[FileStatus]]

Used to list the files matching filters, to list all files, and to infer partitioning

Leaf Files

leafFiles: mutable.LinkedHashMap[Path, FileStatus]

Used to list all files and to compute the base locations

PartitionSpec

partitionSpec(): PartitionSpec

Partition specification with partition columns and values, and directories (as Hadoop Paths)

Used to determine the partition schema and to list the files matching filters and all files
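
To make the shape of a partition specification more concrete, here is a minimal, self-contained sketch. SimplePartitionPath and SimplePartitionSpec are hypothetical stand-ins invented for illustration: Spark's own PartitionSpec works with a StructType and Hadoop Paths, whereas this model uses plain Scala types so that it runs without Spark on the classpath. The directory names are made-up sample data.

```scala
object PartitionSpecSketch {
  // Hypothetical stand-in for one partition directory:
  // the partition column values together with the directory they came from
  final case class SimplePartitionPath(values: Map[String, String], path: String)

  // Hypothetical stand-in for a partition specification:
  // partition columns plus the partition directories
  final case class SimplePartitionSpec(
      partitionColumns: Seq[String],
      partitions: Seq[SimplePartitionPath])

  // Sample data (assumed layout, not taken from any real table)
  val spec: SimplePartitionSpec = SimplePartitionSpec(
    partitionColumns = Seq("year", "month"),
    partitions = Seq(
      SimplePartitionPath(Map("year" -> "2021", "month" -> "04"), "/data/year=2021/month=04"),
      SimplePartitionPath(Map("year" -> "2021", "month" -> "05"), "/data/year=2021/month=05")))

  // In this model the partition schema is simply the partition columns of the spec
  def partitionSchema(s: SimplePartitionSpec): Seq[String] = s.partitionColumns
}
```

The point of the model is only that a single specification carries both the column names and the per-directory values, which is why it can serve the partition schema and the file listings alike.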

Implementations

  • InMemoryFileIndex
  • MetadataLogFileIndex

Creating Instance

PartitioningAwareFileIndex takes the following to be created:

  • SparkSession
  • Options for partition discovery (Map[String, String])
  • Optional User-Defined Schema
  • FileStatusCache (default: NoopCache)

Abstract Class

PartitioningAwareFileIndex is an abstract class and cannot be created directly. It is created indirectly for the concrete PartitioningAwareFileIndexes.

All Files

allFiles(): Seq[FileStatus]

allFiles...FIXME

allFiles is used, among other places, when PartitioningAwareFileIndex is requested for the input files and the size.

Files Matching Filters

listFiles(
  partitionFilters: Seq[Expression],
  dataFilters: Seq[Expression]): Seq[PartitionDirectory]

listFiles...FIXME

listFiles is part of the FileIndex abstraction.
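
The contract above says that both the partition specification and leafDirToChildrenFiles are used to list the files matching filters. The following is a rough sketch of that idea only, not Spark's implementation: partitions are pruned with a predicate, and the files of each surviving partition directory are looked up. SimplePartition, SimplePartitionDirectory and the predicate-based filter are hypothetical simplifications (Spark uses Catalyst Expressions, and data filters are ignored here).

```scala
object ListFilesSketch {
  // Hypothetical stand-ins: partition values are plain strings, paths are plain strings
  final case class SimplePartition(values: Map[String, String], path: String)
  final case class SimplePartitionDirectory(values: Map[String, String], files: Seq[String])

  def listFiles(
      partitions: Seq[SimplePartition],
      leafDirToChildrenFiles: Map[String, Seq[String]],
      partitionFilter: Map[String, String] => Boolean): Seq[SimplePartitionDirectory] =
    partitions
      .filter(p => partitionFilter(p.values)) // keep only the partitions that match
      .map { p =>
        // A partition directory without listed files yields an empty file list
        SimplePartitionDirectory(p.values, leafDirToChildrenFiles.getOrElse(p.path, Seq.empty))
      }

  // Sample data (assumed layout)
  val partitions: Seq[SimplePartition] = Seq(
    SimplePartition(Map("month" -> "04"), "/data/month=04"),
    SimplePartition(Map("month" -> "05"), "/data/month=05"))

  val files: Map[String, Seq[String]] = Map(
    "/data/month=04" -> Seq("/data/month=04/part-0.parquet"),
    "/data/month=05" -> Seq("/data/month=05/part-0.parquet"))
}
```

For example, `ListFilesSketch.listFiles(ListFilesSketch.partitions, ListFilesSketch.files, _.get("month").contains("05"))` keeps only the month=05 directory and its files.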

Partition Schema

partitionSchema: StructType

partitionSchema gives the partitionColumns of the partition specification.

partitionSchema is part of the FileIndex abstraction.

Input Files

inputFiles: Array[String]

inputFiles collects the locations of all the files (Hadoop Paths converted to Strings).

inputFiles is part of the FileIndex abstraction.

Size

sizeInBytes: Long

sizeInBytes sums up the length (in bytes) of all the files.

sizeInBytes is part of the FileIndex abstraction.
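
Both the input files and the size are simple projections of all the files. A minimal sketch, assuming a hypothetical FileEntry type in place of Hadoop's FileStatus so that the code runs without Hadoop or Spark:

```scala
object FileStatsSketch {
  // Hypothetical stand-in for Hadoop's FileStatus: a location and a length in bytes
  final case class FileEntry(path: String, length: Long)

  // Locations of all the files, as strings
  def inputFiles(allFiles: Seq[FileEntry]): Array[String] =
    allFiles.map(_.path).toArray

  // Sum of the lengths (in bytes) of all the files
  def sizeInBytes(allFiles: Seq[FileEntry]): Long =
    allFiles.map(_.length).sum

  // Sample data (made-up paths and sizes)
  val sample: Seq[FileEntry] = Seq(
    FileEntry("/data/month=04/part-0.parquet", 100L),
    FileEntry("/data/month=05/part-0.parquet", 250L))
}
```

Since both values are derived from the same listing, they stay consistent with each other as long as the listing itself does not change.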

Inferring Partitioning

inferPartitioning(): PartitionSpec

inferPartitioning...FIXME

inferPartitioning is used by the PartitioningAwareFileIndexes.
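
As an illustration of the general idea behind partition inference, the sketch below reads `column=value` directory segments found below a base location. It is a simplified model under stated assumptions, not Spark's algorithm: the names parsePartition and partitionColumns are hypothetical, values stay as strings (no type inference), and there is no check that all directories share the same columns.

```scala
object InferPartitioningSketch {
  // Parses the `column=value` segments of `leafDir` that lie below `basePath`
  def parsePartition(basePath: String, leafDir: String): Seq[(String, String)] = {
    val base = basePath.stripSuffix("/")
    require(
      leafDir == base || leafDir.startsWith(base + "/"),
      s"$leafDir is not under $basePath")
    leafDir
      .stripPrefix(base)
      .split("/")
      .toSeq
      .filter(_.nonEmpty)
      .flatMap { segment =>
        segment.split("=", 2) match {
          case Array(column, value) if column.nonEmpty => Some(column -> value)
          case _ => None // not a partition segment; ignored in this sketch
        }
      }
  }

  // Distinct partition column names across all leaf directories, in order of appearance
  def partitionColumns(basePath: String, leafDirs: Seq[String]): Seq[String] =
    leafDirs.flatMap(dir => parsePartition(basePath, dir).map(_._1)).distinct
}
```

Note how the base location changes the outcome: with `/data` as the base, `/data/year=2021/month=05` contributes both `year` and `month`, while with `/data/year=2021` as the base only `month` remains a partition column.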

Base Locations

basePaths: Set[Path]

basePaths...FIXME

basePaths is used to infer partitioning.


Last update: 2021-05-30