Skip to content


FileStreamSourceLog is a concrete CompactibleFileStreamLog (of FileEntry metadata) of FileStreamSource.

FileStreamSourceLog uses a fixed-size <> of metadata of compaction batches.

[[defaultCompactInterval]] FileStreamSourceLog uses <> configuration property (default: 10) for the default compaction interval.

[[fileCleanupDelayMs]] FileStreamSourceLog uses <> configuration property (default: 10 minutes) for the fileCleanupDelayMs.

[[isDeletingExpiredLog]] FileStreamSourceLog uses <> configuration property (default: true) for the isDeletingExpiredLog.

Creating Instance

FileStreamSourceLog (like the parent CompactibleFileStreamLog) takes the following to be created:

  • [[metadataLogVersion]] Metadata version
  • [[sparkSession]] SparkSession
  • [[path]] Path of the metadata log directory

=== [[add]] Storing (Adding) Metadata of Streaming Batch -- add Method

[source, scala]

add( batchId: Long, logs: Array[FileEntry]): Boolean

add requests the parent CompactibleFileStreamLog to store metadata (possibly compacting logs if the batch is compaction).

If so (and this is a compation batch), add adds the batch and the logs to <> internal registry (and possibly removing the eldest entry if the size is above the <>).

add is part of the MetadataLog abstraction.

=== [[get]][[get-range]] get Method

[source, scala]

get( startId: Option[Long], endId: Option[Long]): Array[(Long, Array[FileEntry])]


get is part of the MetadataLog abstraction.

Internal Properties

[cols="30m,70",options="header",width="100%"] |=== | Name | Description

| cacheSize a| [[cacheSize]] Size of the <> that is exactly the compact interval

Used when the <> is requested to add a new entry in <> and <> a compaction batch

| fileEntryCache a| [[fileEntryCache]] Metadata of a streaming batch (FileEntry) per batch ID (LinkedHashMap[Long, Array[FileEntry]]) of size configured using the <>

Used when <> (for a compaction batch)