
FileStreamSinkLog

FileStreamSinkLog is a CompactibleFileStreamLog (of SinkFileStatuses) for FileStreamSink and MetadataLogFileIndex.

FileStreamSinkLog concatenates metadata logs into a single compact file after a defined compact interval.
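As a rough illustration of what "compact interval" means, the sketch below marks which batches would trigger a compaction. It assumes that batch IDs start at 0 and that a batch is compacted when (batchId + 1) is a multiple of the interval. That rule and the names CompactionBatchSketch and isCompactionBatch are assumptions made for this example, not something this page states.

```scala
object CompactionBatchSketch {
  // Assumption: batch ids start at 0 and every `compactInterval`-th batch is compacted.
  def isCompactionBatch(batchId: Long, compactInterval: Int): Boolean =
    (batchId + 1) % compactInterval == 0

  def main(args: Array[String]): Unit = {
    val interval = 3 // illustrative value only
    val compacted = (0L to 8L).filter(isCompactionBatch(_, interval))
    println(compacted.mkString(", ")) // prints "2, 5, 8"
  }
}
```

Under this assumption, with an interval of 3, the metadata of batches 0 through 2 would be merged when batch 2 is written, batches 3 through 5 at batch 5, and so on.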

Creating Instance

FileStreamSinkLog (like the parent CompactibleFileStreamLog) takes the following to be created:

  • Version of the Metadata Log
  • SparkSession
  • Path of the Metadata Log

Configuration Properties

spark.sql.streaming.fileSink.log.cleanupDelay

FileStreamSinkLog uses the spark.sql.streaming.fileSink.log.cleanupDelay configuration property for fileCleanupDelayMs.

spark.sql.streaming.fileSink.log.compactInterval

FileStreamSinkLog uses the spark.sql.streaming.fileSink.log.compactInterval configuration property for defaultCompactInterval.

spark.sql.streaming.fileSink.log.deletion

FileStreamSinkLog uses the spark.sql.streaming.fileSink.log.deletion configuration property for isDeletingExpiredLog.
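The three properties can be set like any other Spark SQL configuration. The following is a minimal sketch that assumes Spark SQL is on the classpath and a local master is acceptable. The object name, application name and values are illustrative only, not recommendations.

```scala
import org.apache.spark.sql.SparkSession

object FileSinkLogConfigSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("file-sink-log-config") // hypothetical application name
      // Whether expired metadata log files are deleted (isDeletingExpiredLog)
      .config("spark.sql.streaming.fileSink.log.deletion", "true")
      // Number of batches between compactions (defaultCompactInterval)
      .config("spark.sql.streaming.fileSink.log.compactInterval", "10")
      // How long expired log files are kept before cleanup (fileCleanupDelayMs)
      .config("spark.sql.streaming.fileSink.log.cleanupDelay", "10m")
      .getOrCreate()

    // Read one of the values back to confirm it was applied to the session
    println(spark.conf.get("spark.sql.streaming.fileSink.log.compactInterval"))
    spark.stop()
  }
}
```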

Compacting Logs

compactLogs(
  logs: Seq[SinkFileStatus]): Seq[SinkFileStatus]

compactLogs finds delete actions in the given collection of SinkFileStatuses.

If there are no deletes, compactLogs gives the SinkFileStatuses back (unmodified).

Otherwise, compactLogs removes the deleted paths from the SinkFileStatuses.

compactLogs is part of the CompactibleFileStreamLog abstraction.
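The filtering described above can be sketched without any Spark dependency. FileEntry below is a hypothetical, trimmed-down stand-in for SinkFileStatus that keeps only a path and an action, and the action strings "add" and "delete" are assumed here. The sketch also assumes that every entry whose path appears in a delete action is dropped, including the delete entry itself.

```scala
object CompactLogsSketch {
  // Hypothetical stand-in for SinkFileStatus: only the fields the filter needs.
  final case class FileEntry(path: String, action: String)

  val AddAction = "add"       // assumed action name
  val DeleteAction = "delete" // assumed action name

  def compactLogs(logs: Seq[FileEntry]): Seq[FileEntry] = {
    // Collect the paths that some entry marks as deleted
    val deletedPaths = logs.filter(_.action == DeleteAction).map(_.path).toSet
    if (deletedPaths.isEmpty) logs // no deletes: return the input unmodified
    else logs.filterNot(entry => deletedPaths.contains(entry.path))
  }

  def main(args: Array[String]): Unit = {
    val logs = Seq(
      FileEntry("/out/part-0", AddAction),
      FileEntry("/out/part-1", AddAction),
      FileEntry("/out/part-0", DeleteAction))
    println(compactLogs(logs).map(_.path)) // prints "List(/out/part-1)"
  }
}
```

Returning the input collection itself when there are no deletes avoids building a new collection on the common path.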

Version

FileStreamSinkLog uses 1 for the version.

Actions

Add

FileStreamSinkLog uses the add action to create new metadata logs.

Delete

FileStreamSinkLog uses the delete action to mark status files to be excluded from compaction.

Important

The delete action is not used in Spark Structured Streaming and will be removed in 3.1.0.


Last update: 2020-11-28