CompactibleFileStreamLog takes the following to be created:
- Version of the Metadata Log
- Path of the Metadata Log
CompactibleFileStreamLog is an abstract class and cannot be created directly. It is created indirectly for the concrete CompactibleFileStreamLogs.
Filtering Out Obsolete Logs¶
compactLogs( logs: Seq[T]): Seq[T]
compactLogs does nothing important in the available implementations. Consider this method a no-op.
Default Compact Interval¶
Used for the compact interval
File Cleanup Delay¶
Used to delete expired log entries
Used to store metadata
compact( batchId: Long, logs: Array[T]): Boolean
compact stores the metadata (the filtered metadata files and the input
logs) for the input
compact tracks elapsed time (
compact prints out the following DEBUG message (only when the total elapsed time of
writeElapsedMs are below the unconfigurable
Compacting took [elapsedMs] ms (load: [loadElapsedMs] ms, write: [writeElapsedMs] ms) for compact batch [batchId]
In case the total elapsed time is above the unconfigurable
compact prints out the following WARN messages:
Compacting took [elapsedMs] ms (load: [loadElapsedMs] ms, write: [writeElapsedMs] ms) for compact batch [batchId]
Loaded [allLogs] entries (estimated [allLogs] bytes in memory), and wrote [compactedLogs] entries for compact batch [batchId]
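The choice between the DEBUG and WARN messages can be sketched as follows. This is a minimal illustration, not the actual implementation: the `CompactTimingSketch` object, the `timed` helper and the `warnThresholdMs` parameter are hypothetical, the total is assumed to be the sum of the load and write times, and a total equal to the threshold is assumed to count as a warning.

```scala
object CompactTimingSketch {
  // Runs a block and returns its result together with the elapsed time in ms
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // warnThresholdMs is a hypothetical parameter (the real threshold is not configurable)
  def report(
      batchId: Long,
      loadElapsedMs: Long,
      writeElapsedMs: Long,
      warnThresholdMs: Long): String = {
    // Assumption: total elapsed time is load time plus write time
    val elapsedMs = loadElapsedMs + writeElapsedMs
    // Assumption: a total equal to the threshold is reported as WARN
    val level = if (elapsedMs >= warnThresholdMs) "WARN" else "DEBUG"
    s"$level Compacting took $elapsedMs ms (load: $loadElapsedMs ms, " +
      s"write: $writeElapsedMs ms) for compact batch $batchId"
  }
}
```

The sketch returns the message instead of logging it so that the level selection is easy to inspect.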
compact throws an IllegalStateException when one of the metadata files to compact is not valid (not accessible on a file system or of incorrect format):
[batchIdToPath] doesn't exist when compacting batch [batchId] (compactInterval: [compactInterval])
compact is used while storing metadata for streaming batch.
compact File Suffix¶
Storing Metadata for Streaming Batch¶
add( batchId: Long, logs: Array[T]): Boolean
add is part of the MetadataLog abstraction.
Deleting Expired Log Entries¶
deleteExpiredLog( currentBatchId: Long): Unit
compactInterval is the number of metadata log files between compactions.
compactInterval is a Scala lazy value which means that the code to initialize it is executed once only (when accessed for the first time) and cached afterwards.
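The once-only initialization of a Scala lazy value can be seen in a small sketch. The `LazyDemo` object, its counter and the value `10` are hypothetical and serve only to make the behavior observable.

```scala
object LazyDemo {
  // Counts how many times the initializer runs (hypothetical, for illustration)
  var initCount = 0

  // The initializer runs on first access only; the result is cached afterwards
  lazy val compactInterval: Int = {
    initCount += 1
    10
  }
}
```

Reading `LazyDemo.compactInterval` twice leaves `initCount` at 1, since the second read returns the cached value.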
compactInterval finds compacted IDs and determines the compact interval.
compactInterval requests the CheckpointFileManager for the files in the metadataPath that are batch files (possibly compacted).
compactInterval takes the compacted files only (if available), converts them to batch IDs and sorts them in descending order.
compactInterval starts with the default compact interval.
- If there are two compacted IDs, their difference is the compact interval
- If there is one compacted ID only, compactInterval "derives" the compact interval (FIXME)
compactInterval asserts that the compact interval is a positive value or throws an
compactInterval prints out the following INFO message to the logs (with the defaultCompactInterval):
Set the compact interval to [interval] [defaultCompactInterval: [defaultCompactInterval]]
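The steps above could look roughly like the following sketch. It works on plain file names rather than on files returned by a CheckpointFileManager, and the `CompactIntervalSketch` object and `deriveInterval` name are hypothetical. Two simplifications are assumptions of the sketch: with more than two compacted IDs only the two most recent are compared, and the single-ID case (marked FIXME above) simply falls back to the default. The `require` call stands in for the assertion; the exception type it throws is not meant to mirror the real one.

```scala
object CompactIntervalSketch {
  val CompactSuffix = ".compact"

  // fileNames: names of the files found in the metadata directory
  def deriveInterval(fileNames: Seq[String], defaultCompactInterval: Int): Int = {
    // Keep the compacted files only, convert them to batch IDs, sort descending
    val compactedIds = fileNames
      .filter(_.endsWith(CompactSuffix))
      .map(_.stripSuffix(CompactSuffix).toLong)
      .sorted(Ordering[Long].reverse)

    val interval = compactedIds match {
      // At least two compacted IDs: use the difference of the two most recent
      case latest +: previous +: _ => (latest - previous).toInt
      // Zero or one compacted ID: simplification, use the default
      case _ => defaultCompactInterval
    }
    // Stand-in for the positivity assertion
    require(interval > 0, s"compact interval must be positive, got $interval")
    interval
  }
}
```

For example, with compacted files `4.compact` and `9.compact` in the directory, the sketch yields an interval of 5 regardless of the default.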
All Files (Except Deleted)¶
allFiles is used when:
Converting Batch Id to Hadoop Path¶
batchIdToPath( batchId: Long): Path
batchIdToPath is part of the HDFSMetadataLog abstraction.
Converting Hadoop Path to Batch Id¶
pathToBatchId( path: Path): Long
pathToBatchId is part of the HDFSMetadataLog abstraction.
isBatchFile( path: Path): Boolean
isBatchFile is true when the batch ID can be extracted from the given path. Otherwise, it is false.
isBatchFile is part of the HDFSMetadataLog abstraction.
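A minimal sketch of such a check, assuming it operates on a plain file name instead of a Hadoop Path (the `BatchFileSketch` object is hypothetical):

```scala
import scala.util.Try

object BatchFileSketch {
  // true when a batch ID can be parsed from the file name
  // (with an optional ".compact" suffix), false otherwise
  def isBatchFile(fileName: String): Boolean =
    Try(fileName.stripSuffix(".compact").toLong).isSuccess
}
```

Names such as `7` and `9.compact` pass the check, while a name that is not numeric does not.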
Serializing Metadata (Writing Metadata in Serialized Format)¶
serialize( logData: Array[T], out: OutputStream): Unit
serialize writes the version header (
v and the <
serialize then writes the log data (serialized using Json4s (with Jackson binding) library). Entries are separated by new lines.
serialize is part of the HDFSMetadataLog abstraction.
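The resulting file layout can be illustrated with a short sketch. To stay free of external dependencies, it takes a hypothetical `toJson` function in place of the Json4s serialization, and it assumes that the version header is the letter `v` followed by the version number. The `SerializeSketch` object and the extra `version` parameter are also assumptions of the sketch.

```scala
import java.io.OutputStream
import java.nio.charset.StandardCharsets.UTF_8

object SerializeSketch {
  // toJson is a hypothetical stand-in for the Json4s-based serialization
  def serialize[T](version: Int, logData: Array[T], out: OutputStream)(
      toJson: T => String): Unit = {
    // Version header: assumed to be "v" followed by the version number
    out.write(s"v$version".getBytes(UTF_8))
    logData.foreach { entry =>
      // Every entry is written on a line of its own
      out.write('\n')
      out.write(toJson(entry).getBytes(UTF_8))
    }
  }
}
```

Writing the separator before each entry (rather than after) keeps the output free of a trailing newline.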
deserialize( in: InputStream): Array[T]
deserialize is part of the HDFSMetadataLog abstraction.
getBatchIdFromFileName( fileName: String): Long
getBatchIdFromFileName simply removes the .compact suffix from the given fileName and converts the remaining part to a number.
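In Scala, this conversion can be written in one line. The `BatchIdSketch` wrapper object is hypothetical; a file name without the suffix is parsed as is.

```scala
object BatchIdSketch {
  val CompactSuffix = ".compact"

  // Drops the ".compact" suffix (if present) and parses the rest as a Long
  def getBatchIdFromFileName(fileName: String): Long =
    fileName.stripSuffix(CompactSuffix).toLong
}
```

A file name that is not numeric after removing the suffix makes `toLong` throw a `NumberFormatException`.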
getValidBatchesBeforeCompactionBatch( compactionBatchId: Long, compactInterval: Int): Seq[Long]
getBatchIdFromFileName is used for compaction.