HadoopFileLinesReader
HadoopFileLinesReader is a Scala http://www.scala-lang.org/api/2.11.11/#scala.collection.Iterator[Iterator] of Apache Hadoop's https://hadoop.apache.org/docs/r2.7.3/api/org/apache/hadoop/io/Text.html[org.apache.hadoop.io.Text].
HadoopFileLinesReader is created to read the lines of files for the following data sources:
- SimpleTextSource
- LibSVMFileFormat
- TextInputCSVDataSource
- TextInputJsonDataSource
- TextFileFormat
HadoopFileLinesReader uses the internal <<iterator, iterator>> for the Iterator-specific methods (hasNext, next and close).
Creating Instance
HadoopFileLinesReader takes the following when created:
- [[file]] PartitionedFile
- [[conf]] Hadoop's Configuration
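
The two arguments above can be put together by hand, which may help when experimenting in a test or a REPL. The following is only a sketch under assumptions: it assumes the Spark 2.x signatures (PartitionedFile as a case class with partitionValues, filePath, start and length, and a two-argument HadoopFileLinesReader constructor), which differ in later Spark versions. The object HadoopFileLinesReaderDemo and the helper readLines are hypothetical names introduced for illustration.

```scala
import java.io.File

import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.datasources.{HadoopFileLinesReader, PartitionedFile}

object HadoopFileLinesReaderDemo {
  // Hypothetical helper: reads all lines of a local file through HadoopFileLinesReader.
  def readLines(localFile: File): List[String] = {
    // Assumption: Spark 2.x PartitionedFile(partitionValues, filePath, start, length).
    val file = PartitionedFile(
      partitionValues = InternalRow.empty,
      filePath = localFile.toURI.toString, // the path is given as a URI string
      start = 0L,
      length = localFile.length())
    val reader = new HadoopFileLinesReader(file, new Configuration())
    // Hadoop reuses the same Text object between records,
    // so every line is copied out as a String before moving on.
    try reader.map(_.toString).toList finally reader.close()
  }
}
```

Calling HadoopFileLinesReaderDemo.readLines(new File("/tmp/demo.txt")) (a hypothetical path) would return the lines of that file as Strings, with the reader closed afterwards.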
=== [[iterator]] iterator Internal Property

[source, scala]
iterator: RecordReaderIterator[Text]
When created, HadoopFileLinesReader creates an internal iterator that uses Hadoop's https://hadoop.apache.org/docs/r2.7.3/api/org/apache/hadoop/mapreduce/lib/input/FileSplit.html[org.apache.hadoop.mapreduce.lib.input.FileSplit] with Hadoop's https://hadoop.apache.org/docs/r2.7.3/api/org/apache/hadoop/fs/Path.html[org.apache.hadoop.fs.Path] and the <<file, PartitionedFile>>.
iterator creates Hadoop's TaskAttemptID, TaskAttemptContextImpl and LineRecordReader.
iterator initializes LineRecordReader and passes it on to a RecordReaderIterator.
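
The steps above (FileSplit, TaskAttemptID, TaskAttemptContextImpl, LineRecordReader and its initialization) can be sketched with the Hadoop API alone. This is an illustration, not Spark's own code: the object LineRecordReaderSketch and the helper readLines are hypothetical, the split is assumed to cover a whole local file, and the lines are collected into a List directly rather than being passed on to a RecordReaderIterator.

```scala
import java.io.File

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.{JobID, TaskAttemptID, TaskID, TaskType}
import org.apache.hadoop.mapreduce.lib.input.{FileSplit, LineRecordReader}
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

object LineRecordReaderSketch {
  // Hypothetical helper: reads all lines of a local file with LineRecordReader.
  def readLines(localFile: File, conf: Configuration = new Configuration()): List[String] = {
    // A split covering the whole file: path, start offset, length, preferred hosts.
    val split = new FileSplit(
      new Path(localFile.toURI), 0L, localFile.length(), Array.empty[String])
    // A dummy attempt ID and context; they are only needed to initialize the reader.
    val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 0), 0)
    val context = new TaskAttemptContextImpl(conf, attemptId)
    val reader = new LineRecordReader()
    val lines = List.newBuilder[String]
    try {
      reader.initialize(split, context)
      // nextKeyValue advances to the next line; getCurrentValue returns a reused Text.
      while (reader.nextKeyValue()) lines += reader.getCurrentValue.toString
    } finally reader.close()
    lines.result()
  }
}
```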
NOTE: iterator is used for Iterator-specific methods, i.e. hasNext, next and close.
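
How a record reader can back hasNext, next and close is perhaps easiest to see in a dependency-free sketch. The code below is not Spark's RecordReaderIterator. SimpleRecordReader and ReaderIterator are hypothetical types that only mimic the shape of a Hadoop-style reader (nextKeyValue, getCurrentValue, close) and show one way to adapt it to a Scala Iterator with a one-record look-ahead.

```scala
import java.io.Closeable

// Hypothetical stand-in for a Hadoop-style record reader (not a Hadoop or Spark type).
trait SimpleRecordReader[T] extends Closeable {
  def nextKeyValue(): Boolean // advances to the next record, false at end of input
  def getCurrentValue: T      // the record the reader currently points at
}

// Adapts the reader to Iterator using a one-record look-ahead.
class ReaderIterator[T](private var reader: SimpleRecordReader[T])
    extends Iterator[T] with Closeable {
  private var havePair = false // a record was fetched but not yet handed out
  private var finished = false // the reader reported end of input

  override def hasNext: Boolean = {
    if (!finished && !havePair) {
      finished = !reader.nextKeyValue()
      if (finished) close() // release the reader as soon as the input is exhausted
      havePair = !finished
    }
    !finished
  }

  override def next(): T = {
    if (!hasNext) throw new NoSuchElementException("End of stream")
    havePair = false
    reader.getCurrentValue
  }

  // Safe to call more than once: the reader is closed only the first time.
  override def close(): Unit = if (reader != null) {
    try reader.close() finally reader = null
  }
}
```

The look-ahead flag keeps repeated hasNext calls from skipping records, and dropping the reference after closing makes a later explicit close a no-op.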