Record Readers
Record readers read data from files in DFS, converting the data into a series of value vectors. A reader is associated with a FormatPlugin and defined by a FormatPluginConfig. Each format plugin is associated with a StoragePlugin, which provides access to the file system that stores the data read by the storage plugin. Each format plugin can also define a RecordWriter to support CTAS operations.
Each record reader instance is associated with a single file (or portion of a file) in the file system defined by the storage plugin.
The actual RecordReader API is quite simple:
void setup(OperatorContext context, OutputMutator output) throws ExecutionSetupException;
void allocate(Map<String, ValueVector> vectorMap) throws OutOfMemoryException;
int next();
The setup() method ...
The allocate() method ...
The next() method reads up to a fixed number of records into a previously allocated record batch (a set of value vectors). Each call to next() either produces a batch with a new schema, produces a batch that uses the existing schema, or signals EOF (by returning 0). A schema change can therefore occur only at a record batch boundary.
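The batch-at-a-time contract of next() can be illustrated with a minimal, self-contained sketch. This is not Drill's RecordReader: the class names BatchReaderSketch and ListRecordReader, the drain() helper, and the use of a plain list in place of value vectors are all hypothetical stand-ins, and schema handling is omitted. The sketch shows only the calling pattern: the caller invokes next() repeatedly, each call fills one batch, and a return value of 0 signals EOF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these types are part of Drill.
class BatchReaderSketch {

    /** Reads records from an in-memory list in batches of at most batchSize. */
    static class ListRecordReader {
        private final List<String> source;
        private final int batchSize;
        // Stands in for the previously allocated value vectors.
        private final List<String> batch = new ArrayList<>();
        private int position = 0;

        ListRecordReader(List<String> source, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.source = source;
            this.batchSize = batchSize;
        }

        /** Fills the batch and returns the record count; 0 signals EOF. */
        int next() {
            batch.clear();
            while (batch.size() < batchSize && position < source.size()) {
                batch.add(source.get(position++));
            }
            return batch.size();
        }

        /** Records loaded by the most recent call to next(). */
        List<String> currentBatch() {
            return batch;
        }
    }

    /** Calls next() until it returns 0 and collects the size of each batch. */
    static List<Integer> drain(ListRecordReader reader) {
        List<Integer> sizes = new ArrayList<>();
        int count;
        while ((count = reader.next()) > 0) {
            sizes.add(count);
        }
        return sizes;
    }

    public static void main(String[] args) {
        ListRecordReader reader =
            new ListRecordReader(Arrays.asList("a", "b", "c", "d", "e"), 2);
        System.out.println(drain(reader)); // prints [2, 2, 1]
    }
}
```

In this sketch the final batch is smaller than the batch size, and the call after it returns 0. A real reader would also have to report any schema change together with the batch in which it first applies.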