EVF Introduction
Paul Rogers edited this page Apr 28, 2019 · 4 revisions
The Enhanced Vector Framework (EVF) provides a number of services for writing to, and reading from, vectors. These services include:
- Column Accessors: A set of JSON-like readers and writers for each kind of Drill vector.
- Metadata Framework: An enhanced set of classes to describe a Drill schema.
- Row Set Framework: A simple mechanism to create and read individual record batches, typically used for unit testing.
- Result Set Loader: An extended version of the row set writer that manages memory usage, performs projection, and more. Used thus far to build scan operators and record readers.
- Projection Framework: Handles the complete set of tasks typical of scan operators.
- Operator Framework: A refactored, simplified version of the Record Batch structure (which, despite its name, is actually an operator).
- Scan Framework: Combines all of the above into a single mechanism to build a scan operator. You add only configuration and the format-specific code that loads data into vectors.
This may seem like quite a bit to learn. Fortunately, for the most part, you set a few options and let the framework handle the rest. Where you must provide code (to create an operator, or write a batch reader), the framework makes that code as simple as possible by factoring out common tasks.
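The division of labor described above can be illustrated with a small, self-contained toy model. This is only a sketch of the idea, not EVF itself: every name here (`EvfSketch`, `RowWriter`, `BatchReader`, `LimitedRowWriter`, `CountingReader`, `runScan`) is hypothetical and is not one of Drill's actual classes, and a simple row-count limit stands in for the vector memory limits that the real framework manages. The "framework" side owns the batch loop and the size limit; the "format-specific" side only writes rows until told the batch is full.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a scan framework. NOT Drill's actual API.
class EvfSketch {

  /** What format-specific code sees: a writer that accepts rows until the batch is full. */
  interface RowWriter {
    boolean isFull();
    void addRow(Object... values);
  }

  /** The only piece a format author writes in this toy model. */
  interface BatchReader {
    /** Fills the writer; returns true if more input remains after this batch. */
    boolean next(RowWriter writer);
  }

  /** Framework-side writer that enforces a per-batch row limit. */
  static final class LimitedRowWriter implements RowWriter {
    private final int maxRows;
    private final List<Object[]> rows = new ArrayList<>();

    LimitedRowWriter(int maxRows) { this.maxRows = maxRows; }

    @Override public boolean isFull() { return rows.size() >= maxRows; }

    @Override public void addRow(Object... values) {
      if (isFull()) {
        throw new IllegalStateException("batch is full");
      }
      rows.add(values);
    }

    List<Object[]> harvest() { return rows; }
  }

  /** Framework-side loop: creates one writer per batch and collects non-empty batches. */
  static List<List<Object[]>> runScan(BatchReader reader, int maxRowsPerBatch) {
    List<List<Object[]>> batches = new ArrayList<>();
    boolean more = true;
    while (more) {
      LimitedRowWriter writer = new LimitedRowWriter(maxRowsPerBatch);
      more = reader.next(writer);
      if (!writer.harvest().isEmpty()) {
        batches.add(writer.harvest());
      }
    }
    return batches;
  }

  /** Example format-specific reader: emits (id, name) rows for ids 0..total-1. */
  static final class CountingReader implements BatchReader {
    private final int total;
    private int nextId = 0;

    CountingReader(int total) { this.total = total; }

    @Override public boolean next(RowWriter writer) {
      while (nextId < total && !writer.isFull()) {
        writer.addRow(nextId, "row-" + nextId);
        nextId++;
      }
      return nextId < total;
    }
  }

  public static void main(String[] args) {
    // Five rows with a limit of two rows per batch yields batches of 2, 2, and 1 rows.
    for (List<Object[]> batch : runScan(new CountingReader(5), 2)) {
      System.out.println("batch with " + batch.size() + " rows");
    }
  }
}
```

Note that `CountingReader` contains no batch-management logic beyond checking `isFull()`. In this sketch, that check is the whole contract between the reader and the surrounding loop, which mirrors the idea of factoring common tasks out of format-specific code.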
This guide is a high-level introduction to get you started. More detailed information is available:
- Row Set Framework: An earlier, simpler version of this material.
- Batch Handling Upgrades: The design documents for this mechanism, which explain how it works internally.
- Extensive Java Package Javadoc (also here and here): Explains the usage and design of key EVF components.
Examples are often the most useful way to learn a new mechanism. Here are a few:
- Example Unit Test: Shows how to use the Row Set framework for tests. See also the many unit tests that now use the framework.
- Text Reader v.3: Shows a conversion of the "compliant" (CSV) text reader to EVF.
The EVF is "enhanced" because it builds on quite a few mechanisms already in Drill, including:
- The "complex" readers and writers used, among other places, by UDFs. (The complex writers use a similar JSON-like API, but are a bit less flexible and do not enforce vector memory limits.)
- The column explorer, which handles implicit file metadata columns and partition directory columns.
- The "first generation" Scan operator, which demonstrated the many tasks required for scanners and readers to correctly manage value vectors.