Paul Rogers edited this page Apr 28, 2019 · 4 revisions

The Enhanced Vector Framework (EVF) provides a number of services for writing to, and reading from, vectors. These services include:

  • Column Accessors: A set of JSON-like readers and writers for each kind of Drill vector.
  • Metadata Framework: An enhanced set of classes to describe a Drill schema.
  • Row Set Framework: A simple mechanism to create and read individual record batches, typically used for unit testing.
  • Result Set Loader: An extended version of the row set writer that manages memory usage, performs projection, and more. Used thus far to build scan operators and record readers.
  • Projection Framework: Handles the complete set of tasks typical of scan operators.
  • Operator Framework: A refactored, simplified version of the Record Batch structure (which, despite its name, is actually an operator).
  • Scan Framework: Combines all of the above into a single mechanism to build a scan operator. You just add configuration and the format-specific code to load data into vectors.

This may seem like quite a bit to learn. Fortunately, for the most part, you set a few options and let the framework handle the rest. Where you must provide code (to create an operator, or write a batch reader), the framework makes that code as simple as possible by factoring out common tasks.

This guide is a high-level introduction to get you started. Much detailed information is available:

Examples are often a useful way to learn a new mechanism. Here are a few:

  • Example Unit Test: Shows how to use the Row Set framework for tests. See also the many unit tests that now use the framework.
  • Text Reader v.3: Shows a conversion of the "compliant" (CSV) text reader to EVF.

The EVF is "enhanced" because it builds on quite a few mechanisms already in Drill, including:

  • The "complex" readers and writers used, among other places, by UDFs. (The complex writers use a similar JSON-like API, but are a bit less flexible and do not enforce vector memory limits.)
  • The column explorer, which handles implicit file metadata columns and partition directory columns.
  • The "first generation" Scan operator, which demonstrated the many tasks required for scanners and readers to correctly manage value vectors.