Field Readers and Complex Writers
The holder classes discussed thus far are all you need for the vast majority of your UDFs. However, two classes serve more unusual needs: `FieldReader` and `ComplexWriter`.
The `FieldReader` class provides a generic way to read any vector without declaring a type-specific holder. Drill itself provides the perfect example: the `typeOf()` function, which takes a value of any type as input and returns the name of the type. You'll find it in `org.apache.drill.exec.expr.fn.impl.UnionFunctions`. Here is a simplified version:
```java
@FunctionTemplate(names = {"typeOf"},
                  scope = FunctionTemplate.FunctionScope.SIMPLE,
                  nulls = NullHandling.INTERNAL)
public static class GetType implements DrillSimpleFunc {
  @Param FieldReader input;
  @Output VarCharHolder out;
  @Inject DrillBuf buf;

  @Override
  public void eval() {
    String typeName = input.getType().getMinorType().name();
    ...
  }
}
```
Here we are concerned only with the (major) type returned from `FieldReader.getType()`. The major type includes the actual data type (the so-called "minor type") plus type parameters such as precision and scale for decimal types, nested fields for maps, and so on.
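To make the major/minor distinction concrete, here is a toy sketch. These are our own illustrative classes, not Drill's actual `TypeProtos` API; they only model the idea that a major type bundles a minor type together with type parameters:

```java
// Toy stand-in for Drill's minor type: just the bare data type.
enum ToyMinorType { INT, VARDECIMAL, MAP }

// Toy stand-in for a major type: the minor type plus type parameters
// (here, precision and scale, which matter for decimals).
class ToyMajorType {
  final ToyMinorType minorType;
  final int precision;
  final int scale;

  ToyMajorType(ToyMinorType minorType, int precision, int scale) {
    this.minorType = minorType;
    this.precision = precision;
    this.scale = scale;
  }
}
```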
You can also use the `FieldReader` to read data by casting the reader to the proper subtype; use your IDE to explore the various interfaces that `FieldReader` implements. However, if you go this route, you'll find yourself writing nested case statements for every data type and cardinality ("mode").
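To see why that approach gets verbose, here is a sketch of the kind of dispatch code you would end up writing. The enum and class are our own illustrative stand-ins for Drill's type metadata, not real Drill API:

```java
// Illustrative stand-in for Drill's minor-type enum.
enum SketchType { INT, BIGINT, VARCHAR }

class ReaderDispatch {
  // One branch per type -- and in real code, also per cardinality
  // (required/optional/repeated). This is exactly the run-time
  // checking that Drill's generated, type-specific code avoids.
  static String describe(SketchType type, Object value) {
    switch (type) {
      case INT:     return "int:" + (Integer) value;
      case BIGINT:  return "bigint:" + (Long) value;
      case VARCHAR: return "varchar:" + (String) value;
      default:      throw new UnsupportedOperationException("unhandled type");
    }
  }
}
```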
To avoid this dynamic run-time checking, Drill typically uses FreeMarker to generate a separate version of a function for each type. See `exec/java-exec/src/codegen/templates` for a wide range of examples.
The `@Output` holders work for most Drill types, but they do not work for maps. In Drill, a map is a nested tuple: each map has the same structure as the top-level row. A map in Drill is like a `struct` in Impala or Hive: a collection of columns with a fixed schema. The `ComplexWriter` lets you define and write to these fields.
The `ComplexWriter` also lets you write to an array of maps (that is, a repeated map). A good example is Drill's `mappify()` (AKA `kvgen`) function, defined in `org.apache.drill.exec.expr.fn.impl.Mappify`:
```java
@FunctionTemplate(names = {"mappify", "kvgen"},
                  scope = FunctionTemplate.FunctionScope.SIMPLE,
                  nulls = FunctionTemplate.NullHandling.NULL_IF_NULL,
                  isRandom = true)
public static class ConvertMapToKeyValuePairs implements DrillSimpleFunc {
  @Param  FieldReader reader;
  @Inject DrillBuf buffer;
  @Output ComplexWriter writer;

  public void setup() {
  }

  public void eval() {
    buffer = org.apache.drill.exec.expr.fn.impl.MappifyUtility.mappify(reader, writer, buffer);
  }
}
```
If we examine the Java implementation of `mappify`, we'll see the code that creates columns and populates them:
```java
public static DrillBuf mappify(FieldReader reader, BaseWriter.ComplexWriter writer, DrillBuf buffer) {
  ...
  BaseWriter.ListWriter listWriter = writer.rootAsList();
  listWriter.startList();
  BaseWriter.MapWriter mapWriter = listWriter.map();

  // Iterate over the fields in the map
  Iterator<String> fieldIterator = reader.iterator();
  while (fieldIterator.hasNext()) {
    String str = fieldIterator.next();
    FieldReader fieldReader = reader.reader(str);
    ...
    // Writing a new field: start a new map
    mapWriter.start();
    // Write "key":"columnname" into the map
    VarCharHolder vh = new VarCharHolder();
    ...
    mapWriter.varChar(fieldKey).write(vh);
    // Write the value to the map
    MapUtility.writeToMapFromReader(fieldReader, mapWriter);
    mapWriter.end();
  }
  listWriter.endList();
  return buffer;
}
```
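To understand the shape of the output the code above writes through the `ComplexWriter`, here is a plain-Java sketch of the same key/value transformation, using ordinary maps and lists instead of Drill vectors (the helper name is ours, not Drill's):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KvGenSketch {
  // Turn a map such as {"a": 1, "b": 2} into a list of key/value maps:
  // [{"key": "a", "value": 1}, {"key": "b", "value": 2}],
  // mirroring the repeated map that mappify()/kvgen emits.
  static List<Map<String, Object>> kvgen(Map<String, Object> input) {
    List<Map<String, Object>> out = new ArrayList<>();
    for (Map.Entry<String, Object> e : input.entrySet()) {
      Map<String, Object> pair = new LinkedHashMap<>();
      pair.put("key", e.getKey());
      pair.put("value", e.getValue());
      out.add(pair);
    }
    return out;
  }
}
```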
The use of the `ComplexWriter` is quite advanced, but it is the only way to go if you need a function that emits a map (or list of maps) as its output value.