Skip to content

mapr-demos/simple-drill-functions

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Simple Drill Examples

This project contains examples of user defined functions for Apache Drill.

These are packaged as a separate project so that you can clone this project as a head start in creating your own extensions for Drill.

How to Compile Install

Clone this package and compile it.

git clone https://github.com/mapr-demos/simple-drill-functions.git
cd simple-drill-functions
mvn package
cd ..

Now download and unpack Apache Drill.

wget http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz
tar xvf apache-drill-1.1.0.tar.gz

Copy the jar files from your functions into the 3rdparty directory in the Drill distro

cp simple-drill-functions/target/*.jar apache-drill-1.1.0/jars/3rdparty

Edit the drill-override.conf file to add a reference to the package these functions live in:

echo 'drill.logical.function.package+=[com.mapr.drill]' >> apache-drill-1.1.0/conf/drill-override.conf

Now run drill and test the results

$ cd apache-drill-1.1.0/
$ bin/drill-embedded
0: jdbc:drill:zk=local> select myaddints(position_id, 3) from cp.`employee.json` limit 3;
+---------+
| EXPR$0  |
+---------+
| 4.0     |
| 5.0     |
| 5.0     |
+---------+

Sample UDF's provide

zip(...)

Zip approximately replicates the python zip function, but only for pairs of lists containing numbers.

If you have data like this:

+------+--------------------+--------------------+--------------------+
|  n   |         t          |         x          |         y          |
+------+--------------------+--------------------+--------------------+
| 1.0  | [1.0,2.0,3.0]      | [3.0,2.0,1.0]      | [4.0,5.0,6.0]      |
| 2.0  | [0.0,1.0,2.0,3.0]  | [0.0,3.0,2.0,1.0]  | [7.0,6.0,5.0,4.0]  |
+------+--------------------+--------------------+--------------------+

Then a query like this

 select n, data[0] as t, data[1] as v 
 from (
    select n, flatten( zip(t,x)) as data 
    from dfs.root.`/Users/tdunning/tmp/data.json`);

Will produce these results

+------+------+------+
|  n   |  t   |  v   |
+------+------+------+
| 1.0  | 1.0  | 3.0  |
| 1.0  | 2.0  | 2.0  |
| 1.0  | 3.0  | 1.0  |
| 2.0  | 1.0  | 3.0  |
| 2.0  | 2.0  | 2.0  |
| 2.0  | 3.0  | 1.0  |
| 2.0  | 0.0  | 0.0  |
+------+------+------+

Two, three or four arguments can be used. Thus,

 select n, data[0] as t, data[1] as x, data[2] as y 
 from (
     select n, flatten( zip(t,x,y)) as data 
     from dfs.root.`/Users/tdunning/tmp/data.json`);

will produce

+------+------+------+------+
|  n   |  t   |  x   |  y   |
+------+------+------+------+
| 1.0  | 1.0  | 3.0  | 4.0  |
| 1.0  | 2.0  | 2.0  | 5.0  |
| 1.0  | 3.0  | 1.0  | 6.0  |
| 2.0  | 1.0  | 3.0  | 4.0  |
| 2.0  | 2.0  | 2.0  | 5.0  |
| 2.0  | 3.0  | 1.0  | 6.0  |
| 2.0  | 0.0  | 0.0  | 7.0  |
+------+------+------+------+

Aside from any utility this function itself might have, zip demonstrates how to construct complex output structures.

mask(...)

Mask will replace the n first characters by a masking character as you can see below:

SELECT MASK(first_name, '*', 3) first , MASK(last_name, '#', 10) last FROM cp.`employee.json` LIMIT 2;
+----------+----------+
|  first   |   last   |
+----------+----------+
| ***ri    | ######   |
| ***rick  | #######  |
+----------+----------+