Docker Image for Apache Drill

This repository builds a Docker container that runs Apache Drill for analyzing file-based data (e.g., Parquet files).
When the container is created, a /data folder is also generated for your data files.
The build also extracts the Apache Drill JDBC driver JAR so that you can connect SQL user interfaces (e.g., DataGrip) to the Drill container.

This image allows you to generate an out-of-the-box analytics environment without cluttering your machine. If this is not your cup of tea, head over to the instructions for a standard installation of Apache Drill.

Docker Environment

Install Docker by following the official installation instructions. If you do not have a Docker account, you may be asked to create one.

Verify that Docker is running correctly with the following command:

docker version

You should see a result similar to the following:

Client: Docker Engine - Community
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        6247962
 Built:             Sun Feb 10 04:12:39 2019
 OS/Arch:           darwin/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 04:13:06 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Tip: Use the Docker app to adjust the memory, swap, and CPUs allocated to Docker containers.

The docker image

The repository contains the following files:

The docker-compose.yml file describes how to build and configure the container that runs Apache Drill.
The .env file contains the parameter DRILL_VERSION, which determines the version of Apache Drill that is built and run. DRILL_VERSION is currently set to 1.16.0. To use a different version, edit the .env file.
The /build/Dockerfile file contains the build instructions for the container.
The run_drill.sh file contains the startup script for Apache Drill.
The .gitignore file keeps files in the data folder and any Parquet/CSV files out of the repository.
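
To check or change the Drill version without opening an editor, the helpers below read and rewrite the DRILL_VERSION entry. This is a minimal sketch, assuming the .env file uses plain KEY=VALUE lines such as DRILL_VERSION=1.16.0; the function names drill_version and set_drill_version are illustrative and not part of the repository.

```shell
# Print the DRILL_VERSION value from a Compose-style .env file.
# Assumes a plain line of the form DRILL_VERSION=<version>.
drill_version() {
  env_file="${1:-.env}"
  # With -n and the p flag, sed prints only lines where the substitution matched.
  sed -n 's/^DRILL_VERSION=//p' "$env_file" | tail -n 1
}

# Rewrite the DRILL_VERSION line. A temporary file is used instead of
# "sed -i" so the helper behaves the same with GNU and BSD sed.
set_drill_version() {
  new_version="$1"
  env_file="${2:-.env}"
  tmp_file="$(mktemp)"
  sed "s/^DRILL_VERSION=.*/DRILL_VERSION=${new_version}/" "$env_file" > "$tmp_file" &&
    cat "$tmp_file" > "$env_file"
  rm -f "$tmp_file"
}
```

For example, running set_drill_version 1.17.0 and then docker-compose build would rebuild the image with that version, provided the Dockerfile can download it.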

Get the docker image

Use GitHub Desktop or run the following commands (assuming that Git is installed).

Check that Git is installed:

git version

You should see a result similar to the following:

git version 2.20.1 (Apple Git-117)

Clone the repository into a directory of your choice:

git clone https://github.com/mschermann/docker_apache_drill_datagrip.git

Build and start the container

From the root of the cloned repository, check the configuration:

docker-compose config

You should see a result similar to the following:

services:
  drill:
    build:
      args:
        DRILL_VERSION: 1.16.0
      context: /<YOUR PATH>/build
      dockerfile: Dockerfile
    command: ./run_drill.sh
    container_name: drill
    environment:
      DRILL_VERSION: 1.16.0
    hostname: drill
    ports:
    - 8047:8047/tcp
    - 31010:31010/tcp
    restart: on-failure
    tty: true
    volumes:
    - /<YOUR PATH>/data:/data:rw

Build the container

docker-compose build

Docker starts building the container. This can take a while, depending on your internet speed and machine configuration.

Building drill
Step 1/13 : FROM centos:latest
...

Start and stop the container

docker-compose up

You should see that docker starts the container. When you see the Drill message of the day, Drill is up and running:

Attaching to e382a3c16e10_drill
e382a3c16e10_drill | Apache Drill 1.16.0
e382a3c16e10_drill | "There are two types of analysts in the world: those who use Drill and those who don't."

This step also extracts the JDBC driver for Apache Drill (e.g., drill-jdbc-all-1.16.0.jar) into the /build folder.

You can stop the container with Control+C, which should result in the following output:

Gracefully stopping... (press Ctrl+C again to force)
Stopping e382a3c16e10_drill ... done

Explore Drill

Drill serves a web GUI at http://localhost:8047.
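
When the container is started in the background (for example with docker-compose up -d), the message of the day is not visible, so it can help to poll the web GUI until it answers. The following is a minimal sketch: wait_until is a hypothetical helper that retries any command, and the retry count and delay are arbitrary choices.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the probe command quietly; stop as soon as it succeeds.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  # Every attempt failed.
  return 1
}
```

With curl installed, wait_until 30 2 curl -fsS http://localhost:8047 returns successfully once the web GUI responds, and fails after 30 unsuccessful attempts.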

If you click on Query, you can run SQL queries directly from the browser. Do not use this for heavy-load querying.

Make sure that everything works by entering the example query SELECT * FROM cp.`employee.json` LIMIT 20.

It will show you a waiting screen and, if everything works fine, the results.
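
The same query can also be sent from the command line through Drill's REST API, which is served on the same port as the web GUI. This is a sketch, assuming the default setup without authentication; drill_query_payload and drill_query are illustrative helper names, and the payload builder only handles single-line queries.

```shell
# Build the JSON body for Drill's REST endpoint POST /query.json.
# Escapes backslashes and double quotes; newlines in the query are not handled.
drill_query_payload() {
  escaped="$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  printf '{"queryType":"SQL","query":"%s"}' "$escaped"
}

# Send a query to a Drill instance and print the JSON response.
# The second argument (base URL) defaults to the published port 8047.
drill_query() {
  base_url="${2:-http://localhost:8047}"
  curl -fsS -X POST \
    -H 'Content-Type: application/json' \
    -d "$(drill_query_payload "$1")" \
    "${base_url}/query.json"
}
```

For example, drill_query 'SELECT * FROM cp.`employee.json` LIMIT 20' should print the result rows as JSON while the container is running. The single quotes keep the shell from interpreting the backticks.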

Now, head over to the Drill Documentation and start learning how to use Drill.

Connect to the Drill container with a SQL tool

Let's connect DataGrip to the Drill container.

Create a new driver in DataGrip by pointing it to the Apache Drill JDBC driver JAR in the /build folder.

Create a data source using the Drill driver. Test the connection and make sure you get the green checkmark.
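
DataGrip asks for a driver JAR, a driver class, and a connection URL. The sketch below locates the extracted JAR and prints a URL for a direct connection to the container. It assumes the port 31010 published in docker-compose.yml and Drill's direct-drillbit URL form jdbc:drill:drillbit=<host>:<port>; the function names are illustrative.

```shell
# Print the path of the extracted Drill JDBC driver JAR in a directory
# (default: ./build). Returns non-zero if no JAR is found.
find_jdbc_jar() {
  build_dir="${1:-build}"
  for jar in "$build_dir"/drill-jdbc-all-*.jar; do
    # If the glob matched nothing, $jar holds the literal pattern; skip it.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  return 1
}

# Print a JDBC URL that connects directly to a Drillbit.
drill_jdbc_url() {
  host="${1:-localhost}"
  port="${2:-31010}"
  printf 'jdbc:drill:drillbit=%s:%s\n' "$host" "$port"
}
```

Called without arguments, drill_jdbc_url prints jdbc:drill:drillbit=localhost:31010, which can be pasted into the data source URL field. The driver class shipped in the JAR is org.apache.drill.jdbc.Driver.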

Run a Sample Query

Run the same query as above (SELECT * FROM cp.`employee.json` LIMIT 20). DataGrip should display the resulting rows.

At this point, you are all set. Add your data files to the /data folder, and you should be able to query them.

If you use Parquet data files, the following query returns the first five rows of the data:

SELECT * FROM dfs.`/data` LIMIT 5;

Head over to the Drill documentation for a more in-depth explanation and help.

Access the Drill container

You may want to access the Drill container at some point. The following steps show how to connect to the container.

Find the name of your container.

docker ps

This should result in an output like this:

CONTAINER ID  IMAGE                                COMMAND           CREATED            STATUS         PORTS                                             NAMES
e382a3c16e10  docker_drill_parquet_datagrip_drill  "./run_drill.sh"  About an hour ago  Up 47 seconds  0.0.0.0:8047->8047/tcp, 0.0.0.0:31010->31010/tcp  e382a3c16e10_drill

Access the container.

From the NAMES column, you can see that this container is called e382a3c16e10_drill. You can connect to this container using the following command:

docker exec -it e382a3c16e10_drill bash

This will result in a prompt inside the container:

[root@drill drill]#
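
Because the generated container name can differ between machines and runs, it can be convenient to look it up instead of copying it by hand. This is a minimal sketch, assuming the name is drill or ends in _drill as in the output above; drill_container_name is a hypothetical helper that filters a list of names read from standard input.

```shell
# Read container names (one per line) from stdin and print the first one
# that is exactly "drill" or ends in "_drill".
drill_container_name() {
  grep -E '(^|_)drill$' | head -n 1
}
```

Combined with the name-only output of docker ps, the lookup and the connection fit into one command: docker exec -it "$(docker ps --format '{{.Names}}' | drill_container_name)" bash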

Acknowledgements

This repo was inspired by a post by Mattia Casotto.
The Dockerfile is adapted from Apache Drill's Dockerfile.
