
Introduction

This page contains instructions on how to set up Genie 2.0.0 using Docker. If you're looking for a different version, please see the list of releases here.

Caveats

The current Genie Docker image is NOT PRODUCTION READY.

This guide will help you get up and running quickly, but take note of the following:

  • There has been no tuning of memory settings for Tomcat or the launched processes in the Genie container
  • By default the MySQL connection for Genie launches a connection pool with 20 connections. The MySQL container hasn't been tested extensively to see whether this causes problems.
  • There is no flexible way to configure the Genie node beyond the instructions included here. This will hopefully be improved in future releases.
  • Genie usually runs on a fully fleshed-out system, so it uses the getHostName functionality from the Java libraries to dynamically create some links in the UI. Within Docker this returns the container ID, so if you're accessing the UI from a host (e.g. Mac) browser those links will be broken. Just replace the host with the IP address of the Genie container and the links will work. We're hoping to fix this in future releases.
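
A quick way to get that IP address from the host (assuming the Genie container is named genie, as in the run commands later on):

docker inspect --format '{{ .NetworkSettings.IPAddress }}' genie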

Pre-Requisites

  • Docker 1.5+
    • The Genie image was developed and tested using Docker 1.5.0, so some of these steps may not work in older versions. In particular, the ability to edit /etc/hosts is not available in older versions of Docker.
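
You can check which version of Docker you're running with:

docker version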

Configure

Configure both MySQL and Hadoop if you want to run the example. If you don't want to run the example, you don't need to bring up Hadoop or add the Hadoop link flag when bringing up Genie. MySQL (or another database if you want to configure your own) is always required.

Setup MySQL (Required)

Run

docker run --name mysql-genie -e MYSQL_ROOT_PASSWORD=genie -e MYSQL_DATABASE=genie -d mysql:5.6.21

Notes

  1. This launches a MySQL 5.6.21 container named mysql-genie. The name is important as Genie references the container by it for the connection.
  2. The container is launched as a daemon (-d) so it will run in the background
  3. This sets the database name to genie and the root user's password to genie. If you change these you'll have to manually change the configuration in the Genie container later

Verify

  1. docker exec -it mysql-genie mysql -pgenie

  2. This should drop you into the MySQL CLI, where you can run commands like so:

     root@ubuntu:~# docker exec -it mysql-genie mysql -pgenie
     Warning: Using a password on the command line interface can be insecure.
     Welcome to the MySQL monitor.  Commands end with ; or \g.
     Your MySQL connection id is 3
     Server version: 5.6.21 MySQL Community Server (GPL)
    
     Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.
    
     Oracle is a registered trademark of Oracle Corporation and/or its
     affiliates. Other names may be trademarks of their respective
     owners.
    
     Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
     mysql> show Databases;
     +--------------------+
     | Database           |
     +--------------------+
     | information_schema |
     | genie              |
     | mysql              |
     | performance_schema |
     +--------------------+
     4 rows in set (0.00 sec)
    
     mysql> use genie;
     Database changed
     mysql> exit
     Bye
    

Setup Hadoop To Run Example (Optional)

Run

docker run --name hadoop-genie -it --rm -p 10020:10020 -p 19888:19888 sequenceiq/hadoop-docker:2.6.0 /etc/bootstrap.sh -bash

Notes

  1. This launches a Hadoop 2.6.0 container named hadoop-genie. It is already running most of the standard Hadoop 2 daemons.
  2. It is launched in interactive mode (-it), so it will leave you in a bash shell within the container; you'll want to start it in a terminal you can leave running.
  3. It is started with --rm so that when you exit the bash shell the container is completely removed and you won't even see it with docker ps -a
  4. You can find the IP address of the Hadoop daemons by running docker inspect hadoop-genie and locating the IPAddress field
  5. Hadoop daemon UI's are on the following ports
    1. NameNode 50070
    2. DataNode 50075
    3. Secondary NameNode 50090
    4. NodeManager 8042
    5. ResourceManager 8088
    6. JobHistoryServer 19888
  6. The -p 10020:10020 -p 19888:19888 flags in the run command expose two more ports that aren't exposed by default by the Hadoop container. These ports are for the JobHistory server, which isn't started by default, so we'll start it manually below in order to successfully run jobs from the Genie node.
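
As a quick sanity check from the host (assuming the default bridge network and the container name above), you can hit the ResourceManager UI directly:

    HADOOP_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' hadoop-genie)
    curl -s "http://${HADOOP_IP}:8088/cluster" | head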

Modify /etc/hosts

  1. The sequenceiq images assume you're running Hadoop jobs locally, so /etc/hosts is set up only for internal reference

  2. vi /etc/hosts and add hadoop-genie after the container ID on the first line (space separated), or use the one-liner shown after the example below

  3. This will allow the daemons to resolve each other by container name when a job is submitted from the Genie node later on

  4. The file should now look something like this:

     172.17.0.2	44821c3dccd3 hadoop-genie
     127.0.0.1	localhost
     ::1	localhost ip6-localhost ip6-loopback
     fe00::0	ip6-localnet
     ff00::0	ip6-mcastprefix
     ff02::1	ip6-allnodes
     ff02::2	ip6-allrouters
    
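
If you'd rather not edit the file interactively, appending an equivalent line also lets the name resolve (assuming hostname -i inside the container returns its bridge IP):

    echo "$(hostname -i) hadoop-genie" >> /etc/hosts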

Start Job History Server

  1. /usr/local/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver

  2. jps should now show something like this:

     bash-4.1# jps
     356 SecondaryNameNode
     911 JobHistoryServer
     191 DataNode
     490 ResourceManager
     112 NameNode
     976 Jps
     570 NodeManager
    
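
You can also confirm the JobHistory server's web UI is answering from within the container (assuming curl is available in the image):

    curl -s http://localhost:19888/jobhistory | head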

Verify

  1. Locally you should be able to do anything you could do on a normal Hadoop client machine

  2. If jps and /etc/hosts look like the above, you should be good

  3. You can run a Hadoop command just to verify (don't delete the input directory if you plan to run the example):

     bash-4.1# /usr/local/hadoop/bin/hadoop fs -ls input
     Found 31 items
     -rw-r--r--   1 root supergroup       4436 2015-01-15 04:05 input/capacity-scheduler.xml
     -rw-r--r--   1 root supergroup       1335 2015-01-15 04:05 input/configuration.xsl
     -rw-r--r--   1 root supergroup        318 2015-01-15 04:05 input/container-executor.cfg
     -rw-r--r--   1 root supergroup        155 2015-01-15 04:05 input/core-site.xml
     -rw-r--r--   1 root supergroup        154 2015-01-15 04:05 input/core-site.xml.template
     -rw-r--r--   1 root supergroup       3670 2015-01-15 04:05 input/hadoop-env.cmd
     -rw-r--r--   1 root supergroup       4302 2015-01-15 04:05 input/hadoop-env.sh
     -rw-r--r--   1 root supergroup       2490 2015-01-15 04:05 input/hadoop-metrics.properties
     -rw-r--r--   1 root supergroup       2598 2015-01-15 04:05 input/hadoop-metrics2.properties
     -rw-r--r--   1 root supergroup       9683 2015-01-15 04:05 input/hadoop-policy.xml
     -rw-r--r--   1 root supergroup        126 2015-01-15 04:05 input/hdfs-site.xml
     -rw-r--r--   1 root supergroup       1449 2015-01-15 04:05 input/httpfs-env.sh
     -rw-r--r--   1 root supergroup       1657 2015-01-15 04:05 input/httpfs-log4j.properties
     -rw-r--r--   1 root supergroup         21 2015-01-15 04:05 input/httpfs-signature.secret
     -rw-r--r--   1 root supergroup        620 2015-01-15 04:05 input/httpfs-site.xml
     -rw-r--r--   1 root supergroup       3523 2015-01-15 04:05 input/kms-acls.xml
     -rw-r--r--   1 root supergroup       1325 2015-01-15 04:05 input/kms-env.sh
     -rw-r--r--   1 root supergroup       1631 2015-01-15 04:05 input/kms-log4j.properties
     -rw-r--r--   1 root supergroup       5511 2015-01-15 04:05 input/kms-site.xml
     -rw-r--r--   1 root supergroup      11291 2015-01-15 04:05 input/log4j.properties
     -rw-r--r--   1 root supergroup        938 2015-01-15 04:05 input/mapred-env.cmd
     -rw-r--r--   1 root supergroup       1383 2015-01-15 04:05 input/mapred-env.sh
     -rw-r--r--   1 root supergroup       4113 2015-01-15 04:05 input/mapred-queues.xml.template
     -rw-r--r--   1 root supergroup        138 2015-01-15 04:05 input/mapred-site.xml
     -rw-r--r--   1 root supergroup        758 2015-01-15 04:05 input/mapred-site.xml.template
     -rw-r--r--   1 root supergroup         10 2015-01-15 04:05 input/slaves
     -rw-r--r--   1 root supergroup       2316 2015-01-15 04:05 input/ssl-client.xml.example
     -rw-r--r--   1 root supergroup       2268 2015-01-15 04:05 input/ssl-server.xml.example
     -rw-r--r--   1 root supergroup       2237 2015-01-15 04:05 input/yarn-env.cmd
     -rw-r--r--   1 root supergroup       4567 2015-01-15 04:05 input/yarn-env.sh
     -rw-r--r--   1 root supergroup       1525 2015-01-15 04:05 input/yarn-site.xml
    

Run the Genie Container

With Hadoop For Example

docker run --name genie --link mysql-genie:mysql-genie --link hadoop-genie:hadoop-genie -d netflixoss/genie:2.0.0

Standalone

docker run --name genie --link mysql-genie:mysql-genie -d netflixoss/genie:2.0.0

Monitor

You can monitor the Tomcat log:

docker logs -f genie

You can view container info:

docker inspect genie

Can "login" to Genie container:

docker exec -it genie /bin/bash

Get all IP addresses for your containers:

docker ps -q | xargs docker inspect --format '{{ .NetworkSettings.IPAddress }}  {{ .Name }} {{ .Config.Image }} {{ .State.Running }} {{ .Id }}'

Verify

Base Verification

Verify the MySQL Connection From Genie

You should now see the Genie tables in the schema:

    root@ubuntu:~# docker exec -it mysql-genie mysql -pgenie
    Warning: Using a password on the command line interface can be insecure.
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 4
    Server version: 5.6.21 MySQL Community Server (GPL)

    Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

    Oracle is a registered trademark of Oracle Corporation and/or its
    affiliates. Other names may be trademarks of their respective
    owners.

    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

    mysql> use genie;
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A

    Database changed
    mysql> show tables;
    +---------------------+
    | Tables_in_genie     |
    +---------------------+
    | Application         |
    | Application_configs |
    | Application_jars    |
    | Application_tags    |
    | Cluster             |
    | Cluster_Command     |
    | Cluster_configs     |
    | Cluster_tags        |
    | Command             |
    | Command_configs     |
    | Command_tags        |
    | Job                 |
    | Job_tags            |
    +---------------------+
    13 rows in set (0.00 sec)

    mysql> exit
    Bye

Verify Genie Web App Up and Running

http://<genie_container_ip_address>:8080 will bring up the Genie UI

http://<genie_container_ip_address>:8080/genie-jobs will bring up the Genie jobs directory, where jobs are executed from

http://<genie_container_ip_address>:8077 will bring up the Karyon console, which shows node information
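
You can also verify the web app responds without a browser (using the IP lookup shown earlier; expect a 200 once Tomcat has finished starting):

    GENIE_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' genie)
    curl -s -o /dev/null -w '%{http_code}\n' "http://${GENIE_IP}:8080/"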

Example Verification

Verify Hadoop Connection

    root@ubuntu:~# docker exec -it genie /bin/bash
    root@d59ba66a5fda:/# ping hadoop-genie
    PING hadoop-genie (172.17.0.4) 56(84) bytes of data.
    64 bytes from hadoop-genie (172.17.0.4): icmp_seq=1 ttl=64 time=0.101 ms
    64 bytes from hadoop-genie (172.17.0.4): icmp_seq=2 ttl=64 time=0.076 ms
    64 bytes from hadoop-genie (172.17.0.4): icmp_seq=3 ttl=64 time=0.067 ms
    ^C
    --- hadoop-genie ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2004ms
    rtt min/avg/max/mdev = 0.067/0.081/0.101/0.016 ms
    root@d59ba66a5fda:/# hadoop fs -ls input
    Found 31 items
    -rw-r--r--   1 root supergroup       4436 2015-01-15 09:05 input/capacity-scheduler.xml
    -rw-r--r--   1 root supergroup       1335 2015-01-15 09:05 input/configuration.xsl
    -rw-r--r--   1 root supergroup        318 2015-01-15 09:05 input/container-executor.cfg
    -rw-r--r--   1 root supergroup        155 2015-01-15 09:05 input/core-site.xml
    -rw-r--r--   1 root supergroup        154 2015-01-15 09:05 input/core-site.xml.template
    -rw-r--r--   1 root supergroup       3670 2015-01-15 09:05 input/hadoop-env.cmd
    -rw-r--r--   1 root supergroup       4302 2015-01-15 09:05 input/hadoop-env.sh
    -rw-r--r--   1 root supergroup       2490 2015-01-15 09:05 input/hadoop-metrics.properties
    -rw-r--r--   1 root supergroup       2598 2015-01-15 09:05 input/hadoop-metrics2.properties
    -rw-r--r--   1 root supergroup       9683 2015-01-15 09:05 input/hadoop-policy.xml
    -rw-r--r--   1 root supergroup        126 2015-01-15 09:05 input/hdfs-site.xml
    -rw-r--r--   1 root supergroup       1449 2015-01-15 09:05 input/httpfs-env.sh
    -rw-r--r--   1 root supergroup       1657 2015-01-15 09:05 input/httpfs-log4j.properties
    -rw-r--r--   1 root supergroup         21 2015-01-15 09:05 input/httpfs-signature.secret
    -rw-r--r--   1 root supergroup        620 2015-01-15 09:05 input/httpfs-site.xml
    -rw-r--r--   1 root supergroup       3523 2015-01-15 09:05 input/kms-acls.xml
    -rw-r--r--   1 root supergroup       1325 2015-01-15 09:05 input/kms-env.sh
    -rw-r--r--   1 root supergroup       1631 2015-01-15 09:05 input/kms-log4j.properties
    -rw-r--r--   1 root supergroup       5511 2015-01-15 09:05 input/kms-site.xml
    -rw-r--r--   1 root supergroup      11291 2015-01-15 09:05 input/log4j.properties
    -rw-r--r--   1 root supergroup        938 2015-01-15 09:05 input/mapred-env.cmd
    -rw-r--r--   1 root supergroup       1383 2015-01-15 09:05 input/mapred-env.sh
    -rw-r--r--   1 root supergroup       4113 2015-01-15 09:05 input/mapred-queues.xml.template
    -rw-r--r--   1 root supergroup        138 2015-01-15 09:05 input/mapred-site.xml
    -rw-r--r--   1 root supergroup        758 2015-01-15 09:05 input/mapred-site.xml.template
    -rw-r--r--   1 root supergroup         10 2015-01-15 09:05 input/slaves
    -rw-r--r--   1 root supergroup       2316 2015-01-15 09:05 input/ssl-client.xml.example
    -rw-r--r--   1 root supergroup       2268 2015-01-15 09:05 input/ssl-server.xml.example
    -rw-r--r--   1 root supergroup       2237 2015-01-15 09:05 input/yarn-env.cmd
    -rw-r--r--   1 root supergroup       4567 2015-01-15 09:05 input/yarn-env.sh
    -rw-r--r--   1 root supergroup       1525 2015-01-15 09:05 input/yarn-site.xml

Run The Examples

The example configures Genie with the Hadoop configuration information for the Hadoop container you brought up earlier, as well as two commands (Hadoop and Pig). It then launches the Hadoop MR example job provided by Hadoop: hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'. You can look at the scripts, which use the Genie Python client, but REST or [Java client](http://netflix.github.io/genie/docs/javadoc/client/index.html) calls would also work.
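
For instance, once the setup script in step 2 below has run, you should be able to fetch the registered cluster over REST (assuming the Genie 2 API path /genie/v2/config/clusters; the response is JSON):

    GENIE_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' genie)
    curl -s "http://${GENIE_IP}:8080/genie/v2/config/clusters"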

  1. Log in to the Genie container
    1. docker exec -it genie /bin/bash
  2. Execute the setup script to register the Hadoop cluster and the Hadoop / Pig commands
    1. /apps/genie/example/setup.py
  3. Verify the configurations have been loaded
    1. From the host machine navigate to the UI at http://<genie_container_ip_address>:8080
    2. You should see 1 active cluster on the main page
    3. Search commands and find the two ACTIVE commands, Hadoop and Pig
  4. Run the Hadoop job
    1. Back within the Genie container
    2. /apps/genie/example/run_hadoop_job.py
    3. You should be able to find the job in the UI when you search jobs
    4. You can monitor job progress in the Hadoop ResourceManager at http://<hadoop_container_ip_address>:8088
    5. The script will output the job id, which you can use to look at the job object at http://<genie_container_ip_address>:8080/genie/v2/jobs/<job_id>
    6. You can also look at the job working directory at http://<genie_container_ip_address>:8080/genie-jobs/<job_id>
    7. The script should exit when the job completes, with output looking something like this:
     root@d59ba66a5fda:/# /apps/genie/example/run_hadoop_job.py
     Job f2280825-cb79-4984-80bb-0db644ae4d0f is RUNNING
     ...
     ...
     Job f2280825-cb79-4984-80bb-0db644ae4d0f is RUNNING
     Job f2280825-cb79-4984-80bb-0db644ae4d0f finished with status SUCCEEDED

You should now be able to see the output of the job in HDFS using hadoop fs -ls output, and all the logs will be in the genie-jobs directory.

To rerun the example you need to either delete the output directory from HDFS with hadoop fs -rm -r output or change the output location in the script's command line parameters for the job.

To reset the whole system, stop and remove all the Docker containers and rerun the configuration steps above.
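
For example (using the container names above; hadoop-genie removes itself on exit because of --rm):

    docker stop genie mysql-genie
    docker rm genie mysql-genie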

More Info