Genie 2.0.0
This page contains instructions on how to set up Genie 2.0.0. If you're looking for a different version please see the list of releases here.
The current Genie Docker image is NOT PRODUCTION READY.
This guide will help you get up and running quickly, but take note of the following:
- There has been no tuning of memory settings for Tomcat or the launched processes on the Genie container
- By default Genie opens a MySQL connection pool with 20 connections. The MySQL container hasn't been tested extensively to see if this causes problems.
- There is no real way to flexibly configure the Genie node beyond the instructions included here. This will hopefully be improved in future releases.
- Since Genie usually runs on a fully fleshed out system, it uses the getHostName functionality from the Java libraries to dynamically create some links in the UI. Within Docker this returns the container ID, so if you're accessing the UI from a host (e.g. Mac) browser those links will be broken. Just replace the host with the IP address of the Genie Docker container (see the lookup command below) and they will work. We're hoping to fix this in future releases.
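For example, you can look up a container's IP address from the Docker host like so (shown here for the container named genie, as used in the instructions below):
docker inspect --format '{{ .NetworkSettings.IPAddress }}' genie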
- Docker 1.5+
- The Genie image was developed and tested using Docker 1.5.0, so some of these steps or functionality may not be available in older versions. In particular, the ability to edit /etc/hosts is not available in older versions of Docker.
Configure both MySQL and Hadoop if you want to run the example. If you don't want to run the example, you don't need to bring up Hadoop or add the link flag when bringing up Genie. MySQL (or another database, if you want to configure your own) is required.
docker run --name mysql-genie -e MYSQL_ROOT_PASSWORD=genie -e MYSQL_DATABASE=genie -d mysql:5.6.21
- This launches a MySQL 5.6.21 container with the name mysql-genie. This is important, as Genie references the container by this name for its connection.
- This container is launched as a daemon (-d) so it will run in the background
- This sets the database name to genie and the root user's password to genie. If you change these you'll have to manually change the configuration in the Genie container later
- To verify the database is up, run:
docker exec -it mysql-genie mysql -pgenie
- This should drop you into the MySQL CLI, where you can run commands like so:
root@ubuntu:~# docker exec -it mysql-genie mysql -pgenie
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.6.21 MySQL Community Server (GPL)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show Databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| genie              |
| mysql              |
| performance_schema |
+--------------------+
4 rows in set (0.00 sec)

mysql> use genie;
Database changed
mysql> exit
Bye
docker run --name hadoop-genie -it --rm -p 10020:10020 -p 19888:19888 sequenceiq/hadoop-docker:2.6.0 /etc/bootstrap.sh -bash
- This launches a Hadoop 2.6.0 container with the name hadoop-genie. It is already running most of the standard Hadoop 2 daemons.
- This is launched in interactive mode (-it) so it will leave you in a bash shell within the container when it comes up, so you'll want to start it in a terminal you can leave running.
- It is started with --rm so that when you exit the bash shell Docker completely removes the container and you won't even see it with docker ps -a
- You can find the IP address of the Hadoop daemons by entering docker inspect hadoop-genie and locating the IP address field
- Hadoop daemon UIs are on the following ports:
- NameNode 50070
- DataNode 50075
- Secondary NameNode 50090
- NodeManager 8042
- ResourceManager 8088
- JobHistoryServer 19888
- The -p 10020:10020 -p 19888:19888 in the run command exposes two more ports that aren't exposed by default by the Hadoop container. These ports are for the JobHistory server, which isn't started by default, so we'll start it manually below in order to successfully run jobs from the Genie node. If you also want the daemon UIs reachable from your host browser, see the sketch just after this bullet.
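Publishing the Hadoop UI ports to the host works the same way as the two -p flags above. A sketch adding the NameNode and ResourceManager UI ports from the list above (any of the listed ports could be added the same way):
docker run --name hadoop-genie -it --rm \
  -p 10020:10020 -p 19888:19888 \
  -p 50070:50070 -p 8088:8088 \
  sequenceiq/hadoop-docker:2.6.0 /etc/bootstrap.sh -bash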
- The sequenceiq images assume you're running Hadoop jobs locally, so /etc/hosts is set up only for internal reference
- Run vi /etc/hosts and add hadoop-genie after the container ID on the first line (space separated). This will allow the daemons to resolve each other by container name when a job is submitted from the Genie node later on. (A scripted alternative is sketched after the example file below.)
- The file should now look something like this:
172.17.0.2      44821c3dccd3 hadoop-genie
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
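If you'd rather script the edit than use vi, a non-interactive sketch. Note that Docker bind-mounts /etc/hosts, so sed -i (which replaces the file) can fail; writing back through cat keeps the same inode:
# Append " hadoop-genie" to the first line of /etc/hosts without replacing the file
sed '1s/$/ hadoop-genie/' /etc/hosts > /tmp/hosts && cat /tmp/hosts > /etc/hosts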
- Start the JobHistory server:
/usr/local/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
- jps should now show something like this:
bash-4.1# jps
356 SecondaryNameNode
911 JobHistoryServer
191 DataNode
490 ResourceManager
112 NameNode
976 Jps
570 NodeManager
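To double-check that the JobHistory server is actually listening, you can hit its standard REST info endpoint from inside the container (assuming curl is installed in the image):
curl -s http://localhost:19888/ws/v1/history/info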
- Locally you should now be able to do anything you could do on a normal Hadoop client machine
- If jps and /etc/hosts look like the above, you should be good
- You can run a hadoop command just to verify (don't delete the input directory if you plan to run the example):
bash-4.1# /usr/local/hadoop/bin/hadoop fs -ls input
Found 31 items
-rw-r--r--   1 root supergroup       4436 2015-01-15 04:05 input/capacity-scheduler.xml
-rw-r--r--   1 root supergroup       1335 2015-01-15 04:05 input/configuration.xsl
-rw-r--r--   1 root supergroup        318 2015-01-15 04:05 input/container-executor.cfg
-rw-r--r--   1 root supergroup        155 2015-01-15 04:05 input/core-site.xml
-rw-r--r--   1 root supergroup        154 2015-01-15 04:05 input/core-site.xml.template
-rw-r--r--   1 root supergroup       3670 2015-01-15 04:05 input/hadoop-env.cmd
-rw-r--r--   1 root supergroup       4302 2015-01-15 04:05 input/hadoop-env.sh
-rw-r--r--   1 root supergroup       2490 2015-01-15 04:05 input/hadoop-metrics.properties
-rw-r--r--   1 root supergroup       2598 2015-01-15 04:05 input/hadoop-metrics2.properties
-rw-r--r--   1 root supergroup       9683 2015-01-15 04:05 input/hadoop-policy.xml
-rw-r--r--   1 root supergroup        126 2015-01-15 04:05 input/hdfs-site.xml
-rw-r--r--   1 root supergroup       1449 2015-01-15 04:05 input/httpfs-env.sh
-rw-r--r--   1 root supergroup       1657 2015-01-15 04:05 input/httpfs-log4j.properties
-rw-r--r--   1 root supergroup         21 2015-01-15 04:05 input/httpfs-signature.secret
-rw-r--r--   1 root supergroup        620 2015-01-15 04:05 input/httpfs-site.xml
-rw-r--r--   1 root supergroup       3523 2015-01-15 04:05 input/kms-acls.xml
-rw-r--r--   1 root supergroup       1325 2015-01-15 04:05 input/kms-env.sh
-rw-r--r--   1 root supergroup       1631 2015-01-15 04:05 input/kms-log4j.properties
-rw-r--r--   1 root supergroup       5511 2015-01-15 04:05 input/kms-site.xml
-rw-r--r--   1 root supergroup      11291 2015-01-15 04:05 input/log4j.properties
-rw-r--r--   1 root supergroup        938 2015-01-15 04:05 input/mapred-env.cmd
-rw-r--r--   1 root supergroup       1383 2015-01-15 04:05 input/mapred-env.sh
-rw-r--r--   1 root supergroup       4113 2015-01-15 04:05 input/mapred-queues.xml.template
-rw-r--r--   1 root supergroup        138 2015-01-15 04:05 input/mapred-site.xml
-rw-r--r--   1 root supergroup        758 2015-01-15 04:05 input/mapred-site.xml.template
-rw-r--r--   1 root supergroup         10 2015-01-15 04:05 input/slaves
-rw-r--r--   1 root supergroup       2316 2015-01-15 04:05 input/ssl-client.xml.example
-rw-r--r--   1 root supergroup       2268 2015-01-15 04:05 input/ssl-server.xml.example
-rw-r--r--   1 root supergroup       2237 2015-01-15 04:05 input/yarn-env.cmd
-rw-r--r--   1 root supergroup       4567 2015-01-15 04:05 input/yarn-env.sh
-rw-r--r--   1 root supergroup       1525 2015-01-15 04:05 input/yarn-site.xml
To run the example (linking Genie to both MySQL and Hadoop):
docker run --name genie --link mysql-genie:mysql-genie --link hadoop-genie:hadoop-genie -d netflixoss/genie:2.0.0
If you're not running the example, the Hadoop link can be dropped:
docker run --name genie --link mysql-genie:mysql-genie -d netflixoss/genie:2.0.0
Can monitor the Tomcat log:
docker logs -f genie
Can view container info:
docker inspect genie
Can "login" to Genie container:
docker exec -it genie /bin/bash
Get the IP addresses of all your containers:
docker ps -q | xargs docker inspect --format '{{ .NetworkSettings.IPAddress }} {{ .Name }} {{ .Config.Image }} {{ .State.Running }} {{ .Id }}'
You should see the tables in the Genie schema now:
root@ubuntu:~# docker exec -it mysql-genie mysql -pgenie
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.6.21 MySQL Community Server (GPL)
Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use genie;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+---------------------+
| Tables_in_genie |
+---------------------+
| Application |
| Application_configs |
| Application_jars |
| Application_tags |
| Cluster |
| Cluster_Command |
| Cluster_configs |
| Cluster_tags |
| Command |
| Command_configs |
| Command_tags |
| Job |
| Job_tags |
+---------------------+
13 rows in set (0.00 sec)
mysql> exit
Bye
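The same check can be run non-interactively from the host, as a one-liner alternative to the session above:
docker exec mysql-genie mysql -pgenie -e 'show tables;' genie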
http://<genie_container_ip_address>:8080
will bring up the Genie UI
http://<genie_container_ip_address>:8080/genie-jobs
will bring up the Genie jobs directory where jobs will be executed from
http://<genie_container_ip_address>:8077
will bring up the Karyon console, which shows node information
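You can also confirm the REST API is responding from the host. A quick check against the cluster config resource (the /genie/v2/config/clusters path follows the same v2 API convention as the jobs URL used later on this page; it returns an empty list until the example setup below registers a cluster):
curl -s http://<genie_container_ip_address>:8080/genie/v2/config/clusters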
Verify the Genie container can resolve and reach the Hadoop container and HDFS:
root@ubuntu:~# docker exec -it genie /bin/bash
root@d59ba66a5fda:/# ping hadoop-genie
PING hadoop-genie (172.17.0.4) 56(84) bytes of data.
64 bytes from hadoop-genie (172.17.0.4): icmp_seq=1 ttl=64 time=0.101 ms
64 bytes from hadoop-genie (172.17.0.4): icmp_seq=2 ttl=64 time=0.076 ms
64 bytes from hadoop-genie (172.17.0.4): icmp_seq=3 ttl=64 time=0.067 ms
^C
--- hadoop-genie ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2004ms
rtt min/avg/max/mdev = 0.067/0.081/0.101/0.016 ms
root@d59ba66a5fda:/# hadoop fs -ls input
Found 31 items
-rw-r--r-- 1 root supergroup 4436 2015-01-15 09:05 input/capacity-scheduler.xml
-rw-r--r-- 1 root supergroup 1335 2015-01-15 09:05 input/configuration.xsl
-rw-r--r-- 1 root supergroup 318 2015-01-15 09:05 input/container-executor.cfg
-rw-r--r-- 1 root supergroup 155 2015-01-15 09:05 input/core-site.xml
-rw-r--r-- 1 root supergroup 154 2015-01-15 09:05 input/core-site.xml.template
-rw-r--r-- 1 root supergroup 3670 2015-01-15 09:05 input/hadoop-env.cmd
-rw-r--r-- 1 root supergroup 4302 2015-01-15 09:05 input/hadoop-env.sh
-rw-r--r-- 1 root supergroup 2490 2015-01-15 09:05 input/hadoop-metrics.properties
-rw-r--r-- 1 root supergroup 2598 2015-01-15 09:05 input/hadoop-metrics2.properties
-rw-r--r-- 1 root supergroup 9683 2015-01-15 09:05 input/hadoop-policy.xml
-rw-r--r-- 1 root supergroup 126 2015-01-15 09:05 input/hdfs-site.xml
-rw-r--r-- 1 root supergroup 1449 2015-01-15 09:05 input/httpfs-env.sh
-rw-r--r-- 1 root supergroup 1657 2015-01-15 09:05 input/httpfs-log4j.properties
-rw-r--r-- 1 root supergroup 21 2015-01-15 09:05 input/httpfs-signature.secret
-rw-r--r-- 1 root supergroup 620 2015-01-15 09:05 input/httpfs-site.xml
-rw-r--r-- 1 root supergroup 3523 2015-01-15 09:05 input/kms-acls.xml
-rw-r--r-- 1 root supergroup 1325 2015-01-15 09:05 input/kms-env.sh
-rw-r--r-- 1 root supergroup 1631 2015-01-15 09:05 input/kms-log4j.properties
-rw-r--r-- 1 root supergroup 5511 2015-01-15 09:05 input/kms-site.xml
-rw-r--r-- 1 root supergroup 11291 2015-01-15 09:05 input/log4j.properties
-rw-r--r-- 1 root supergroup 938 2015-01-15 09:05 input/mapred-env.cmd
-rw-r--r-- 1 root supergroup 1383 2015-01-15 09:05 input/mapred-env.sh
-rw-r--r-- 1 root supergroup 4113 2015-01-15 09:05 input/mapred-queues.xml.template
-rw-r--r-- 1 root supergroup 138 2015-01-15 09:05 input/mapred-site.xml
-rw-r--r-- 1 root supergroup 758 2015-01-15 09:05 input/mapred-site.xml.template
-rw-r--r-- 1 root supergroup 10 2015-01-15 09:05 input/slaves
-rw-r--r-- 1 root supergroup 2316 2015-01-15 09:05 input/ssl-client.xml.example
-rw-r--r-- 1 root supergroup 2268 2015-01-15 09:05 input/ssl-server.xml.example
-rw-r--r-- 1 root supergroup 2237 2015-01-15 09:05 input/yarn-env.cmd
-rw-r--r-- 1 root supergroup 4567 2015-01-15 09:05 input/yarn-env.sh
-rw-r--r-- 1 root supergroup 1525 2015-01-15 09:05 input/yarn-site.xml
The example configures Genie with the Hadoop configuration information for the Hadoop container you brought up earlier, as well as two commands (Hadoop and Pig). Then it launches a Hadoop MR job, the example provided by Hadoop:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'
You can look at the scripts, which use the Genie Python client, but REST or [Java client](http://netflix.github.io/genie/docs/javadoc/client/index.html) calls would also work (a REST sketch follows below).
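For a rough idea of what a REST submission looks like, here is a hypothetical sketch of an equivalent job request. This is not taken from the example scripts; the field names follow the Genie 2 job model, and the cluster/command tags shown are placeholders that would need to match whatever tags setup.py actually registers:
curl -s -X POST http://<genie_container_ip_address>:8080/genie/v2/jobs \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "hadoop-grep-example",
    "user": "root",
    "version": "1.0",
    "commandArgs": "jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output dfs[a-z.]+",
    "clusterCriterias": [{"tags": ["type:yarn"]}],
    "commandCriteria": ["type:hadoop"]
  }'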
- Login to the Genie container
docker exec -it genie /bin/bash
- Execute the setup script to register the Hadoop cluster and Hadoop / Pig commands
/apps/genie/example/setup.py
- Verify the configurations have been loaded
  - From the host machine navigate to the UI at http://<genie_container_ip_address>:8080
  - You should see 1 active cluster on the main page
  - Search commands and find the two ACTIVE commands, Hadoop and Pig
- Run the Hadoop job
- Back within the Genie container, run:
/apps/genie/example/run_hadoop_job.py
- You should be able to navigate to the UI and find the job when you search jobs
- You can monitor job progress in the Hadoop ResourceManager at
http://<hadoop_container_ip_address>:8088
- The script will output the job id and you can use it to look at the job object at
http://<genie_container_ip_address>:8080/genie/v2/jobs/<job_id>
- You can also look at the job working directory at
http://<genie_container_ip_address>:8080/genie-jobs/<job_id>
- The script should exit when the job completes, with output that looks something like this:
root@d59ba66a5fda:/# /apps/genie/example/run_hadoop_job.py
Job f2280825-cb79-4984-80bb-0db644ae4d0f is RUNNING
...
...
Job f2280825-cb79-4984-80bb-0db644ae4d0f is RUNNING
Job f2280825-cb79-4984-80bb-0db644ae4d0f finished with status SUCCEEDED
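You can also poll the same status yourself via the job URL shown earlier, using the job id printed by the script:
curl -s http://<genie_container_ip_address>:8080/genie/v2/jobs/f2280825-cb79-4984-80bb-0db644ae4d0f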
You should now be able to see the output of the job in HDFS using hadoop fs -ls output, and all the logs will be in the genie-jobs directory. To rerun the example you would need to delete the output directory from HDFS with hadoop fs -rm -r output, or change the output location in the job configuration via the script's command line parameters.
To reset the whole system, stop and remove all the Docker containers and rerun the configuration steps above.
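Concretely, that looks something like this (the hadoop-genie container was started with --rm, so exiting its bash shell removes it already):
docker stop genie mysql-genie
docker rm genie mysql-genie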
- Netflix Tech Blog Posts
- Genie Github
- Client API Documentation
- Mailing List