Hive Worker

Hive Worker is a Scala library to run Hive job flows on the Amazon Elastic MapReduce platform using the AWS Java SDK.

Hive Worker uses Google Guice and twitter-server as its server stack. twitter-server is built on top of Finagle; see Finagle's User's Guide for more information.

NOTE: this library is a work in progress.

Install

git clone git://github.com/cacoco/hiveworker.git

Building

Hive Worker is built with Maven and requires Scala 2.10.1.

To build, just run:

cd hiveworker
mvn clean install

Configuration

When parsing job configuration steps, Hive Worker supports basic date/time formatting for values of type io.angstrom.hiveworker.util.StepArgument. Dates are computed in UTC, and the timezone is not configurable. The supported keywords are:

Hour, LastHour, Today, Yesterday, TwoDaysAgo, LastMonth

Hive Worker uses the joda-time library for date/time manipulation and formatting.
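To illustrate how such keywords can resolve to concrete values, here is a minimal sketch. It is not Hive Worker's implementation: the keyword types are hypothetical stand-ins for StepArgument, the output patterns (yyyy-MM-dd-HH, yyyy-MM-dd, yyyy-MM) are assumptions chosen for illustration, and it uses the JDK's java.time in place of joda-time so that it has no external dependencies.

```scala
import java.time.{ZoneOffset, ZonedDateTime}
import java.time.format.DateTimeFormatter

// Hypothetical stand-ins for the keywords listed above; the real
// io.angstrom.hiveworker.util.StepArgument may be shaped differently.
sealed trait DateKeyword
case object Hour extends DateKeyword
case object LastHour extends DateKeyword
case object Today extends DateKeyword
case object Yesterday extends DateKeyword
case object TwoDaysAgo extends DateKeyword
case object LastMonth extends DateKeyword

object DateKeywords {
  // Assumed output patterns, for illustration only.
  private val hourFmt  = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH")
  private val dayFmt   = DateTimeFormatter.ofPattern("yyyy-MM-dd")
  private val monthFmt = DateTimeFormatter.ofPattern("yyyy-MM")

  /** Resolves a keyword relative to `now`, after converting `now` to UTC. */
  def resolve(keyword: DateKeyword, now: ZonedDateTime): String = {
    val utc = now.withZoneSameInstant(ZoneOffset.UTC)
    keyword match {
      case Hour       => utc.format(hourFmt)
      case LastHour   => utc.minusHours(1).format(hourFmt)
      case Today      => utc.format(dayFmt)
      case Yesterday  => utc.minusDays(1).format(dayFmt)
      case TwoDaysAgo => utc.minusDays(2).format(dayFmt)
      case LastMonth  => utc.minusMonths(1).format(monthFmt)
    }
  }
}
```

For example, with `now` at 2013-03-01T00:30Z, `LastHour` resolves to `2013-02-28-23` and `LastMonth` to `2013-02`. Converting to UTC first means a caller in another timezone still gets UTC-based values, which matches the fixed-UTC behavior described above.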

Running

To run locally:

mvn exec:java -Dexec.args="-aws.access.key=ACCESS_KEY \
-aws.access.secret.key=SECRET_KEY \
-hadoop.bucket=s3://hadoop.angstrom.io \
-hadoop.log.uri=s3://hadoop.angstrom.io/logs \
-aws.sns.topic.arn.job.errors=arn:aws:sns:us-east-1:111111111111:job-errors \
-aws.sqs.queue.url.default=https://queue.amazonaws.com/111111111111/HIVE_JOB_FLOW"
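The command above passes six flags in the `-name=value` form. As a quick way to check a command line before launching, here is a hypothetical stand-alone helper that parses that form and reports any of the six flags that are missing. It is only a sketch: twitter-server performs its own flag parsing, and the `RunFlags` object below is not part of Hive Worker. It also handles only the `-name=value` form used here.

```scala
// Hypothetical helper, not part of Hive Worker or twitter-server:
// checks that every flag used in the run command above is present.
object RunFlags {
  val Required: Seq[String] = Seq(
    "aws.access.key",
    "aws.access.secret.key",
    "hadoop.bucket",
    "hadoop.log.uri",
    "aws.sns.topic.arn.job.errors",
    "aws.sqs.queue.url.default"
  )

  /** Parses "-name=value" arguments; Left describes the first problem found. */
  def parse(args: Seq[String]): Either[String, Map[String, String]] = {
    val malformed = args.filterNot(a => a.startsWith("-") && a.contains("="))
    if (malformed.nonEmpty) Left("malformed flags: " + malformed.mkString(", "))
    else {
      // Split on the first '=' only, so values may themselves contain '='.
      val flags = args.map { a =>
        val Array(name, value) = a.drop(1).split("=", 2)
        name -> value
      }.toMap
      val missing = Required.filterNot(flags.contains)
      if (missing.isEmpty) Right(flags)
      else Left("missing flags: " + missing.mkString(", "))
    }
  }
}
```

For example, `RunFlags.parse(Seq("-aws.access.key=A"))` returns a `Left` listing the five flags that were not supplied.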

Honeycomb icons created by Freepik - Flaticon
