HLoader is a tool built around Apache Sqoop and Apache Oozie for ingesting data from relational databases into Hadoop.
- Clone the repository (git clone https://github.com/cerndb/hloader.git)
- Install the requirements (pip install -r requirements.txt) and the Oracle .rpm files located in /travis-resources
- Set up the config.ini (properties starting with AUTH_ represent the authentication database, while properties starting with POSTGRE_ represent the meta database)
- Run HLoader.py
If you would like to set up the meta database yourself, the schema script is located in /hloader/db/PostgreSQL_backend_schema.sql
The REST API exposes metadata to the user and enables the submission of new ingestion jobs.
It runs on http://127.0.0.1:5000
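As a minimal sketch of how the API can be called (assuming the service is running locally on the address above and the requests library is available), the index page can be queried from Python:

```python
import requests

# Base URL of a locally running HLoader instance (default address from above).
BASE_URL = "http://127.0.0.1:5000"

# Query the API index page to check that the service is up.
response = requests.get(BASE_URL + "/api/v1")
print(response.status_code)
print(response.text)
```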
GET /headers
returns the Python environment variables and the request headers
GET /api/v1
returns the index page
GET /api/v1/clusters
returns a JSON object with an array of clusters, optionally filtered by an attribute value
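A hedged sketch of querying this endpoint from Python; the filter attribute cluster_name and its value are placeholders, not confirmed field names:

```python
import requests

BASE_URL = "http://127.0.0.1:5000"

# Fetch all registered clusters.
clusters = requests.get(BASE_URL + "/api/v1/clusters").json()

# Hypothetical filtered query: the attribute name "cluster_name" is an
# assumption; any attribute exposed in the cluster metadata could be used.
filtered = requests.get(BASE_URL + "/api/v1/clusters",
                        params={"cluster_name": "example_cluster"}).json()
```

The other GET endpoints that accept attribute filters (servers, jobs, logs, transfers) can be queried in the same way.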
GET /api/v1/servers
returns a JSON object with an array of servers, optionally filtered by an attribute value
GET /api/v1/schemas
returns a JSON object with arrays of the available and unavailable schemas for a given owner username. Required parameter: owner_username
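A short sketch using the documented owner_username parameter (the username value is a placeholder):

```python
import requests

BASE_URL = "http://127.0.0.1:5000"

# owner_username is the documented required parameter; "jdoe" is a placeholder.
schemas = requests.get(BASE_URL + "/api/v1/schemas",
                       params={"owner_username": "jdoe"}).json()
print(schemas)
```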
GET /api/v1/jobs
returns a JSON object with an array of jobs, optionally filtered by an attribute value
POST /api/v1/jobs
submits a job and returns a JSON object containing its ID.
Required parameters: source_server_id, source_schema_name, source_object_name, destination_cluster_id, destination_path, owner_username, workflow_suffix, sqoop_direct
Optional parameters: coordinator_suffix, sqoop_nmap, sqoop_splitting_column, sqoop_incremental_method, start_time, end_time, interval, job_last_update
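A hedged sketch of a job submission with the required parameters; every value is a placeholder, and the form-encoded body is an assumption (the section does not state whether the endpoint expects form or JSON encoding):

```python
import requests

BASE_URL = "http://127.0.0.1:5000"

# Required parameters from the list above; all values are placeholders and
# must be replaced with identifiers from your own metadata.
job = {
    "source_server_id": 1,
    "source_schema_name": "MY_SCHEMA",
    "source_object_name": "MY_TABLE",
    "destination_cluster_id": 1,
    "destination_path": "/user/jdoe/ingest/my_table",
    "owner_username": "jdoe",
    "workflow_suffix": "workflows/sqoop_workflow",
    "sqoop_direct": 1,
}

# Assumption: parameters are sent as a form-encoded request body.
response = requests.post(BASE_URL + "/api/v1/jobs", data=job)
print(response.json())  # expected to contain the ID of the new job
```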
DELETE /api/v1/jobs
deletes a job given its ID and reports the status of the operation. Required parameter: job_id
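A short sketch of deleting a job; passing job_id as a query-string parameter is an assumption about how the endpoint reads it:

```python
import requests

BASE_URL = "http://127.0.0.1:5000"

# job_id is the documented required parameter; 42 is a placeholder.
response = requests.delete(BASE_URL + "/api/v1/jobs", params={"job_id": 42})
print(response.json())  # reports the status of the delete operation
```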
GET /api/v1/logs
returns a JSON object with an array of logs, optionally filtered by an attribute value
GET /api/v1/transfers
returns a JSON object with an array of transfers, optionally filtered by an attribute value
Currently, submitted jobs are executed using Oozie Workflows or Coordinators. The path to the Workflow or Coordinator application on HDFS should be provided in the workflow_suffix or coordinator_suffix parameter, respectively. The URL of the Oozie deployment is contained in the cluster metadata.
Sample metadata insert statements can be found in /hloader/db/PostgreSQL_test_data.sql