Refinery contains scripts, artifacts, and configuration for WMF's analytics cluster.
-
Install git-lfs on your system. Debian includes a git-lfs package.
-
Make sure the docopt and dateutil Python packages are available on your system. On Ubuntu systems, you can achieve this by running
sudo apt-get install python-docopt
sudo apt-get install python-dateutil
-
Clone the repository.
You can find the commands to clone the repository at WMF's gerrit.
To clone anonymously, just run
git clone https://gerrit.wikimedia.org/r/analytics/refinery
-
Change to the cloned repository by running
cd refinery
-
Initialize git-lfs by running
git lfs install
-
Pull existing artifacts into the repository by running
git lfs pull
(Depending on your internet connection, this step may take some time.)
-
Add the refinery/python directory to your PYTHONPATH. To add it only in the running shell, you can use
export PYTHONPATH=/path/to/analytics/refinery/python
Please refer to your operating system's documentation on how to do this globally.
-
Done.
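To double-check the setup steps above, a small script along the following lines may help. It is only a sketch and not part of refinery: the helper names `missing_modules` and `pythonpath_has_suffix` are made up for illustration, and it checks whichever Python interpreter runs it, which may differ from the one the Ubuntu packages were installed for.

```python
import importlib
import os


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def pythonpath_has_suffix(suffix, pythonpath=None):
    """Report whether any PYTHONPATH entry ends with `suffix`.

    `pythonpath` defaults to the PYTHONPATH environment variable; passing
    a string explicitly makes the check easy to try out.
    """
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    entries = [e.rstrip("/") for e in pythonpath.split(os.pathsep) if e]
    return any(e.endswith(suffix) for e in entries)


if __name__ == "__main__":
    # docopt and dateutil are the two packages named in the steps above.
    print("missing packages:", missing_modules(["docopt", "dateutil"]))
    print("refinery/python on PYTHONPATH:",
          pythonpath_has_suffix("refinery/python"))
```

An empty list of missing packages and a `True` on the second line suggest that the package and PYTHONPATH steps took effect in the current shell.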
- Job base names follow the directory pattern in the oozie directory, replacing slashes with dashes. For instance, the webrequest/load/bundle.xml job is named webrequest-load-bundle, and last_access_uniques/daily/coordinator.xml is named last_access_uniques-daily-coord.
- Root job names end in either -bundle or -coord, while child job names end with the job parameters separated by dashes.
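The naming convention above can be sketched in a few lines of Python. This is an illustration inferred from the two examples given, not code from refinery: `job_base_name` and `is_root_job` are hypothetical helpers, and the rule that `coordinator` is shortened to `coord` is an assumption based on the last_access_uniques example.

```python
def job_base_name(xml_path):
    """Derive a job base name from a path relative to the oozie directory.

    Hypothetical helper: drops the .xml extension, shortens a trailing
    'coordinator' to 'coord' (assumed from the examples), and replaces
    slashes with dashes.
    """
    if xml_path.endswith(".xml"):
        xml_path = xml_path[:-len(".xml")]
    parts = xml_path.split("/")
    if parts[-1] == "coordinator":
        parts[-1] = "coord"
    return "-".join(parts)


def is_root_job(job_name):
    """Root job names end in -bundle or -coord; other names are treated as children."""
    return job_name.endswith("-bundle") or job_name.endswith("-coord")


if __name__ == "__main__":
    for path in ("webrequest/load/bundle.xml",
                 "last_access_uniques/daily/coordinator.xml"):
        name = job_base_name(path)
        print(path, "->", name, "(root job)" if is_root_job(name) else "")
```

How child job names append their parameters is not spelled out beyond "separated by dashes", so the sketch only distinguishes root jobs from everything else.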