RCI Quick Start

This document gives a quick overview of setting up the ml4logs development environment on the RCI cluster.

Create a virtual environment

Start a short interactive GPU session:

srun -p gpufast --gres=gpu:1 --time=0:30:00 --pty bash -i
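
Once the session starts, you can check that you really are inside a SLURM allocation with a GPU. The sketch below relies on two variables that SLURM normally sets for job steps: SLURM_JOB_ID, and CUDA_VISIBLE_DEVICES when GPUs are allocated through --gres (typical, but it depends on the cluster configuration). The helper name check_gpu_session is hypothetical and not part of ml4logs.

```shell
# Hypothetical helper: report whether this shell is a SLURM job with a GPU.
check_gpu_session() {
    if [ -z "${SLURM_JOB_ID:-}" ]; then
        echo "not inside a SLURM job" >&2
        return 1
    fi
    # On typical GPU setups SLURM exports the allocated device indices here.
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo "SLURM job ${SLURM_JOB_ID} has no visible GPU" >&2
        return 1
    fi
    echo "job ${SLURM_JOB_ID}, GPU(s): ${CUDA_VISIBLE_DEVICES}"
}
```

For example, `check_gpu_session && nvidia-smi` prints the allocation and then the GPU status.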

Load the PyTorch module (it also provides Python):

ml PyTorch/1.7.1-fosscuda-2020b

Create the virtual environment:

python -m venv ml4logs_env

Activate it:

source ml4logs_env/bin/activate
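
After activation, the venv activate script exports VIRTUAL_ENV with the environment's path, so a quick check is to look at that variable. A minimal sketch; the helper name check_venv is hypothetical.

```shell
# Hypothetical helper: succeed only if the named virtual environment is active.
# The venv "activate" script exports VIRTUAL_ENV with the environment's path.
check_venv() {
    local expected="${1:-ml4logs_env}"
    case "${VIRTUAL_ENV:-}" in
        */"$expected") echo "active: ${VIRTUAL_ENV}" ;;
        *) echo "virtual environment ${expected} is not active" >&2; return 1 ;;
    esac
}
```

For example, `check_venv && command -v python` should print a python path inside ml4logs_env/bin.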

You might want to set up a Jupyter kernel for the virtual environment:

pip install jupyter

ipython kernel install --name "ml4logs_env" --user
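
To confirm the kernel was registered, you can look for it in the output of `jupyter kernelspec list`, which prints a header line followed by one kernel per line (name, then directory). The sketch below just scans that output with awk; the helper name kernel_listed is hypothetical.

```shell
# Hypothetical helper: read `jupyter kernelspec list` output on stdin and
# succeed if a kernel with the given name appears in the first column.
kernel_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage on the cluster:
#   jupyter kernelspec list | kernel_listed ml4logs_env && echo "kernel registered"
```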

Clone the repository:

git clone git@github.com:LogAnalysisTeam/ml4logs.git

cd ml4logs

Set up the package in development mode:

python setup.py develop

Note that this step also installs all dependencies. Check the log to make sure everything installed correctly.
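
One way to make the log easier to check is to save it with tee and then search it for error messages. A rough sketch: the log file name and the helper find_install_errors are hypothetical, and a keyword search can both flag harmless lines (e.g. a package whose name contains "error") and miss problems worded differently, so read the flagged lines yourself.

```shell
# Keep a copy of the install output (the log file name is arbitrary):
#   python setup.py develop 2>&1 | tee setup_develop.log

# Hypothetical helper: print lines of a saved log that mention an error or a
# failure (case-insensitive, with line numbers). Returns 1 if none are found.
find_install_errors() {
    grep -inE 'error|failed' "$1"
}
```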

Create init_environment.sh with the following content:

echo "initializing environment..."
ml PyTorch/1.7.1-fosscuda-2020b
source {PUT_YOUR_PYTHON_VIRTUALENV_DIRECTORY_HERE}/ml4logs_env/bin/activate

if [[ -z "${ML4LOGS_PYTHON}" ]]; then
    export ML4LOGS_PYTHON=python
fi

echo "ML4LOGS_PYTHON: \"${ML4LOGS_PYTHON}\""
echo "PROJECT_DIR: \"${PROJECT_DIR}\""
echo "done"

This file is sourced automatically by all run scripts in scripts/.
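
The run scripts themselves are not reproduced here. As a rough illustration of what sourcing the file involves, the sketch below shows one way a script could locate and source it. It assumes (hypothetically) that init_environment.sh sits in the project directory and that PROJECT_DIR points there; the helper name init_env is made up.

```shell
# Hypothetical helper: source init_environment.sh from the given project
# directory (defaulting to $PROJECT_DIR, then the current directory).
init_env() {
    local dir="${1:-${PROJECT_DIR:-$PWD}}"
    if [ ! -f "${dir}/init_environment.sh" ]; then
        echo "init_environment.sh not found in ${dir}" >&2
        return 1
    fi
    # Sourcing (rather than executing) keeps the loaded module, the activated
    # virtualenv and exported variables such as ML4LOGS_PYTHON in this shell.
    . "${dir}/init_environment.sh"
}
```

Because the file is sourced, any ML4LOGS_PYTHON you export beforehand is kept, thanks to the `-z` check in init_environment.sh.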

SLURM

All batch files in scripts/ can be run either locally or on the cluster. RCI uses SLURM, where you schedule a job with:

sbatch scripts/SCRIPT_NAME.batch

The SLURM job configuration is given in comment lines (#SBATCH ...) at the top of each batch file, so it is ignored when the file is run locally, e.g.:

bash scripts/SCRIPT_NAME.batch
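
A minimal sketch of such a dual-use batch file, assuming options that mirror the interactive srun line above (partition gpufast, one GPU, 30 minutes). The job name and the run_mode helper are hypothetical; the real files in scripts/ will differ.

```shell
#!/bin/bash
#SBATCH --partition=gpufast
#SBATCH --gres=gpu:1
#SBATCH --time=0:30:00
#SBATCH --job-name=ml4logs_example

# To bash the #SBATCH lines are plain comments, so this file also runs with
# `bash` locally. sbatch only reads #SBATCH lines that appear before the
# first command, so they must stay at the top.

# Report how the script was started: sbatch exports SLURM_JOB_ID to the job.
run_mode() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "slurm job ${SLURM_JOB_ID}"
    else
        echo "local"
    fi
}

echo "running in mode: $(run_mode)"
```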

If you use the Makefile to run jobs on the cluster, do not forget to set:

export ML4LOGS_SHELL=sbatch

If it is not set, the Makefile defaults to running locally (with bash).
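
The Makefile itself is not shown in this guide. The sketch below only mirrors the behaviour described above in plain shell, so the runner selection is explicit; it is not the Makefile's code, and batch_runner and run_batch are hypothetical names. Note that `${ML4LOGS_SHELL:-bash}` also falls back to bash when the variable is set but empty, which the Makefile may handle differently.

```shell
# Hypothetical shell equivalent of the documented Makefile default:
# use $ML4LOGS_SHELL if it is set and non-empty, otherwise bash.
batch_runner() {
    echo "${ML4LOGS_SHELL:-bash}"
}

# Run a batch file with the selected runner, passing all arguments through.
run_batch() {
    "$(batch_runner)" "$@"
}

# Usage:
#   run_batch scripts/SCRIPT_NAME.batch                        # bash, locally
#   ML4LOGS_SHELL=sbatch run_batch scripts/SCRIPT_NAME.batch   # submit to SLURM
```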

Download the Data

Start by running everything on the reduced dataset:

make hdfs1_100k_data

Preprocess the Data

make hdfs1_100k_preprocess

Run all HDFS Benchmarks

make hdfs1_100k_train_test

Experiments with the Full Dataset

make hdfs1_data
make hdfs1_preprocess
make hdfs1_train_test
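
Both the reduced (hdfs1_100k) and the full (hdfs1) runs follow the same data, preprocess, train_test pattern, so the three make calls can be wrapped in a small loop that stops at the first failure. A sketch with hypothetical helper names; MAKE_CMD is a made-up override (e.g. MAKE_CMD=echo for a dry run).

```shell
# Hypothetical helper: list the three stage targets for a dataset prefix
# (hdfs1_100k for the reduced data, hdfs1 for the full dataset).
pipeline_targets() {
    printf '%s\n' "${1}_data" "${1}_preprocess" "${1}_train_test"
}

# Run the stages in order and stop at the first failing one.
run_pipeline() {
    local target
    for target in $(pipeline_targets "$1"); do
        "${MAKE_CMD:-make}" "$target" || return 1
    done
}

# Usage:
#   run_pipeline hdfs1_100k
```

With ML4LOGS_SHELL=sbatch, each make call most likely only submits a job (sbatch returns once the job is queued), so this loop would not wait for one stage to finish before submitting the next unless the Makefile or batch files handle that. Run it locally, or wait for each job before starting the next stage.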