add more guides
heheda12345 committed Apr 19, 2023
1 parent 8266a71 commit f2d7df3
Showing 9 changed files with 250 additions and 83 deletions.
4 changes: 3 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -54,4 +54,6 @@ nnfusion_rt/
models/frozenmodels/

artifacts/data
artifacts/reproduce_results
artifacts/reproduce_results
*.onnx
*.tfgraph
93 changes: 93 additions & 0 deletions artifacts/INSTALL.md
@@ -0,0 +1,93 @@
# Installation Tutorial
This document describes how to install the software used in the artifact on a node with an NVIDIA GPU. All scripts are assumed to be run from the `nnfusion/artifacts` directory.

## Prerequisites
We assume a node with an NVIDIA GPU and CUDA installed, with both conda and nvcc available. If you have not installed conda, follow the instructions [here](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html) (Miniconda is enough; this artifact assumes Miniconda is installed at the default path `~/miniconda3`). If you have not installed nvcc, follow the instructions [here](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html).
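A quick way to confirm both tools are on `PATH` before proceeding (the `check_cmd` helper below is illustrative, not part of the artifact):

```bash
# Report whether a command is available on PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

check_cmd conda
check_cmd nvcc
```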

## TensorFlow
onnx-tf for TF 1.15 needs to be built from source because the pre-compiled version depends on TF2. We also fix some bugs in that commit to properly support control-flow operations. The following commands prepare the conda env for TF 1.15.

```bash
conda create python=3.8 --name baseline_tf1 -y
conda activate baseline_tf1
pip install nvidia-pyindex
pip install -r env/requirements_tf.txt
mkdir -p third-party && cd third-party
git clone https://github.com/onnx/onnx-tensorflow.git
cd onnx-tensorflow
git checkout 0e4f4836 # v1.7.0-tf-1.15m
git apply ../../env/onnx_tf.patch
pip install -e .
conda deactivate
```
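An optional sanity check after the build, assuming the `baseline_tf1` env is active (the `check_import` helper is illustrative only, not part of the artifact):

```bash
# Report whether a Python module can be imported in the current env.
check_import() {
  if python3 -c "import $1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
  fi
}

check_import tensorflow
check_import onnx_tf
```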
## JAX
The following commands will prepare the conda env for JAX.
```bash
conda create python=3.8 --name baseline_jax -y
conda activate baseline_jax
pip install nvidia-pyindex
pip install -r env/requirements_jax.txt -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html -f https://download.pytorch.org/whl/torch_stable.html
conda deactivate
```

## TVM
The following commands will prepare the conda env for TVM.
```bash
conda create python==3.8 --name kerneldb -y
conda activate kerneldb
pip install ply==3.11
mkdir -p third-party && cd third-party
git clone https://github.com/apache/tvm.git
cd tvm
git checkout 22ba6523c
git submodule init && git submodule update
git apply ../../env/tvm.patch
mkdir build
cd build
cp ../../../env/tvm.config.cmake config.cmake
cmake ..
make -j
cd ../python
pip install -e .
```

## NNFusion
The following commands will build nnfusion. Please use the [script](../maint/script/install_dependency.sh) (needs sudo) to prepare the environment for nnfusion before running the following commands.

```bash
cd .. # to $YOUR_DIR_FOR_NNFUSION/nnfusion
mkdir build && cd build && cmake .. && make -j
```

## PyTorch & Grinder
```bash
conda create python=3.7 --name grinder -y
conda activate grinder
pip install nvidia-pyindex
pip install -r env/requirements_pytorch.txt -f https://download.pytorch.org/whl/torch_stable.html
pip install -e .
conda deactivate
```

TODO: get data

TODO: prepare kerneldb

Docker notes: pass `--shm-size="32g"` to `docker run`; build the ROCm image with `docker build -t grinder:latest -f env/Dockerfile.rocm --network=host .`

All commands assume running at the artifacts directory (prerequisites: conda and nvcc, see above).

To run the experiments and plot the figures:

srun --pty -w nico3 -p Long --exclusive ./run_nv_gpu.sh
cd plot && ./plot_nv.sh && cd -
136 changes: 59 additions & 77 deletions artifacts/README.md
@@ -1,88 +1,41 @@
# Installation of Evaluated Systems
assume running at artifacts directory
# OSDI'23 Grinder Artifacts Evaluation

## 0. Overview
This code branch is used for OSDI'23 Artifact Evaluation of paper #628, titled "Grinder: Analysis and Optimization for Dynamic Control Flow in Deep Learning".

## Pre-requisites
conda, nvcc ......
### Evaluation Setup
* Artifacts Available:
* All Grinder-related code is available under the NNFusion open-source project at: [https://github.com/microsoft/nnfusion/tree/TODO](https://github.com/microsoft/nnfusion/tree/TODO)
* Artifacts Functional:
* *Documentation*: the following documents include detailed guidelines on how to build, install, and test Grinder, and on the experiments comparing it with other baselines.
* *Completeness*: the [C++ part](..) of Grinder has been merged into NNFusion in this branch, and the [Python part](ast_analyzer) is available in this artifact.
* *Exercisability*: under the *artifacts* folder, we prepare all the scripts and data needed to reproduce the experiments, in individual folders named after the figure numbers in the paper.
* Results Reproduced:
* To reproduce the main results presented in our paper, we provide Docker images containing all the environments and baseline software, plus machines with the same configurations as used in the paper's evaluation. We also provide detailed guidelines to help reproduce the results step by step.

## TensorFlow
install from env/requirements_tf.txt
Install onnx-tf from source (the pre-compiled version depends on TF2)
## 1. Environment Preparation

```bash
conda create python=3.8 --name baseline_tf1 -y
conda activate baseline_tf1
pip install nvidia-pyindex
pip install -r env/requirements_tf.txt
mkdir third-party && cd third-party
git clone https://github.com/onnx/onnx-tensorflow.git
cd onnx-tensorflow
git checkout 0e4f4836 # v1.7.0-tf-1.15m
git apply ../../env/onnx_tf.patch
pip install -e .
conda deactivate
```
## JAX
```bash
conda create python=3.8 --name baseline_jax -y
conda activate baseline_jax
pip install nvidia-pyindex
pip install -r env/requirements_jax.txt -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html -f https://download.pytorch.org/whl/torch_stable.html
conda deactivate
```
**For AE Reviewers**:
1. The nico cluster we provide for artifact evaluation is managed by slurm. To run GPU-related commands, please use `srun --pty --exclusive` before the original command, which will submit the job to the compute node (nico[3-4]). For your convenience, we have included this prefix in our artifact but will remove it in the final version. If you are running the artifact on your own machine, please remember to remove the prefix.
2. Due to security concerns, we cannot provide the docker permission to reviewers. Instead, for NVIDIA GPU, we provide an account with all the dependencies installed, and for AMD GPU, we provide ssh access into the dockers. You can skip this environment preparation section.

# TVM
```bash
conda create python==3.8 --name kerneldb -y
pip install ply==3.11
mkdir third-party && cd third-party
git clone https://github.com/apache/tvm.git --recursive
cd tvm
git checkout 22ba6523c
git apply ../../env/tvm.patch
mkdir build
cd build
cp ../../../env/tvm.config.cmake config.cmake
make -j
cd ../python
pip install -e .
## NVIDIA GPU
```

## NNFusion

## PyTorch & Grinder
```bash
conda create python=3.7 --name grinder -y
conda activate grinder
pip install nvidia-pyindex
pip install -r env/requirements_pytorch.txt -f https://download.pytorch.org/whl/torch_stable.html
conda deactivate
cd $YOUR_DIR_FOR_NNFUSION
git clone https://github.com/microsoft/nnfusion.git --branch TODO --single-branch
cd nnfusion/artifacts
docker build -t grinder -f env/Dockerfile.nv .
docker run -it --name grinder-ae --shm-size="32g" -v $YOUR_DIR_FOR_NNFUSION/nnfusion:/root/nnfusion grinder:latest /bin/bash
```

## Grinder (with code)
```bash
export ARTIFACT_ROOT=***/ControlFlow/artifacts TODO
cd $ARTIFACT_ROOT/..
pip install -e .
```
TODO install nnfusion
TODO prepare kerneldb
adapted (TODO: remove)
docker build --network=host -t grinder -f env/Dockerfile.nv .
docker run -it --name heheda-grinder-ae -v /home/heheda/control_flow/nnfusion-docker:/root/nnfusion --shm-size="32g" --network=host grinder:latest /bin/bash

TODO get data

docker: --shm-size="32g"
docker build -t grinder:latest -f env/Dockerfile.rocm --network=host .

```
cd $ARTIFACT_ROOT/../nnfusion
mkdir build && cd build
cmake .. && make -j
cd $ARTIFACT_ROOT/..
pip install -e .
TODO: config.py
```

# build jax docker
## AMD GPU
* build jax docker
```bash
mkdir third-party && cd third-party
git clone https://github.com/google/jax.git
@@ -91,8 +44,37 @@ git checkout 0282b4bfad
git apply ../../env/jax.rocm.patch
./build/rocm/ci_build.sh --keep_image bash -c "./build/rocm/build_rocm.sh"
```
TODO get data

## 2. Getting Started with a Simple Example

* Go to the *get_started_tutorial/* folder and follow [README_GET_STARTED.md](get_started_tutorial/README_GET_STARTED.md).


## 3. Kernel Generation
This step generates all kernels for Grinder. More details can be found in [README_KERNEL_DB.md](kernel_db/README_KERNEL_DB.md).
**NOTE**: this process will take about TODO hours.
```bash
# assume running at nnfusion/artifacts directory
cd kernel_db
srun --pty --exclusive ./reproduce_kernel_db.sh
```

## 4. Reproducing Individual Experiment Results
**NOTE**: we provide a script named `run_nv_gpu.sh` that runs all the experiments except Figure 19. TODO: explain the run of Figure 19.

**For AE Reviewers**: Please use `srun --pty -w nico3 -p Long --exclusive ./run_nv_gpu.sh` to submit the jobs to the compute node of the provided cluster. (TODO: is `-p Long` needed?)

| Experiments | Figure # in Paper | Script Location |
| ----------- | ----------- | ----------- |
| #1. Control flow overhead in JAX | Figure 2 | N/A (use the results in Figure 15, 16, and 18) |
| #2. End-to-end DNN inference on NVIDIA V100 GPU | Figure 14 | [run.sh](Figure14/run.sh) |
| #3. Control flow overhead of models with loops | Figure 15 | [run.sh](Figure15/run.sh) |
| #4. Control flow overhead of models with branches | Figure 16 | [run.sh](Figure16/run.sh) |
| #5. Different ratio of executed layers | Figure 17 | [run.sh](Figure17/run.sh) |
| #6. Control flow overhead of RAE with recursion | Figure 18 | [run.sh](Figure18/run.sh) |
| #7. End-to-end DNN inference on ROCm MI100 GPU with BS=1 | Figure 19 | [run.sh](Figure19/run.sh) TODO |
| #8. Breakdown of models with BS=1 | Figure 20 | [run.sh](Figure20/run.sh) |
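The per-figure scripts above can also be driven by a small loop; `run_figures` below is a hypothetical helper, not part of the artifact (Figure 19 is excluded because it targets the ROCm machine):

```bash
# Run each figure's run.sh if present, skipping missing directories.
run_figures() {
  for d in "$@"; do
    if [ -x "$d/run.sh" ]; then
      (cd "$d" && ./run.sh)
    else
      echo "skip $d (no run.sh)"
    fi
  done
}

run_figures Figure14 Figure15 Figure16 Figure17 Figure18 Figure20
```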

## 5. Reproduce the Figures in the Paper

cd plot && ./plot_nv.sh && cd -

TODO (how to draw figure 19?)
2 changes: 1 addition & 1 deletion artifacts/ast_analyzer/utils/config.py
@@ -5,7 +5,7 @@
# config start
KERNELDB_REQUEST_FNAME="kerneldb_request.log"
TMP_DIR = f"/dev/shm/{getpass.getuser()}/grinder"
NNFUSION_ROOT = os.path.normpath(os.path.join(os.path.dirname(os.path.realpath(__file__)), '../..'))
NNFUSION_ROOT = os.path.normpath(os.path.join(os.path.dirname(os.path.realpath(__file__)), '../../..'))
KERNELDB_PATH = os.path.expanduser(f"/tmp/{getpass.getuser()}/kernel_cache.db")
NUM_GPU = 8
# config end
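The `NNFUSION_ROOT` change adds one more `..`: `config.py` lives at `artifacts/ast_analyzer/utils/config.py`, so three levels up is the repository root rather than `artifacts`. A sketch of the path arithmetic, using a hypothetical checkout at `/repo/nnfusion`:

```python
import os.path

# Hypothetical location of config.py inside a checkout.
cfg = "/repo/nnfusion/artifacts/ast_analyzer/utils/config.py"

old_root = os.path.normpath(os.path.join(os.path.dirname(cfg), '../..'))     # before this commit
new_root = os.path.normpath(os.path.join(os.path.dirname(cfg), '../../..'))  # after this commit

print(old_root)  # /repo/nnfusion/artifacts
print(new_root)  # /repo/nnfusion
```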
2 changes: 1 addition & 1 deletion artifacts/env/Dockerfile.nv
@@ -12,4 +12,4 @@ RUN source /root/miniconda3/etc/profile.d/conda.sh && conda create python=3.8 --
RUN source /root/miniconda3/etc/profile.d/conda.sh && conda create python=3.8 --name baseline_jax -y && conda activate baseline_jax && pip install nvidia-pyindex && pip install -r env/requirements_jax.txt -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html -f https://download.pytorch.org/whl/torch_stable.html && conda deactivate
RUN source /root/miniconda3/etc/profile.d/conda.sh && conda create python=3.7 --name grinder -y && conda activate grinder && pip install nvidia-pyindex && pip install -r env/requirements_pytorch.txt -f https://download.pytorch.org/whl/torch_stable.html && conda deactivate
RUN source /root/miniconda3/etc/profile.d/conda.sh && conda create python==3.8 --name kerneldb -y && pip install ply==3.11 && mkdir -p third-party && cd third-party && git clone https://github.com/apache/tvm.git && cd tvm && git checkout 22ba6523c && git submodule init && git submodule update && git apply ../../env/tvm.patch && mkdir build && cd build && cp ../cmake/config.cmake config.cmake && sed -i "s/USE_CUDA OFF/USE_CUDA ON/g" config.cmake && sed -i "s/USE_LLVM OFF/USE_LLVM ON/g" config.cmake && cmake .. && make -j && cd ../python && pip install -e .
RUN apt-get install -y libgflags-dev libsqlite3-dev libcurl4-openssl-dev curl libcurl4-openssl-dev
RUN cd env && bash install_nnfusion_dependency.sh && cd ..
8 changes: 8 additions & 0 deletions artifacts/env/install_grinder.sh
@@ -0,0 +1,8 @@
#!/bin/bash
cd nnfusion
mkdir build && cd build && cmake .. && make -j && cd -

cd artifacts
conda activate grinder
pip install -e .
conda deactivate
76 changes: 76 additions & 0 deletions artifacts/env/install_nnfusion_dependency.sh
@@ -0,0 +1,76 @@
#!/bin/bash -e

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

echo "Running NNFusion install_dependency.sh"
DEB_PACKAGES="build-essential cmake git curl zlib1g zlib1g-dev libtinfo-dev unzip \
autoconf automake libtool ca-certificates gdb sqlite3 libsqlite3-dev libcurl4-openssl-dev \
libprotobuf-dev protobuf-compiler libgflags-dev libgtest-dev"

ubuntu_codename=$(. /etc/os-release;echo $UBUNTU_CODENAME)

if [[ $ubuntu_codename != "focal" ]]; then
DEB_PACKAGES="${DEB_PACKAGES} clang-3.9 clang-format-3.9"
fi

if [[ "$(whoami)" != "root" ]]; then
SUDO=sudo
fi

if ! dpkg -L $DEB_PACKAGES >/dev/null 2>&1; then
#Thirdparty deb for ubuntu 18.04(bionic)
$SUDO sh -c "apt update && apt install -y --no-install-recommends software-properties-common apt-transport-https ca-certificates gnupg wget"
$SUDO sh -c "wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | tee /etc/apt/trusted.gpg.d/kitware.gpg >/dev/null"
$SUDO sh -c "apt-add-repository 'deb https://apt.kitware.com/ubuntu/ $ubuntu_codename main'"
$SUDO sh -c "apt update && apt install -y --no-install-recommends $DEB_PACKAGES"

if [[ $ubuntu_codename != "focal" ]]; then
# Install protobuf 3.6.1 from source
$SUDO sh -c "wget https://github.com/protocolbuffers/protobuf/releases/download/v3.6.1/protobuf-cpp-3.6.1.tar.gz -P /tmp"
$SUDO sh -c "cd /tmp && tar -xf /tmp/protobuf-cpp-3.6.1.tar.gz && rm /tmp/protobuf-cpp-3.6.1.tar.gz"
$SUDO sh -c "cd /tmp/protobuf-3.6.1/ && ./configure && make && make check && make install && ldconfig && rm -rf /tmp/protobuf-3.6.1/"
fi
fi

# if [[ $ubuntu_codename == "focal" ]]; then
# # Install clang-format-3.9
# $SUDO sh -c "cd /tmp && wget https://releases.llvm.org/3.9.0/clang+llvm-3.9.0-x86_64-linux-gnu-ubuntu-16.04.tar.xz && tar -xf clang+llvm-3.9.0-x86_64-linux-gnu-ubuntu-16.04.tar.xz"
# $SUDO sh -c "cp /tmp/clang+llvm-3.9.0-x86_64-linux-gnu-ubuntu-16.04/bin/clang-format /usr/bin/clang-format-3.9 && ln -s /usr/bin/clang-format-3.9 /usr/bin/clang-format"
# $SUDO sh -c "rm -rf /tmp/clang+llvm-3.9.0-x86_64-linux-gnu-ubuntu-16.04/bin/clang-format /tmp/clang+llvm-3.9.0-x86_64-linux-gnu-ubuntu-16.04.tar.xz"
# fi

echo "- Dependencies are installed in system."

if [ ! -f "/usr/lib/libgtest.a" ]; then

# if Ubuntu 16.04, we have some dev node using ubuntu 16.04
if [[ $ubuntu_codename == "xenial" ]]; then
$SUDO sh -c "mkdir /usr/src/googletest && ln -s /usr/src/gtest /usr/src/googletest/googletest"
fi

# Compile gtest
$SUDO sh -c "cd /usr/src/googletest/googletest/ && mkdir -p build && cd build && cmake .. -DCMAKE_CXX_FLAGS=\"-std=c++11\" && make -j"

if [[ $ubuntu_codename == "focal" ]]; then
$SUDO sh -c "cp /usr/src/googletest/googletest/build/lib/libgtest*.a /usr/lib/"
else
$SUDO sh -c "cp /usr/src/googletest/googletest/build/libgtest*.a /usr/lib/"
fi

$SUDO sh -c "rm -rf /usr/src/googletest/googletest/build"
$SUDO sh -c "mkdir /usr/local/lib/googletest"
$SUDO sh -c "ln -s /usr/lib/libgtest.a /usr/local/lib/googletest/libgtest.a"
$SUDO sh -c "ln -s /usr/lib/libgtest_main.a /usr/local/lib/googletest/libgtest_main.a"
fi
echo "- libgtest is installed in system."

# Install numpy
$SUDO sh -c "apt install -y python3 python3-pip"
if [[ $ubuntu_codename == "xenial" ]]; then
$SUDO sh -c "pip3 install numpy==1.18.5"
else
$SUDO sh -c "pip3 install numpy"
fi

echo "- Done."
3 changes: 0 additions & 3 deletions artifacts/get_started_tutorial/README_GET_STARTED.md
@@ -3,9 +3,6 @@ We assume you have already built and installed Grinder following the *Environment Prepa

The goal of this tutorial is to demonstrate how to compile and optimize a typical DNN model with control flow, and showcase the performance improvement with Grinder compiler.

**For AE Reviewers**: The nico cluster we provide for artifact evaluation is managed by slurm. To run GPU-related commands, please use `srun --pty --exclusive` before the original command, which will submit the job to the compute node (nico[3-4]). For your convenience, we have included this prefix in our artifact but will remove it in the final version. If you are running the artifact on your own machine, please remember to remove the prefix.


## Run PyTorch, TensorFlow, and JAX baselines

```bash
9 changes: 9 additions & 0 deletions artifacts/kernel_db/README_KERNEL_DB.md
@@ -0,0 +1,9 @@
# Kernel DB for GrinderBase and Grinder

The `reproduce_kernel_db.sh` script leverages AutoTVM, Ansor, Roller, and manual implementations to generate kernels. The resulting kernels are injected into a kernel database, located at *~/.cache/nnfusion/kernel_cache.db*, which is finally loaded by NNFusion.

This folder contains the following contents:
* `*_kernels` folders: the tuning result of each source
* `db`: scripts for injecting kernels into the kernel database, adapted from [https://github.com/microsoft/nnfusion/tree/osdi20_artifact/artifacts/kernel_db/kernel_db_scripts](https://github.com/microsoft/nnfusion/tree/osdi20_artifact/artifacts/kernel_db/kernel_db_scripts)
* `roller`: the source code of Roller, adapted from [https://github.com/microsoft/nnfusion/tree/osdi22_artifact/artifacts/roller](https://github.com/microsoft/nnfusion/tree/osdi22_artifact/artifacts/roller)
* `test_config`: the TVM implementation of each operator, adapted from [https://github.com/microsoft/nnfusion/tree/osdi22_artifact/artifacts/roller/test_config](https://github.com/microsoft/nnfusion/tree/osdi22_artifact/artifacts/roller/test_config)
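Once `reproduce_kernel_db.sh` finishes, the database can be inspected with plain `sqlite3`; the helper below is only an illustration (not part of the artifact) and degrades gracefully when `sqlite3` is not installed:

```bash
# List the tables of the kernel DB (path taken from this README).
inspect_db() {
  if ! command -v sqlite3 >/dev/null 2>&1; then
    echo "sqlite3 not installed"
    return 0
  fi
  sqlite3 "$1" ".tables"
}

inspect_db "$HOME/.cache/nnfusion/kernel_cache.db"
```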
