Cocktailer Artifact #518

Merged · 199 commits · Apr 24, 2023

Changes from 1 commit

Commits (199)
a488ac2
some util functions for onnx frontend
xysmlx Jul 15, 2021
ae261c7
more datatype support in constant op
xysmlx Jul 15, 2021
73d3e6f
draft: onnx frontend support for if and loop op
xysmlx Jul 15, 2021
6cff7e7
draft: if and loop op_define
xysmlx Jul 15, 2021
ed0221a
refactor GraphConvert of the ONNX frontend, support if and loop convert
xysmlx Jul 15, 2021
1a30651
draft: cuda kernel emitter for if and loop op (placeholder)
xysmlx Jul 15, 2021
436d7d3
update onnx frontend convert and shape inference for loop op
xysmlx Jul 15, 2021
a2863d7
fix output size bug in ONNX Loop op convert
xysmlx Jul 20, 2021
b7b66b2
Generic_op_define and ONNX opset_11 frontend converter for ScatterND op
xysmlx Jul 20, 2021
ffae3f7
Comment m_expression construction of generic_op to bypass translate f…
xysmlx Jul 20, 2021
f771004
Merge branch 'master' into control_flow
xysmlx Aug 10, 2021
811a126
Merge master branch into control_flow branch
xysmlx Nov 17, 2021
a77ee7b
disable ORT optimizations
xysmlx Nov 18, 2021
d42be30
fix bug for disabling ORT optimizations
xysmlx Nov 18, 2021
0a29cc1
fix bug for disabling ORT optimizations
xysmlx Nov 18, 2021
2f29e0d
temp
nox-410 Nov 18, 2021
b0d4c56
Support -ftuning_list in kernel tuning pass
xysmlx Nov 28, 2021
faedc9b
Merge remote-tracking branch 'origin/control_flow' into control_flow_1
nox-410 Nov 28, 2021
6812438
Implement If and Loop code gen
nox-410 Dec 2, 2021
324f2cf
add mod operator
xysmlx Dec 2, 2021
e45ce1e
Add recursion
nox-410 Dec 2, 2021
5a2955a
Merge remote-tracking branch 'origin/control_flow' into control_flow_1
nox-410 Dec 2, 2021
df4ea2e
add -fcodegen_pybind
xiayuqing0622 Dec 3, 2021
2d2a9c5
fix bug
xiayuqing0622 Dec 3, 2021
35212e6
Fix code for cudaEmitter
nox-410 Dec 4, 2021
ef037e1
Fix recursion kernel name
nox-410 Dec 4, 2021
ee06cb9
Remove unused param in control flow
nox-410 Dec 4, 2021
a72c89f
Create base class for controlflow emitter
nox-410 Dec 6, 2021
d67fa83
Recursion Op workspace allocation
nox-410 Dec 6, 2021
14536eb
python patch
heheda12345 Dec 7, 2021
51f5de2
add kernel_entry
heheda12345 Dec 7, 2021
3ff6e40
remove half
heheda12345 Dec 7, 2021
5eb9ef4
allocate tensor in c
heheda12345 Dec 7, 2021
3b62fc8
torch::tensor for one output
heheda12345 Dec 7, 2021
912c126
tmp fix
xiayuqing0622 Dec 7, 2021
beafc0b
fix
nox-410 Dec 7, 2021
dbde8dd
list of tensor
heheda12345 Dec 7, 2021
8016924
Merge branch 'master' of github.com:heheda12345/nnfusion
Dec 7, 2021
7b1bdb0
fix bias dim
heheda12345 Dec 7, 2021
7648008
apply shared memory
nox-410 Dec 8, 2021
0023532
stderr
heheda12345 Dec 8, 2021
b6e052a
pybind int64
heheda12345 Dec 8, 2021
15bdba5
Fix recursion
nox-410 Dec 9, 2021
07a7f59
some parameter changes
nox-410 Dec 9, 2021
80726df
Merge branch 'control_flow_2' of https://github.com/nox-410/nnfusion
heheda12345 Dec 9, 2021
1c6f23a
bypass reshape and broadcast
nox-410 Dec 9, 2021
e3cfae0
fix duplicate node
nox-410 Dec 9, 2021
6176f8a
bugfix
nox-410 Dec 10, 2021
3c22fce
Bypass GatherV2 & merge code
nox-410 Dec 10, 2021
5dd292e
Merge branch 'control_flow_2' of https://github.com/nox-410/nnfusion
heheda12345 Dec 10, 2021
e9e9bd2
Adjust parameter
nox-410 Dec 10, 2021
7a58506
Merge remote-tracking branch 'zc/master' into control_flow_finetune
nox-410 Dec 11, 2021
b694f34
Merge branch 'control_flow_2' of https://github.com/nox-410/nnfusion
heheda12345 Dec 11, 2021
a23d8aa
fix a bug related with gatherV2
nox-410 Dec 11, 2021
38cfaaa
Merge branch 'control_flow_2' into control_flow_finetune
nox-410 Dec 11, 2021
bec08a9
fix reshape & broadcast bypass
nox-410 Dec 12, 2021
5071d8d
fix reshape & broadcast bypass
nox-410 Dec 12, 2021
8c57871
add threadfence
nox-410 Dec 13, 2021
bb4e4a0
Merge branch 'control_flow_2' into control_flow_finetune
nox-410 Dec 13, 2021
0598667
Support multiple outputs
nox-410 Dec 23, 2021
4888a31
Merge branch 'control_flow_2' into control_flow_finetune
nox-410 Dec 23, 2021
8d992b2
Add loop initialization
nox-410 Dec 23, 2021
bb54c8a
Fix extern result memory for Loop
nox-410 Dec 23, 2021
6b57878
Support broadcast Matmul
nox-410 Dec 25, 2021
287b0be
Adjust parameters
nox-410 Dec 25, 2021
6242182
Allow scalar float in torch codegen
nox-410 Dec 25, 2021
6c3a48d
Skip inplace analysis for subgraphs
nox-410 Dec 25, 2021
2016e0d
Fix Reshape error in Control flow
nox-410 Dec 27, 2021
2e43315
Use injected SumOp
nox-410 Dec 27, 2021
3233154
update Dot kernel cache
nox-410 Dec 28, 2021
74059ad
Fix controlflow inplace
nox-410 Dec 29, 2021
ebf0679
Set memory reuse to false
nox-410 Dec 29, 2021
28f3b07
Merge branch 'control_flow_finetune' of https://github.com/nox-410/nn…
heheda12345 Dec 29, 2021
11f9caf
grid.sync() & elementwise
heheda12345 Jan 1, 2022
5fd70b2
add scatternd op
heheda12345 Jan 1, 2022
cad0688
-fcheck_result
heheda12345 Jan 7, 2022
a87cfb5
add control edge to loop graph & fix kernel fusion
heheda12345 Jan 8, 2022
d68b2e7
concat with fewer resource
heheda12345 Jan 8, 2022
94e78d4
concat: no implace
heheda12345 Jan 9, 2022
4429553
elementwise: general blockdim
heheda12345 Jan 9, 2022
2d8229b
hack: add more dependency
heheda12345 Jan 9, 2022
98ea31b
forward control edge
heheda12345 Jan 9, 2022
8cf153c
remove useless barrier
heheda12345 Jan 9, 2022
7616e7c
add return
heheda12345 Jun 16, 2022
2d435fa
fix bug in conv-bn
heheda12345 Jul 22, 2022
903c774
add roller
heheda12345 Jul 25, 2022
fd0dc0d
debug tensor
heheda12345 Jul 25, 2022
847be21
add roller as submodule
heheda12345 Jul 25, 2022
db67a1a
update gitignore
heheda12345 Jul 25, 2022
55706ff
change weight of inner graph to Constant op
heheda12345 Jul 25, 2022
0b71dba
gnode cout
heheda12345 Jul 25, 2022
626c66e
identity op
heheda12345 Jul 25, 2022
e2b7995
blockCudaEmitter: emit parameters from function sig
heheda12345 Jul 25, 2022
ba24cbb
draft version of separate kernel launch, conflict with postprocessing…
heheda12345 Jul 26, 2022
ad3802d
fix typo
heheda12345 Jul 26, 2022
b3d0f8c
two branch call finish
heheda12345 Jul 27, 2022
0e9e76a
fix concat op
heheda12345 Aug 11, 2022
7ed651b
update op frontend
heheda12345 Aug 11, 2022
9f72880
conv to CNHW initial support
heheda12345 Aug 11, 2022
e75617e
add concat and reshape op
heheda12345 Aug 11, 2022
6a1e0c5
add if op to conv_layout_pass and fix related bugs
heheda12345 Aug 15, 2022
a0bc620
fix bug in share memory allocation of if op
heheda12345 Aug 15, 2022
01ac748
fuse small kernels (not finish yet)
heheda12345 Aug 18, 2022
d1f988b
fuse small kernels
heheda12345 Aug 18, 2022
2f189d0
reorder the kernels
heheda12345 Aug 23, 2022
ac90748
impl d2h
heheda12345 Aug 25, 2022
ac6e50f
fix bug in elementwise kernel
heheda12345 Aug 29, 2022
7a7d75e
main_test 100+100, print ref
heheda12345 Aug 29, 2022
e6bd194
fuse then else
heheda12345 Aug 30, 2022
6675125
move subtract out of if
heheda12345 Aug 31, 2022
bb1210e
loop in c
heheda12345 Sep 5, 2022
b5c0a4e
fix small bugs for skipnet
heheda12345 Sep 7, 2022
f658b61
CPU-GPU hybrid: assign stage
heheda12345 Sep 14, 2022
da1aaeb
CPU-GPU hybrid: add d2h and h2d gnode
heheda12345 Sep 14, 2022
41609fc
CPU-GPU hybrid: duplicate memory pool
heheda12345 Sep 14, 2022
92445f4
CPU-GPU hybrid: call by tensor with _cpu
heheda12345 Sep 14, 2022
645e78f
CPU-GPU hybrid: forward stage info in element-wise fusion pass
heheda12345 Sep 15, 2022
ad057ed
remove debug code
heheda12345 Sep 15, 2022
78614ca
CPU-GPU hybrid: copy cpu emitter from cuda emitter
heheda12345 Sep 15, 2022
84dbfa2
CPU-GPU hybrid: for (int tid=0; tid <...)
heheda12345 Sep 15, 2022
56bfcbf
CPU-GPU hybrid: put result op on CPU
heheda12345 Sep 15, 2022
35adaba
CPU-GPU hybrid: bmm & conv codegen, can run
heheda12345 Sep 16, 2022
9d2b6f1
CPU-GPU hybrid: avoid run to_cpu_pass in inner graph
heheda12345 Sep 16, 2022
e914e1c
add cpu op
heheda12345 Sep 27, 2022
ccd9270
fix bugs in recursion
heheda12345 Sep 27, 2022
ef8d23b
inline recursion call
heheda12345 Sep 27, 2022
0d9e94c
recursive with stack
heheda12345 Sep 27, 2022
9d270ec
add be_state_buffer and state_base
heheda12345 Sep 28, 2022
fe943c7
add be_state_buffer and state_base to more place
heheda12345 Sep 28, 2022
9cac765
fast barrier codegen
heheda12345 Sep 28, 2022
294fa47
check control edge in operator << (gnode)
heheda12345 Oct 4, 2022
e358bc0
optimize elementwise perf
heheda12345 Oct 5, 2022
f9e1e80
add pipeline fail to compile commands
heheda12345 Oct 5, 2022
b7e558d
eliminate copy back of cond
heheda12345 Oct 6, 2022
88ee7bc
add bool to dtypes
heheda12345 Oct 7, 2022
4c0f3d2
add translate_v2 for identity op
heheda12345 Oct 9, 2022
68e3643
avoid inplace opt when gnode i/o contains result
heheda12345 Oct 9, 2022
b4b4e1a
fix bug in conv layout pass
heheda12345 Oct 9, 2022
dc2320b
cast_pytorch_tensor: use data_ptr instead of storage.data_ptr
heheda12345 Oct 9, 2022
99d2529
add fused_max_grid to loop
heheda12345 Oct 10, 2022
452d049
add more cpu op
heheda12345 Oct 11, 2022
c50e66c
add naive impls for breakdown exp
heheda12345 Oct 11, 2022
47f392e
skip scalar op: reshape
heheda12345 Oct 11, 2022
e798c76
is_outmost_graph for blockfusion
heheda12345 Oct 18, 2022
deccf1c
hacked parallel recursion: assume all calls can be executed in parallel
heheda12345 Oct 19, 2022
95ee0d4
tune recursion
heheda12345 Oct 20, 2022
3691741
add reduce-memcpy blockop
heheda12345 Nov 11, 2022
001c376
argmax kernel (not tested)
heheda12345 Nov 11, 2022
5e213d0
support while op, can compile but loop cannot stop
heheda12345 Nov 11, 2022
e343fe7
alloc cond tensor, fix bug in parameter mapping, can run bs=1
heheda12345 Nov 11, 2022
8668be8
hardcode num_local_thread_sync in reduce.hpp because emit_function_bo…
heheda12345 Nov 12, 2022
62fcf06
while in c
heheda12345 Nov 12, 2022
4b7373d
fast barrier for single block
heheda12345 Nov 12, 2022
18e60a3
fix bug in fast barrier of single block
heheda12345 Nov 12, 2022
3dfae17
fix bug in argmax and element_fused, pass while_op
heheda12345 Nov 14, 2022
823fc2a
enable & disable result d2d inplace
heheda12345 Nov 17, 2022
9215c13
support different schedule of if inside while op
heheda12345 Nov 22, 2022
865c613
extend elementwise to support simple broadcast
heheda12345 Nov 28, 2022
08f8504
extend scatternd to support index array
heheda12345 Nov 28, 2022
80cc929
reshape memcpy block kernel
heheda12345 Nov 28, 2022
652677f
softmax block kernel
heheda12345 Nov 28, 2022
117cb7f
batchmatmul with broadcast
heheda12345 Nov 28, 2022
e4acd37
small fix
heheda12345 Nov 28, 2022
27bdc7d
manually set max_block_dim for bcast and elementwise
heheda12345 Nov 28, 2022
428edb6
sync for rocm
heheda12345 Dec 2, 2022
f235896
merge rocm code
heheda12345 Mar 27, 2023
ecf47cc
add IfSingle operator
heheda12345 Mar 28, 2023
547136d
reorganize parameters
heheda12345 Mar 30, 2023
c8cce31
remove cudadevicereset
heheda12345 Apr 3, 2023
60a287a
fix depunit bug in loop
heheda12345 Apr 3, 2023
f5e6739
dump kerneldb requests
heheda12345 Apr 3, 2023
65c6f4f
search unroll width
heheda12345 Apr 3, 2023
f0c358d
fix blockfusion sync problem
heheda12345 Apr 10, 2023
f138341
wrap python part with ifdef
heheda12345 Apr 14, 2023
845c52c
add __syncthreads() to cf kernels
heheda12345 Apr 17, 2023
ac6bd1f
copy file from ControlFlow repo
heheda12345 Apr 18, 2023
77fbf5b
change path
heheda12345 Apr 18, 2023
8266a71
fix bug in scripts
heheda12345 Apr 18, 2023
f2d7df3
add more guides
heheda12345 Apr 19, 2023
246385a
add more guides
heheda12345 Apr 19, 2023
e4e02fe
remove cudnn
heheda12345 Apr 19, 2023
16713b3
install_grinder script
heheda12345 Apr 19, 2023
9e3c80f
remove cudnn in manual
heheda12345 Apr 19, 2023
ec00f1e
autotvm kernel
heheda12345 Apr 20, 2023
2ba5cf4
remove training code
heheda12345 Apr 20, 2023
e568ef6
change permission
heheda12345 Apr 20, 2023
a6514e7
update kernels in manual impls
heheda12345 Apr 20, 2023
afdae02
add rocm kerneldb script
heheda12345 Apr 21, 2023
06a0ff2
copy roller rocm code
heheda12345 Apr 21, 2023
186fe5b
first try of rocm kerneldb
heheda12345 Apr 22, 2023
48d0651
rocm reproduced
heheda12345 Apr 22, 2023
767c6f8
remove grinder from filename
heheda12345 Apr 22, 2023
2632ad8
kerneldb scripts
heheda12345 Apr 23, 2023
dd1796a
finish rocm?
heheda12345 Apr 23, 2023
ec0594f
remove name 'grinder' from scripts
heheda12345 Apr 23, 2023
063996b
update gitignore
heheda12345 Apr 23, 2023
b1a889b
small fix
heheda12345 Apr 23, 2023
071ded7
rename project and remove some script
heheda12345 Apr 24, 2023
7d0c87a
update links
heheda12345 Apr 24, 2023
add more guides
heheda12345 committed Apr 19, 2023

commit f2d7df38f5c778319b975f369b74dfcdceeffc1d
4 changes: 3 additions & 1 deletion .gitignore
@@ -54,4 +54,6 @@ nnfusion_rt/
models/frozenmodels/

artifacts/data
artifacts/reproduce_results
artifacts/reproduce_results
*.onnx
*.tfgraph
93 changes: 93 additions & 0 deletions artifacts/INSTALL.md
@@ -0,0 +1,93 @@
# Installation Tutorial
This document describes how to install the software used in the artifact on a node with an NVIDIA GPU. All scripts are assumed to be run from the `nnfusion/artifacts` directory.

## Prerequisites
We assume that you have a node with an NVIDIA GPU and CUDA installed, and that conda and nvcc are available. If you have not installed conda, you can install it by following the instructions [here](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html) (Miniconda is sufficient, and this artifact assumes that Miniconda is installed at the default path `~/miniconda3`). If you have not installed nvcc, you can install it by following the instructions [here](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html).

## TensorFlow
The onnx-tf package for TF 1.15 needs to be built from source because the pre-compiled version depends on TF 2. We also fixed some bugs in that commit to properly support control flow operations. The following commands prepare the conda env for TF 1.15.

```bash
conda create python=3.8 --name baseline_tf1 -y
conda activate baseline_tf1
pip install nvidia-pyindex
pip install -r env/requirements_tf.txt
mkdir -p third-party && cd third-party
git clone https://github.com/onnx/onnx-tensorflow.git
cd onnx-tensorflow
git checkout 0e4f4836 # v1.7.0-tf-1.15m
git apply ../../env/onnx_tf.patch
pip install -e .
conda deactivate
```
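To verify the environment, the imports below should succeed (a suggested sanity check):

```bash
conda activate baseline_tf1
python -c "import tensorflow as tf; print(tf.__version__)"  # expect 1.15.x
python -c "import onnx_tf"                                  # should import without error
conda deactivate
```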
## JAX
The following commands will prepare the conda env for JAX.
```bash
conda create python=3.8 --name baseline_jax -y
conda activate baseline_jax
pip install nvidia-pyindex
pip install -r env/requirements_jax.txt -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html -f https://download.pytorch.org/whl/torch_stable.html
conda deactivate
```
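A suggested check that JAX was installed with GPU support:

```bash
conda activate baseline_jax
python -c "import jax; print(jax.devices())"  # should list GPU device(s)
conda deactivate
```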

## TVM
The following commands will prepare the conda env for TVM.
```bash
conda create python==3.8 --name kerneldb -y
conda activate kerneldb
pip install ply==3.11
mkdir -p third-party && cd third-party
git clone https://github.com/apache/tvm.git
cd tvm
git checkout 22ba6523c
git submodule init && git submodule update
git apply ../../env/tvm.patch
mkdir build
cd build
cp ../../../env/tvm.config.cmake config.cmake
cmake ..
make -j
cd ../python
pip install -e .
```
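A suggested check that the TVM Python package is importable from the `kerneldb` env:

```bash
conda activate kerneldb
python -c "import tvm; print(tvm.__version__)"
conda deactivate
```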

## NNFusion
The following commands will build nnfusion. Please use the [script](../maint/script/install_dependency.sh) (needs sudo) to prepare the environment for nnfusion before running the following commands.

```bash
cd .. # to $YOUR_DIR_FOR_NNFUSION/nnfusion
mkdir build && cd build && cmake .. && make -j
```
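If the build succeeds, the `nnfusion` command-line tool should be produced under the build tree. A smoke test (the path below assumes the usual CMake layout and may differ):

```bash
# run from the build directory; adjust the path to the binary if your layout differs
./src/tools/nnfusion/nnfusion --help
```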

## PyTorch & Grinder
```bash
conda create python=3.7 --name grinder -y
conda activate grinder
pip install nvidia-pyindex
pip install -r env/requirements_pytorch.txt -f https://download.pytorch.org/whl/torch_stable.html
pip install -e .
conda deactivate
```
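A suggested check that the `grinder` env has a CUDA-enabled PyTorch:

```bash
conda activate grinder
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
conda deactivate
```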

**Draft notes (TODO)**:
* TODO: get data
* TODO: prepare kerneldb
* Docker (ROCm): pass `--shm-size="32g"`; build with `docker build -t grinder:latest -f env/Dockerfile.rocm --network=host .`
* Run the experiments on the cluster: `srun --pty -w nico3 -p Long --exclusive ./run_nv_gpu.sh`
* Plot the results: `cd plot && ./plot_nv.sh && cd -`
136 changes: 59 additions & 77 deletions artifacts/README.md
@@ -1,88 +1,41 @@
# Installation of Evaluated Systems
assume running at artifacts directory
# OSDI'23 Grinder Artifacts Evaluation

## 0. Overview
This code branch is used for OSDI'23 Artifact Evaluation of paper #628, titled "Grinder: Analysis and Optimization for Dynamic Control Flow in Deep Learning".

## Pre-requisites
conda, nvcc ......
### Evaluation Setup
* Artifacts Available:
  * All Grinder-related code is available under the NNFusion open-source project, located at: [https://github.com/microsoft/nnfusion/tree/TODO](https://github.com/microsoft/nnfusion/tree/TODO)
* Artifacts Functional:
  * *Documentation*: the following documents include detailed guidelines on how to build, install, and test Grinder, and how to run the experiments that compare it with the other baselines.
  * *Completeness*: the [C++ part](..) of Grinder has been merged into NNFusion in this branch, and the [Python part](ast_analyzer) is available in this artifact.
  * *Exercisability*: under the *artifacts* folder, we prepare all the scripts and data needed to reproduce the experiments, in individual folders named after the corresponding figures in the paper.
* Results Reproduced:
  * To reproduce the main results presented in our paper, we provide Docker images containing all the environments and baseline software, and machines with the same configurations as those used in the paper's evaluation. We also provide detailed guidelines to help reproduce the results step by step.

## TensorFlow
install from env/requirements_tf.txt
Install onnx-tf from source (the pre-compiled version depends on TF2)
## 1. Environment Preparation

```bash
conda create python=3.8 --name baseline_tf1 -y
conda activate baseline_tf1
pip install nvidia-pyindex
pip install -r env/requirements_tf.txt
mkdir third-party && cd third-party
git clone https://github.com/onnx/onnx-tensorflow.git
cd onnx-tensorflow
git checkout 0e4f4836 # v1.7.0-tf-1.15m
git apply ../../env/onnx_tf.patch
pip install -e .
conda deactivate
```
## JAX
```bash
conda create python=3.8 --name baseline_jax -y
conda activate baseline_jax
pip install nvidia-pyindex
pip install -r env/requirements_jax.txt -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html -f https://download.pytorch.org/whl/torch_stable.html
conda deactivate
```
**For AE Reviewers**:
1. The nico cluster we provide for artifact evaluation is managed by Slurm. To run GPU-related commands, please prefix the original command with `srun --pty --exclusive`, which submits the job to a compute node (nico[3-4]). For your convenience, we have included this prefix in our artifact but will remove it in the final version. If you are running the artifact on your own machine, please remember to remove the prefix.
2. Due to security concerns, we cannot grant Docker permissions to reviewers. Instead, for the NVIDIA GPU we provide an account with all the dependencies installed, and for the AMD GPU we provide SSH access into the Docker containers. You can skip this environment preparation section.

# TVM
```bash
conda create python==3.8 --name kerneldb -y
pip install ply==3.11
mkdir third-party && cd third-party
git clone https://github.com/apache/tvm.git --recursive
cd tvm
git checkout 22ba6523c
git apply ../../env/tvm.patch
mkdir build
cd build
cp ../../../env/tvm.config.cmake config.cmake
make -j
cd ../python
pip install -e .
## NVIDIA GPU
```

## NNFusion

## Pytorch & Grinder
```bash
conda create python=3.7 --name grinder -y
conda activate grinder
pip install nvidia-pyindex
pip install -r env/requirements_pytorch.txt -f https://download.pytorch.org/whl/torch_stable.html
conda deactivate
cd $YOUR_DIR_FOR_NNFUSION
git clone https://github.com/microsoft/nnfusion.git --branch TODO --single-branch
cd nnfusion/artifacts
docker build -t grinder -f env/Dockerfile.nv .
docker run -it --name grinder-ae --shm-size="32g" -v $YOUR_DIR_FOR_NNFUSION/nnfusion:/root/nnfusion grinder:latest /bin/bash
```

## Grinder (with code)
```bash
export ARTIFACT_ROOT=***/ControlFlow/artifacts TODO
cd $ARTIFACT_ROOT/..
pip install -e .
```
TODO install nnfusion
TODO prepare kerneldb
adapted (TODO: remove)
docker build --network=host -t grinder -f env/Dockerfile.nv .
docker run -it --name heheda-grinder-ae -v /home/heheda/control_flow/nnfusion-docker:/root/nnfusion --shm-size="32g" --network=host grinder:latest /bin/bash

TODO get data

docker: --shm-size="32g"
docker build -t grinder:latest -f env/Dockerfile.rocm --network=host .

cmake ..
```
cd $ARTIFACT_ROOT/../nnfusion
mkdir build && cd build
cmake .. && make -j
cd $ARTIFACT_ROOT/..
pip install -e .
TODO: config.py
```

# build jax docker
## AMD GPU
* build jax docker
```bash
mkdir third-party && cd third-party
git clone https://github.com/google/jax.git
@@ -91,8 +44,37 @@ git checkout 0282b4bfad
git apply ../../env/jax.rocm.patch
./build/rocm/ci_build.sh --keep_image bash -c "./build/rocm/build_rocm.sh"
```
TODO get data

## 2. Getting Started with a Simple Example

* Go to the *get_started_tutorial/* folder and follow [README_GET_STARTED.md](get_started_tutorial/README_GET_STARTED.md).


## 3. Kernel Generation
This step generates all kernels for Grinder. More details can be found in [README_KERNEL_DB.md](kernel_db/README_KERNEL_DB.md).
**NOTE**: this process will take about TODO hours.
```bash
# assume running at nnfusion/artifacts directory
cd kernel_db
srun --pty --exclusive ./reproduce_kernel_db.sh
```

## 4. Reproducing Individual Experiment Results
**NOTE**: we provide a script named `run_nv_gpu.sh` to run all experiments except Figure 19. You can use `./run_nv_gpu.sh` to run them. TODO: explain how to run Figure 19.

**For AE Reviewers**: Please use `srun --pty -w nico3 --exclusive ./run_nv_gpu.sh` to submit the jobs to the compute node of the provided cluster. TODO: is `-p Long` needed?

srun --pty -w nico3 -p Long --exclusive ./run_nv_gpu.sh
| Experiments | Figure # in Paper | Script Location |
| ----------- | ----------- | ----------- |
| #1. Control flow overhead in JAX | Figure 2 | N/A (use the results in Figure 15, 16, and 18) |
| #2. End-to-end DNN inference on NVIDIA V100 GPU | Figure 14 | [run.sh](Figure14/run.sh) |
| #3. Control flow overhead of models with loops | Figure 15 | [run.sh](Figure15/run.sh) |
| #4. Control flow overhead of models with branches | Figure 16 | [run.sh](Figure16/run.sh) |
| #5. Different ratio of executed layers | Figure 17 | [run.sh](Figure17/run.sh) |
| #6. Control flow overhead of RAE with recursion | Figure 18 | [run.sh](Figure18/run.sh) |
| #7. End-to-end DNN inference on ROCm MI100 GPU with BS=1 | Figure 19 | [run.sh](Figure19/run.sh) TODO |
| #8. Breakdown of models with BS=1 | Figure 20 | [run.sh](Figure20/run.sh) |

cd plot && ./plot_nv.sh && cd -
## 5. Reproduce the Figures in the paper
TODO (how to draw figure 19?)
2 changes: 1 addition & 1 deletion artifacts/ast_analyzer/utils/config.py
@@ -5,7 +5,7 @@
# config start
KERNELDB_REQUEST_FNAME="kerneldb_request.log"
TMP_DIR = f"/dev/shm/{getpass.getuser()}/grinder"
NNFUSION_ROOT = os.path.normpath(os.path.join(os.path.dirname(os.path.realpath(__file__)), '../..'))
NNFUSION_ROOT = os.path.normpath(os.path.join(os.path.dirname(os.path.realpath(__file__)), '../../..'))
KERNELDB_PATH = os.path.expanduser(f"/tmp/{getpass.getuser()}/kernel_cache.db")
NUM_GPU = 8
# config end
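For reference, the extra `..` accounts for `config.py` now living three levels below the repository root (`artifacts/ast_analyzer/utils/config.py`). A minimal illustration with a hypothetical checkout path:

```bash
# hypothetical checkout at /home/user/nnfusion
python -c "import os; print(os.path.normpath(os.path.join('/home/user/nnfusion/artifacts/ast_analyzer/utils', '../../..')))"
# prints /home/user/nnfusion (the repo root); the old '../..' stopped at .../nnfusion/artifacts
```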
2 changes: 1 addition & 1 deletion artifacts/env/Dockerfile.nv
@@ -12,4 +12,4 @@ RUN source /root/miniconda3/etc/profile.d/conda.sh && conda create python=3.8 --
RUN source /root/miniconda3/etc/profile.d/conda.sh && conda create python=3.8 --name baseline_jax -y && conda activate baseline_jax && pip install nvidia-pyindex && pip install -r env/requirements_jax.txt -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html -f https://download.pytorch.org/whl/torch_stable.html && conda deactivate
RUN source /root/miniconda3/etc/profile.d/conda.sh && conda create python=3.7 --name grinder -y && conda activate grinder && pip install nvidia-pyindex && pip install -r env/requirements_pytorch.txt -f https://download.pytorch.org/whl/torch_stable.html && conda deactivate
RUN source /root/miniconda3/etc/profile.d/conda.sh && conda create python==3.8 --name kerneldb -y && pip install ply==3.11 && mkdir -p third-party && cd third-party && git clone https://github.com/apache/tvm.git && cd tvm && git checkout 22ba6523c && git submodule init && git submodule update && git apply ../../env/tvm.patch && mkdir build && cd build && cp ../cmake/config.cmake config.cmake && sed -i "s/USE_CUDA OFF/USE_CUDA ON/g" config.cmake && sed -i "s/USE_LLVM OFF/USE_LLVM ON/g" config.cmake && cmake .. && make -j && cd ../python && pip install -e .
RUN apt-get install -y libgflags-dev libsqlite3-dev libcurl4-openssl-dev curl libcurl4-openssl-dev
RUN cd env && bash install_nnfusion_dependency.sh && cd ..
8 changes: 8 additions & 0 deletions artifacts/env/install_grinder.sh
@@ -0,0 +1,8 @@
#!/bin/bash
cd nnfusion
mkdir build && cd build && cmake .. && make -j && cd -

cd artifacts
conda activate grinder
pip install -e .
conda deactivate
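A typical invocation, assuming the script is launched from the directory that contains the `nnfusion` checkout (illustrative; note that `conda activate` inside a script needs conda's shell hook, so one way is to source both):

```bash
cd $YOUR_DIR_FOR_NNFUSION                     # directory containing the nnfusion checkout
source ~/miniconda3/etc/profile.d/conda.sh    # make `conda activate` available
source nnfusion/artifacts/env/install_grinder.sh
```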
76 changes: 76 additions & 0 deletions artifacts/env/install_nnfusion_dependency.sh
@@ -0,0 +1,76 @@
#!/bin/bash -e

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

echo "Running NNFusion install_dependency.sh"
DEB_PACKAGES="build-essential cmake git curl zlib1g zlib1g-dev libtinfo-dev unzip \
autoconf automake libtool ca-certificates gdb sqlite3 libsqlite3-dev libcurl4-openssl-dev \
libprotobuf-dev protobuf-compiler libgflags-dev libgtest-dev"

ubuntu_codename=$(. /etc/os-release;echo $UBUNTU_CODENAME)

if [[ $ubuntu_codename != "focal" ]]; then
DEB_PACKAGES="${DEB_PACKAGES} clang-3.9 clang-format-3.9"
fi

if [[ "$(whoami)" != "root" ]]; then
SUDO=sudo
fi

if ! dpkg -L $DEB_PACKAGES >/dev/null 2>&1; then
#Thirdparty deb for ubuntu 18.04(bionic)
$SUDO sh -c "apt update && apt install -y --no-install-recommends software-properties-common apt-transport-https ca-certificates gnupg wget"
$SUDO sh -c "wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | tee /etc/apt/trusted.gpg.d/kitware.gpg >/dev/null"
$SUDO sh -c "apt-add-repository 'deb https://apt.kitware.com/ubuntu/ $ubuntu_codename main'"
$SUDO sh -c "apt update && apt install -y --no-install-recommends $DEB_PACKAGES"

if [[ $ubuntu_codename != "focal" ]]; then
# Install protobuf 3.6.1 from source
$SUDO sh -c "wget https://github.com/protocolbuffers/protobuf/releases/download/v3.6.1/protobuf-cpp-3.6.1.tar.gz -P /tmp"
$SUDO sh -c "cd /tmp && tar -xf /tmp/protobuf-cpp-3.6.1.tar.gz && rm /tmp/protobuf-cpp-3.6.1.tar.gz"
$SUDO sh -c "cd /tmp/protobuf-3.6.1/ && ./configure && make && make check && make install && ldconfig && rm -rf /tmp/protobuf-3.6.1/"
fi
fi

# if [[ $ubuntu_codename == "focal" ]]; then
# # Install clang-format-3.9
# $SUDO sh -c "cd /tmp && wget https://releases.llvm.org/3.9.0/clang+llvm-3.9.0-x86_64-linux-gnu-ubuntu-16.04.tar.xz && tar -xf clang+llvm-3.9.0-x86_64-linux-gnu-ubuntu-16.04.tar.xz"
# $SUDO sh -c "cp /tmp/clang+llvm-3.9.0-x86_64-linux-gnu-ubuntu-16.04/bin/clang-format /usr/bin/clang-format-3.9 && ln -s /usr/bin/clang-format-3.9 /usr/bin/clang-format"
# $SUDO sh -c "rm -rf /tmp/clang+llvm-3.9.0-x86_64-linux-gnu-ubuntu-16.04/bin/clang-format /tmp/clang+llvm-3.9.0-x86_64-linux-gnu-ubuntu-16.04.tar.xz"
# fi

echo "- Dependencies are installed in system."

if [ ! -f "/usr/lib/libgtest.a" ]; then

# if Ubuntu 16.04, we have some dev node using ubuntu 16.04
if [[ $ubuntu_codename == "xenial" ]]; then
$SUDO sh -c "mkdir /usr/src/googletest && ln -s /usr/src/gtest /usr/src/googletest/googletest"
fi

# Compile gtest
$SUDO sh -c "cd /usr/src/googletest/googletest/ && mkdir -p build && cd build && cmake .. -DCMAKE_CXX_FLAGS=\"-std=c++11\" && make -j"

if [[ $ubuntu_codename == "focal" ]]; then
$SUDO sh -c "cp /usr/src/googletest/googletest/build/lib/libgtest*.a /usr/lib/"
else
$SUDO sh -c "cp /usr/src/googletest/googletest/build/libgtest*.a /usr/lib/"
fi

$SUDO sh -c "rm -rf /usr/src/googletest/googletest/build"
$SUDO sh -c "mkdir /usr/local/lib/googletest"
$SUDO sh -c "ln -s /usr/lib/libgtest.a /usr/local/lib/googletest/libgtest.a"
$SUDO sh -c "ln -s /usr/lib/libgtest_main.a /usr/local/lib/googletest/libgtest_main.a"
fi
echo "- libgtest is installed in system."

# Install numpy
$SUDO sh -c "apt install -y python3 python3-pip"
if [[ $ubuntu_codename == "xenial" ]]; then
$SUDO sh -c "pip3 install numpy==1.18.5"
else
$SUDO sh -c "pip3 install numpy"
fi

echo "- Done."
3 changes: 0 additions & 3 deletions artifacts/get_started_tutorial/README_GET_STARTED.md
@@ -3,9 +3,6 @@ We assume you already build and install Grinder following the *Environment Prepa

The goal of this tutorial is to demonstrate how to compile and optimize a typical DNN model with control flow, and to showcase the performance improvement delivered by the Grinder compiler.

**For AE Reviewers**: The nico cluster we provide for artifact evaluation is managed by slurm. To run GPU-related commands, please use `srun --pty --exclusive` before the original command, which will submit the job to the compute node (nico[3-4]). For your convenience, we have included this prefix in our artifact but will remove it in the final version. If you are running the artifact on your own machine, please remember to remove the prefix.


## Run PyTorch, TensorFlow, and JAX baselines

```bash
9 changes: 9 additions & 0 deletions artifacts/kernel_db/README_KERNEL_DB.md
@@ -0,0 +1,9 @@
# Kernel DB for GrinderBase and Grinder

The `reproduce_kernel_db.sh` script leverages AutoTVM, Ansor, Roller, and manual implementations to generate kernels. The resulting kernels will be injected into a kernel database, located at *~/.cache/nnfusion/kernel_cache.db*, which is finally loaded by NNFusion.

This folder contains the following contents:
* `*_kernels` folders: the tuning results from each source
* `db`: scripts for injecting kernels into the kernel database, adapted from [https://github.com/microsoft/nnfusion/tree/osdi20_artifact/artifacts/kernel_db/kernel_db_scripts](https://github.com/microsoft/nnfusion/tree/osdi20_artifact/artifacts/kernel_db/kernel_db_scripts)
* `roller`: the source code of Roller, adapted from [https://github.com/microsoft/nnfusion/tree/osdi22_artifact/artifacts/roller](https://github.com/microsoft/nnfusion/tree/osdi22_artifact/artifacts/roller)
* `test_config`: the TVM implementation of each operator, adapted from [https://github.com/microsoft/nnfusion/tree/osdi22_artifact/artifacts/roller/test_config](https://github.com/microsoft/nnfusion/tree/osdi22_artifact/artifacts/roller/test_config)
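Since the kernel database is a plain SQLite file, its contents can be inspected directly (illustrative; the table layout depends on NNFusion's kernel-cache schema):

```bash
sqlite3 ~/.cache/nnfusion/kernel_cache.db ".tables"   # list the tables in the cache
sqlite3 ~/.cache/nnfusion/kernel_cache.db ".schema"   # dump the schema of the injected entries
```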