Cocktailer Artifact #518

Merged · 199 commits · Apr 24, 2023
Changes from 1 commit

Commits (199)
a488ac2
some util functions for onnx frontend
xysmlx Jul 15, 2021
ae261c7
more datatype support in constant op
xysmlx Jul 15, 2021
73d3e6f
draft: onnx frontend support for if and loop op
xysmlx Jul 15, 2021
6cff7e7
draft: if and loop op_define
xysmlx Jul 15, 2021
ed0221a
refactor GraphConvert of the ONNX frontend, support if and loop convert
xysmlx Jul 15, 2021
1a30651
draft: cuda kernel emitter for if and loop op (placeholder)
xysmlx Jul 15, 2021
436d7d3
update onnx frontend convert and shape inference for loop op
xysmlx Jul 15, 2021
a2863d7
fix output size bug in ONNX Loop op convert
xysmlx Jul 20, 2021
b7b66b2
Generic_op_define and ONNX opset_11 frontend converter for ScatterND op
xysmlx Jul 20, 2021
ffae3f7
Comment m_expression construction of generic_op to bypass translate f…
xysmlx Jul 20, 2021
f771004
Merge branch 'master' into control_flow
xysmlx Aug 10, 2021
811a126
Merge master branch into control_flow branch
xysmlx Nov 17, 2021
a77ee7b
disable ORT optimizations
xysmlx Nov 18, 2021
d42be30
fix bug for disabling ORT optimizations
xysmlx Nov 18, 2021
0a29cc1
fix bug for disabling ORT optimizations
xysmlx Nov 18, 2021
2f29e0d
temp
nox-410 Nov 18, 2021
b0d4c56
Support -ftuning_list in kernel tuning pass
xysmlx Nov 28, 2021
faedc9b
Merge remote-tracking branch 'origin/control_flow' into control_flow_1
nox-410 Nov 28, 2021
6812438
Implement If and Loop code gen
nox-410 Dec 2, 2021
324f2cf
add mod operator
xysmlx Dec 2, 2021
e45ce1e
Add recursion
nox-410 Dec 2, 2021
5a2955a
Merge remote-tracking branch 'origin/control_flow' into control_flow_1
nox-410 Dec 2, 2021
df4ea2e
add -fcodegen_pybind
xiayuqing0622 Dec 3, 2021
2d2a9c5
fix bug
xiayuqing0622 Dec 3, 2021
35212e6
Fix code for cudaEmitter
nox-410 Dec 4, 2021
ef037e1
Fix recursion kernel name
nox-410 Dec 4, 2021
ee06cb9
Remove unused param in control flow
nox-410 Dec 4, 2021
a72c89f
Create base class for controlflow emitter
nox-410 Dec 6, 2021
d67fa83
Recursion Op workspace allocation
nox-410 Dec 6, 2021
14536eb
python patch
heheda12345 Dec 7, 2021
51f5de2
add kernel_entry
heheda12345 Dec 7, 2021
3ff6e40
remove half
heheda12345 Dec 7, 2021
5eb9ef4
allocate tensor in c
heheda12345 Dec 7, 2021
3b62fc8
torch::tensor for one output
heheda12345 Dec 7, 2021
912c126
tmp fix
xiayuqing0622 Dec 7, 2021
beafc0b
fix
nox-410 Dec 7, 2021
dbde8dd
list of tensor
heheda12345 Dec 7, 2021
8016924
Merge branch 'master' of github.com:heheda12345/nnfusion
Dec 7, 2021
7b1bdb0
fix bias dim
heheda12345 Dec 7, 2021
7648008
apply shared memory
nox-410 Dec 8, 2021
0023532
stderr
heheda12345 Dec 8, 2021
b6e052a
pybind int64
heheda12345 Dec 8, 2021
15bdba5
Fix recursion
nox-410 Dec 9, 2021
07a7f59
some parameter changes
nox-410 Dec 9, 2021
80726df
Merge branch 'control_flow_2' of https://github.com/nox-410/nnfusion
heheda12345 Dec 9, 2021
1c6f23a
bypass reshape and broadcast
nox-410 Dec 9, 2021
e3cfae0
fix duplicate node
nox-410 Dec 9, 2021
6176f8a
bugfix
nox-410 Dec 10, 2021
3c22fce
Bypass GatherV2 & merge code
nox-410 Dec 10, 2021
5dd292e
Merge branch 'control_flow_2' of https://github.com/nox-410/nnfusion
heheda12345 Dec 10, 2021
e9e9bd2
Adjust parameter
nox-410 Dec 10, 2021
7a58506
Merge remote-tracking branch 'zc/master' into control_flow_finetune
nox-410 Dec 11, 2021
b694f34
Merge branch 'control_flow_2' of https://github.com/nox-410/nnfusion
heheda12345 Dec 11, 2021
a23d8aa
fix a bug related with gatherV2
nox-410 Dec 11, 2021
38cfaaa
Merge branch 'control_flow_2' into control_flow_finetune
nox-410 Dec 11, 2021
bec08a9
fix reshape & broadcast bypass
nox-410 Dec 12, 2021
5071d8d
fix reshape & broadcast bypass
nox-410 Dec 12, 2021
8c57871
add threadfence
nox-410 Dec 13, 2021
bb4e4a0
Merge branch 'control_flow_2' into control_flow_finetune
nox-410 Dec 13, 2021
0598667
Support multiple outputs
nox-410 Dec 23, 2021
4888a31
Merge branch 'control_flow_2' into control_flow_finetune
nox-410 Dec 23, 2021
8d992b2
Add loop initialization
nox-410 Dec 23, 2021
bb54c8a
Fix extern result memory for Loop
nox-410 Dec 23, 2021
6b57878
Support broadcast Matmul
nox-410 Dec 25, 2021
287b0be
Adjust parameters
nox-410 Dec 25, 2021
6242182
Allow scalar float in torch codegen
nox-410 Dec 25, 2021
6c3a48d
Skip inplace analysis for subgraphs
nox-410 Dec 25, 2021
2016e0d
Fix Reshape error in Control flow
nox-410 Dec 27, 2021
2e43315
Use injected SumOp
nox-410 Dec 27, 2021
3233154
update Dot kernel cache
nox-410 Dec 28, 2021
74059ad
Fix controlflow inplace
nox-410 Dec 29, 2021
ebf0679
Set memory reuse to false
nox-410 Dec 29, 2021
28f3b07
Merge branch 'control_flow_finetune' of https://github.com/nox-410/nn…
heheda12345 Dec 29, 2021
11f9caf
grid.sync() & elementwise
heheda12345 Jan 1, 2022
5fd70b2
add scatternd op
heheda12345 Jan 1, 2022
cad0688
-fcheck_result
heheda12345 Jan 7, 2022
a87cfb5
add control edge to loop graph & fix kernel fusion
heheda12345 Jan 8, 2022
d68b2e7
concat with fewer resource
heheda12345 Jan 8, 2022
94e78d4
concat: no inplace
heheda12345 Jan 9, 2022
4429553
elementwise: general blockdim
heheda12345 Jan 9, 2022
2d8229b
hack: add more dependency
heheda12345 Jan 9, 2022
98ea31b
forward control edge
heheda12345 Jan 9, 2022
8cf153c
remove useless barrier
heheda12345 Jan 9, 2022
7616e7c
add return
heheda12345 Jun 16, 2022
2d435fa
fix bug in conv-bn
heheda12345 Jul 22, 2022
903c774
add roller
heheda12345 Jul 25, 2022
fd0dc0d
debug tensor
heheda12345 Jul 25, 2022
847be21
add roller as submodule
heheda12345 Jul 25, 2022
db67a1a
update gitignore
heheda12345 Jul 25, 2022
55706ff
change weight of inner graph to Constant op
heheda12345 Jul 25, 2022
0b71dba
gnode cout
heheda12345 Jul 25, 2022
626c66e
identity op
heheda12345 Jul 25, 2022
e2b7995
blockCudaEmitter: emit parameters from function sig
heheda12345 Jul 25, 2022
ba24cbb
draft version of separate kernel launch, conflict with postprocessing…
heheda12345 Jul 26, 2022
ad3802d
fix typo
heheda12345 Jul 26, 2022
b3d0f8c
two branch call finish
heheda12345 Jul 27, 2022
0e9e76a
fix concat op
heheda12345 Aug 11, 2022
7ed651b
update op frontend
heheda12345 Aug 11, 2022
9f72880
conv to CNHW initial support
heheda12345 Aug 11, 2022
e75617e
add concat and reshape op
heheda12345 Aug 11, 2022
6a1e0c5
add if op to conv_layout_pass and fix related bugs
heheda12345 Aug 15, 2022
a0bc620
fix bug in share memory allocation of if op
heheda12345 Aug 15, 2022
01ac748
fuse small kernels (not finish yet)
heheda12345 Aug 18, 2022
d1f988b
fuse small kernels
heheda12345 Aug 18, 2022
2f189d0
reorder the kernels
heheda12345 Aug 23, 2022
ac90748
impl d2h
heheda12345 Aug 25, 2022
ac6e50f
fix bug in elementwise kernel
heheda12345 Aug 29, 2022
7a7d75e
main_test 100+100, print ref
heheda12345 Aug 29, 2022
e6bd194
fuse then else
heheda12345 Aug 30, 2022
6675125
move subtract out of if
heheda12345 Aug 31, 2022
bb1210e
loop in c
heheda12345 Sep 5, 2022
b5c0a4e
fix small bugs for skipnet
heheda12345 Sep 7, 2022
f658b61
CPU-GPU hybrid: assign stage
heheda12345 Sep 14, 2022
da1aaeb
CPU-GPU hybrid: add d2h and h2d gnode
heheda12345 Sep 14, 2022
41609fc
CPU-GPU hybrid: duplicate memory pool
heheda12345 Sep 14, 2022
92445f4
CPU-GPU hybrid: call by tensor with _cpu
heheda12345 Sep 14, 2022
645e78f
CPU-GPU hybrid: forward stage info in element-wise fusion pass
heheda12345 Sep 15, 2022
ad057ed
remove debug code
heheda12345 Sep 15, 2022
78614ca
CPU-GPU hybrid: copy cpu emitter from cuda emitter
heheda12345 Sep 15, 2022
84dbfa2
CPU-GPU hybrid: for (int tid=0; tid <...)
heheda12345 Sep 15, 2022
56bfcbf
CPU-GPU hybrid: put result op on CPU
heheda12345 Sep 15, 2022
35adaba
CPU-GPU hybrid: bmm & conv codegen, can run
heheda12345 Sep 16, 2022
9d2b6f1
CPU-GPU hybrid: avoid run to_cpu_pass in inner graph
heheda12345 Sep 16, 2022
e914e1c
add cpu op
heheda12345 Sep 27, 2022
ccd9270
fix bugs in recursion
heheda12345 Sep 27, 2022
ef8d23b
inline recursion call
heheda12345 Sep 27, 2022
0d9e94c
recursive with stack
heheda12345 Sep 27, 2022
9d270ec
add be_state_buffer and state_base
heheda12345 Sep 28, 2022
fe943c7
add be_state_buffer and state_base to more place
heheda12345 Sep 28, 2022
9cac765
fast barrier codegen
heheda12345 Sep 28, 2022
294fa47
check control edge in operator << (gnode)
heheda12345 Oct 4, 2022
e358bc0
optimize elementwise perf
heheda12345 Oct 5, 2022
f9e1e80
add pipeline fail to compile commands
heheda12345 Oct 5, 2022
b7e558d
eliminate copy back of cond
heheda12345 Oct 6, 2022
88ee7bc
add bool to dtypes
heheda12345 Oct 7, 2022
4c0f3d2
add translate_v2 for identity op
heheda12345 Oct 9, 2022
68e3643
avoid inplace opt when gnode i/o contains result
heheda12345 Oct 9, 2022
b4b4e1a
fix bug in conv layout pass
heheda12345 Oct 9, 2022
dc2320b
cast_pytorch_tensor: use data_ptr instead of storage.data_ptr
heheda12345 Oct 9, 2022
99d2529
add fused_max_grid to loop
heheda12345 Oct 10, 2022
452d049
add more cpu op
heheda12345 Oct 11, 2022
c50e66c
add naive impls for breakdown exp
heheda12345 Oct 11, 2022
47f392e
skip scalar op: reshape
heheda12345 Oct 11, 2022
e798c76
is_outmost_graph for blockfusion
heheda12345 Oct 18, 2022
deccf1c
hacked parallel recursion: assume all calls can be executed in parallel
heheda12345 Oct 19, 2022
95ee0d4
tune recursion
heheda12345 Oct 20, 2022
3691741
add reduce-memcpy blockop
heheda12345 Nov 11, 2022
001c376
argmax kernel (not tested)
heheda12345 Nov 11, 2022
5e213d0
support while op, can compile but loop cannot stop
heheda12345 Nov 11, 2022
e343fe7
alloc cond tensor, fix bug in parameter mapping, can run bs=1
heheda12345 Nov 11, 2022
8668be8
hardcode num_local_thread_sync in reduce.hpp because emit_function_bo…
heheda12345 Nov 12, 2022
62fcf06
while in c
heheda12345 Nov 12, 2022
4b7373d
fast barrier for single block
heheda12345 Nov 12, 2022
18e60a3
fix bug in fast barrier of single block
heheda12345 Nov 12, 2022
3dfae17
fix bug in argmax and element_fused, pass while_op
heheda12345 Nov 14, 2022
823fc2a
enable & disable result d2d inplace
heheda12345 Nov 17, 2022
9215c13
support different schedule of if inside while op
heheda12345 Nov 22, 2022
865c613
extend elementwise to support simple broadcast
heheda12345 Nov 28, 2022
08f8504
extend scatternd to support index array
heheda12345 Nov 28, 2022
80cc929
reshape memcpy block kernel
heheda12345 Nov 28, 2022
652677f
softmax block kernel
heheda12345 Nov 28, 2022
117cb7f
batchmatmul with broadcast
heheda12345 Nov 28, 2022
e4acd37
small fix
heheda12345 Nov 28, 2022
27bdc7d
manually set max_block_dim for bcast and elementwise
heheda12345 Nov 28, 2022
428edb6
sync for rocm
heheda12345 Dec 2, 2022
f235896
merge rocm code
heheda12345 Mar 27, 2023
ecf47cc
add IfSingle operator
heheda12345 Mar 28, 2023
547136d
reorganize parameters
heheda12345 Mar 30, 2023
c8cce31
remove cudadevicereset
heheda12345 Apr 3, 2023
60a287a
fix depunit bug in loop
heheda12345 Apr 3, 2023
f5e6739
dump kerneldb requests
heheda12345 Apr 3, 2023
65c6f4f
search unroll width
heheda12345 Apr 3, 2023
f0c358d
fix blockfusion sync problem
heheda12345 Apr 10, 2023
f138341
wrap python part with ifdef
heheda12345 Apr 14, 2023
845c52c
add __syncthreads() to cf kernels
heheda12345 Apr 17, 2023
ac6bd1f
copy file from ControlFlow repo
heheda12345 Apr 18, 2023
77fbf5b
change path
heheda12345 Apr 18, 2023
8266a71
fix bug in scripts
heheda12345 Apr 18, 2023
f2d7df3
add more guides
heheda12345 Apr 19, 2023
246385a
add more guides
heheda12345 Apr 19, 2023
e4e02fe
remove cudnn
heheda12345 Apr 19, 2023
16713b3
install_grinder script
heheda12345 Apr 19, 2023
9e3c80f
remove cudnn in manual
heheda12345 Apr 19, 2023
ec00f1e
autotvm kernel
heheda12345 Apr 20, 2023
2ba5cf4
remove training code
heheda12345 Apr 20, 2023
e568ef6
change permission
heheda12345 Apr 20, 2023
a6514e7
update kernels in manual impls
heheda12345 Apr 20, 2023
afdae02
add rocm kerneldb script
heheda12345 Apr 21, 2023
06a0ff2
copy roller rocm code
heheda12345 Apr 21, 2023
186fe5b
first try of rocm kerneldb
heheda12345 Apr 22, 2023
48d0651
rocm reproduced
heheda12345 Apr 22, 2023
767c6f8
remove grinder from filename
heheda12345 Apr 22, 2023
2632ad8
kerneldb scripts
heheda12345 Apr 23, 2023
dd1796a
finish rocm?
heheda12345 Apr 23, 2023
ec0594f
remove name 'grinder' from scripts
heheda12345 Apr 23, 2023
063996b
update gitignore
heheda12345 Apr 23, 2023
b1a889b
small fix
heheda12345 Apr 23, 2023
071ded7
rename project and remove some script
heheda12345 Apr 24, 2023
7d0c87a
update links
heheda12345 Apr 24, 2023
rename project and remove some script
heheda12345 committed Apr 24, 2023

This commit was created on GitHub.com and signed with GitHub’s verified signature.
commit 071ded7b35cabcce68b160fcbae0adebcabcb015
2 changes: 1 addition & 1 deletion artifacts/Figure19/README.md
@@ -8,7 +8,7 @@ logout
ssh root@impreza0 -p 31703
cd Figure19 && ./run_jax.sh # about 10 min
logout
# in grinder-ae docker
# in cocktailer-ae docker
ssh root@impreza0 -p 31705
cd Figure19 && ./run_in_sys_docker.sh # about 1 hour
```
2 changes: 1 addition & 1 deletion artifacts/INSTALL.md
@@ -59,7 +59,7 @@ cd .. # to $YOUR_DIR_FOR_NNFUSION/nnfusion
mkdir build && cd build && cmake .. && make -j
```

## Pytorch & Grinder
## Pytorch & Cocktailer
```bash
conda create python=3.7 --name controlflow -y
conda activate controlflow
34 changes: 13 additions & 21 deletions artifacts/README.md
@@ -1,14 +1,14 @@
# OSDI'23 Grinder Artifacts Evaluation
# OSDI'23 Cocktailer Artifacts Evaluation

## 0. Overview
This code branch is used for OSDI'23 Artifact Evaluation of paper #628, titled "Grinder: Analysis and Optimization for Dynamic Control Flow in Deep Learning".
This code branch is used for OSDI'23 Artifact Evaluation of paper #628, titled "Cocktailer: Analysis and Optimization for Dynamic Control Flow in Deep Learning".

### Evaluation Setup
* Artifacts Available:
* All Grinder-related code is available under the NNFusion open-source project located at: [https://github.com/microsoft/nnfusion/tree/TODO](https://github.com/microsoft/nnfusion/tree/TODO)
* All Cocktailer-related code is available under the NNFusion open-source project located at: [https://github.com/microsoft/nnfusion/tree/TODO](https://github.com/microsoft/nnfusion/tree/TODO)
* Artifacts Functional:
* *Documentation*: the following documents include detailed guidelines on how to build, install, and test Grinder, and on the experiments comparing it with other baselines.
* *Completeness*: the [C++ part](..) of Grinder has been merged into NNFusion in this branch, and the [Python part](ast_analyzer) is available in this artifact.
* *Documentation*: the following documents include detailed guidelines on how to build, install, and test Cocktailer, and on the experiments comparing it with other baselines.
* *Completeness*: the [C++ part](..) of Cocktailer has been merged into NNFusion in this branch, and the [Python part](ast_analyzer) is available in this artifact.
* *Exercisability*: under the *artifacts* folder, we prepare all the scripts and data needed to reproduce the experiments, in individual folders named after the corresponding figures in the paper.
* Results Reproduced:
* To reproduce the main results presented in our paper, we provide Docker images containing all the environments and baseline software, as well as machines with the same configurations as those used in the paper's evaluation. We also provide a detailed guideline to help reproduce the results step by step.
@@ -19,28 +19,20 @@ This code branch is used for OSDI'23 Artifact Evaluation of paper #628, titled "
Please follow the instructions in "Comments for AEC" on HotCRP and skip this section if you want to use the provided environment. The following steps need docker permission which is not provided due to security concerns.

## NVIDIA GPU
Please follow the instructions in [INSTALL.md](INSTALL.md) or use the following docker-based script to build and install Grinder.
Please follow the instructions in [INSTALL.md](INSTALL.md) or use the following docker-based script to build and install Cocktailer.
```bash
cd $YOUR_DIR_FOR_NNFUSION
git clone https://github.com/microsoft/nnfusion.git --branch TODO --single-branch
cd nnfusion/artifacts
docker build -t grinder -f env/Dockerfile.nv .
docker build -t cocktailer -f env/Dockerfile.nv .
chmod 777 $YOUR_DIR_FOR_NNFUSION/nnfusion
docker run -it --gpus all --name grinder-ae -v $YOUR_DIR_FOR_NNFUSION/nnfusion:/root/nnfusion --shm-size="32g" -w /root/nnfusion/artifacts grinder:latest /bin/bash
docker run -it --gpus all --name cocktailer-ae -v $YOUR_DIR_FOR_NNFUSION/nnfusion:/root/nnfusion --shm-size="32g" -w /root/nnfusion/artifacts cocktailer:latest /bin/bash
# run inside docker
bash ./env/install_in_docker.sh
```

adapted (TODO: remove)
```bash
docker build --network=host -t grinder -f env/Dockerfile.nv .
docker run -it --gpus all --name heheda-grinder-ae -v /home/heheda/control_flow/nnfusion-docker:/root/nnfusion -v /home/heheda/control_flow/kernel_db.docker:/root/.cache/nnfusion -w /root/nnfusion/artifacts --privileged=true --shm-size="32g" --network=host grinder:latest /bin/bash
srun -p AE -w nico1 --pty --exclusive docker exec -it heheda-grinder-ae bash ./run_nv_gpu.sh
permission: chmod 777 the two folders, config not to /dev/shm
```

## AMD GPU
Please prepare four dockers for running JAX, TensorFlow, TVM, PyTorch \& Grinder respectively.
Please prepare four dockers for running JAX, TensorFlow, TVM, PyTorch \& Cocktailer respectively.
* download code
```bash
cd $YOUR_DIR_FOR_NNFUSION
@@ -69,11 +61,11 @@ Please prepare four dockers for running JAX, TensorFlow, TVM, PyTorch \& Grinder
docker build -t tvm_rocm_cuda:latest -f env/Dockerfile.tvm.rocm --network=host .
docker run -it --device=/dev/kfd --device=/dev/dri --name tvm-ae -v $YOUR_DIR_FOR_NNFUSION/kernel_db:/root/.cache/nnfusion -v $YOUR_DIR_FOR_NNFUSION/nnfusion:/root/nnfusion -w /root/nnfusion/artifacts -e ARTIFACT_ROOT=/root/nnfusion/artifacts tvm_rocm_cuda /bin/bash
```
* Build and run grinder docker
* Build and run cocktailer docker
```bash
cd $YOUR_DIR_FOR_NNFUSION/nnfusion/artifacts
docker build -t grinder:latest -f env/Dockerfile.rocm --network=host .
docker run -it --device=/dev/kfd --device=/dev/dri --name grinder-ae -v $YOUR_DIR_FOR_NNFUSION/kernel_db:/root/.cache/nnfusion -v $YOUR_DIR_FOR_NNFUSION/nnfusion:/root/nnfusion -w /root/nnfusion/artifacts -e ARTIFACT_ROOT=/root/nnfusion/artifacts grinder /bin/bash
docker build -t cocktailer:latest -f env/Dockerfile.rocm --network=host .
docker run -it --device=/dev/kfd --device=/dev/dri --name cocktailer-ae -v $YOUR_DIR_FOR_NNFUSION/kernel_db:/root/.cache/nnfusion -v $YOUR_DIR_FOR_NNFUSION/nnfusion:/root/nnfusion -w /root/nnfusion/artifacts -e ARTIFACT_ROOT=/root/nnfusion/artifacts cocktailer /bin/bash
# run inside docker
bash ./env/install_in_rocm_docker.sh
```
@@ -99,7 +91,7 @@ Please prepare four dockers for running JAX, TensorFlow, TVM, PyTorch \& Grinder
│ │ └── tatoeba-eng-fra
```

* Generate all kernels for Grinder. More details can be found in [README_KERNEL_DB.md](kernel_db/README_KERNEL_DB.md).
* Generate all kernels for Cocktailer. More details can be found in [README_KERNEL_DB.md](kernel_db/README_KERNEL_DB.md).
**NOTE**: this process will take about 20 minutes for each architecture if using the tuning results included in the artifact, or longer if you want to re-tune the kernels.
* NVIDIA GPU
```bash
22 changes: 11 additions & 11 deletions artifacts/get_started_tutorial/README_GET_STARTED.md
@@ -1,7 +1,7 @@
# Get Started Tutorial: Compile a NASRNN model with Grinder
We assume you have already built and installed Grinder following the *Environment Preparation* section in [README.md](../README.md).
# Get Started Tutorial: Compile a NASRNN model with Cocktailer
We assume you have already built and installed Cocktailer following the *Environment Preparation* section in [README.md](../README.md).

The goal of this tutorial is to demonstrate how to compile and optimize a typical DNN model with control flow, and to showcase the performance improvement of the Grinder compiler.
The goal of this tutorial is to demonstrate how to compile and optimize a typical DNN model with control flow, and to showcase the performance improvement of the Cocktailer compiler.

## Run PyTorch, TensorFlow, and JAX baselines

@@ -38,11 +38,11 @@ Summary: [min, max, mean] = [297.036409, 335.924387, 323.820636] ms
Summary: [min, max, mean] = [43.358564, 43.553829, 43.469448] ms
```

## Run GrinderBase and Grinder
## Run CocktailerBase and Cocktailer

## Prepare kernel database

Grinder needs the source code of dataflow operators to generate the optimized code. The source code of operators in NASRNN includes BatchMatMul generated by Roller, and built-in element-wise operators in [NNFusion](https://github.com/microsoft/nnfusion/tree/main/src/nnfusion/core/kernels/cuda_gpu/kernels). (TODO: check branch) Below is the script to generate and save the BatchMatmul kernel.
Cocktailer needs the source code of dataflow operators to generate the optimized code. The source code of operators in NASRNN includes BatchMatMul generated by Roller, and built-in element-wise operators in [NNFusion](https://github.com/microsoft/nnfusion/tree/main/src/nnfusion/core/kernels/cuda_gpu/kernels). (TODO: check branch) Below is the script to generate and save the BatchMatmul kernel.

```bash
export ARTIFACT_ROOT=TODO
@@ -57,7 +57,7 @@ cd $ARTIFACT_ROOT/get_started_tutorial

After that, you can get a kernel database file in `~/.cache/nnfusion/kernel_cache.db`. NNFusion will automatically detect this path and import these kernels.
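
As a quick sanity check after this step, the sketch below can confirm the database was produced. It assumes the cache file is a SQLite database (which is how NNFusion stores its kernel cache); no particular table layout is assumed, only the path mentioned above.

```python
import os
import sqlite3

# Default NNFusion kernel cache location mentioned above.
db_path = os.path.expanduser("~/.cache/nnfusion/kernel_cache.db")
assert os.path.exists(db_path), f"kernel database not found at {db_path}"

# Assumption: the cache is a SQLite file; list its tables and row counts
# without relying on any specific schema.
with sqlite3.connect(db_path) as conn:
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        print(f"{table}: {count} rows")
```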

## Run GrinderBase
## Run CocktailerBase
```bash
export ARTIFACT_ROOT=TODO
cd $ARTIFACT_ROOT/get_started_tutorial
@@ -85,9 +85,9 @@ tensor equals!
100 iters, min = 65.4466 ms, max = 66.7112 ms, avg = 65.8735 ms
```

The `forward` function in the output is the Python code executed during time measurement; it accelerates the basic blocks of the model (located at `/dev/shm/$USER/controlflow/base_nasrnn_bs64_0/forward` and `/dev/shm/$USER/controlflow/base_nasrnn_bs64_2/forward`) and relies on PyTorch for executing the control flow. The `tensor equals!` message indicates that the output of GrinderBase matches that of PyTorch.
The `forward` function in the output is the Python code executed during time measurement; it accelerates the basic blocks of the model (located at `/dev/shm/$USER/controlflow/base_nasrnn_bs64_0/forward` and `/dev/shm/$USER/controlflow/base_nasrnn_bs64_2/forward`) and relies on PyTorch for executing the control flow. The `tensor equals!` message indicates that the output of CocktailerBase matches that of PyTorch.
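
For intuition only, the following schematic Python sketch illustrates this split; `compiled_cell` is a hypothetical stand-in for an accelerated basic block, not a function from the artifact, and the loop stands for the control flow that stays in PyTorch.

```python
import torch

def forward(xs, state, compiled_cell):
    # The loop is ordinary PyTorch control flow (as in the CocktailerBase setup);
    # only the loop body (the basic block) is an accelerated kernel.
    for x in xs:
        state = compiled_cell(x, state)
    return state

# Usage sketch with a plain PyTorch function standing in for the compiled kernel.
xs = [torch.randn(64, 256) for _ in range(10)]
state = torch.zeros(64, 256)
out = forward(xs, state, lambda x, s: torch.tanh(x + s))
```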

## Run Grinder
## Run Cocktailer
```bash
export ARTIFACT_ROOT=TODO
cd $ARTIFACT_ROOT/get_started_tutorial
@@ -110,11 +110,11 @@ tensor equals!
100 iters, min = 25.3108 ms, max = 25.7773 ms, avg = 25.3788 ms
```

The generated code of Grinder is located at `/dev/shm/$USER/controlflow/nasrnn_bs64_0/forward`. The `Best flag` line indicates the scheduling result of Grinder. The `tensor equals!` message indicates that the output of Grinder matches that of PyTorch.
The generated code of Cocktailer is located at `/dev/shm/$USER/controlflow/nasrnn_bs64_0/forward`. The `Best flag` line indicates the scheduling result of Cocktailer. The `tensor equals!` message indicates that the output of Cocktailer matches that of PyTorch.

## Summary
The following table summarizes the above experiments. Grinder achieves a $1.71\times$ speedup over the fastest baseline (JAX).
The following table summarizes the above experiments. Cocktailer achieves a $1.71\times$ speedup over the fastest baseline (JAX).

| | TorchScript | TensorFlow | JAX | GrinderBase | Grinder |
| | TorchScript | TensorFlow | JAX | CocktailerBase | Cocktailer |
|:-----------:|:------:|:--:|:----:|:--:|:---:|
| **Time (ms)** | 108.87 | 323.82 | 43.47 | 65.87 | 25.38 |
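
As a quick arithmetic check, the speedups implied by the table can be recomputed from the mean times above:

```python
# Mean latencies in ms, copied from the table above.
times = {
    "TorchScript": 108.87,
    "TensorFlow": 323.82,
    "JAX": 43.47,
    "CocktailerBase": 65.87,
    "Cocktailer": 25.38,
}

for name, t in times.items():
    if name != "Cocktailer":
        print(f"speedup over {name}: {t / times['Cocktailer']:.2f}x")
# JAX is the fastest baseline: 43.47 / 25.38 ≈ 1.71x, matching the text.
```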
2 changes: 1 addition & 1 deletion artifacts/kernel_db/README_KERNEL_DB.md
@@ -1,4 +1,4 @@
# Kernel DB for GrinderBase and Grinder
# Kernel DB for CocktailerBase and Cocktailer

The `reproduce_kernel_db.sh` scripts will leverage AutoTVM, Ansor, Roller, and manual implementations to generate kernels. The resulting kernels will be injected into a kernel database, located at *~/.cache/nnfusion/kernel_cache.db*, which is finally loaded by NNFusion.

2 changes: 1 addition & 1 deletion artifacts/plot/common.py
@@ -1,7 +1,7 @@
import numpy as np
import re

sys_name = "Grinder"
sys_name = "Cocktailer"

line_markers = [
'x',
2 changes: 1 addition & 1 deletion artifacts/plot/figure14.py
@@ -7,7 +7,7 @@

figure_id = 14

sys = ['TorchScript', 'TensorFlow', 'JAX+JIT', 'GrinderBase', sys_name]
sys = ['TorchScript', 'TensorFlow', 'JAX+JIT', 'CocktailerBase', sys_name]

hatch_def = [
'..',
2 changes: 1 addition & 1 deletion artifacts/plot/figure19.py
@@ -7,7 +7,7 @@

figure_id = 19

sys = ['TorchScript', 'TensorFlow', 'JAX+JIT', 'GrinderBase', sys_name]
sys = ['TorchScript', 'TensorFlow', 'JAX+JIT', 'CocktailerBase', sys_name]

hatch_def = [
'..',
4 changes: 2 additions & 2 deletions artifacts/plot/figure20.py
@@ -22,7 +22,7 @@
'',
]

sys_general = ['GrinderBase', 'schedule', 'optimize & schedule']
sys_general = ['CocktailerBase', 'schedule', 'optimize & schedule']

def get_log_from(filename: str):
result_dir = f'../reproduce_results/Figure{figure_id}'
@@ -50,7 +50,7 @@ def get_log_from(filename: str):
blockdrop = get_log_from('blockdrop.b1.log')
skipnet = get_log_from('skipnet.b1.log')

sys_recursive = ['GrinderBase', 'serial schedule', 'stack in global memory', 'stack in shared memory', 'parallel schedule']
sys_recursive = ['CocktailerBase', 'serial schedule', 'stack in global memory', 'stack in shared memory', 'parallel schedule']
rae = [
parse_time(f'../reproduce_results/Figure{figure_id}/base/rae.b1.log'),
parse_time(f'../reproduce_results/Figure{figure_id}/schedule/rae.opt1.b1.log'),