Assumptions

For an operator to be eligible for fusion, it must meet the following conditions:

  1. It has exactly one input, excluding Constant and initializer tensors.
  2. It has exactly one output.
  3. The first dimension of both the input and output shapes is annotated as "batch_size".
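
These three conditions can be sketched as a small predicate. This is an illustrative check, not the project's actual API; the function and argument names (`is_fusable`, `dim0_of`, etc.) are hypothetical:

```python
# Hypothetical sketch of the fusion-eligibility check described above.
def is_fusable(node_inputs, node_outputs, initializers, dim0_of):
    """Return True if a node meets the three fusion conditions.

    node_inputs / node_outputs: lists of tensor names
    initializers: set of names that are Constant/initializer tensors
    dim0_of: dict mapping tensor name -> first shape dimension (int or str)
    """
    # 1. Exactly one input once Constant/initializer tensors are excluded.
    data_inputs = [i for i in node_inputs if i not in initializers]
    if len(data_inputs) != 1:
        return False
    # 2. Exactly one output.
    if len(node_outputs) != 1:
        return False
    # 3. First dim of both input and output is the symbolic "batch_size".
    return (dim0_of.get(data_inputs[0]) == "batch_size"
            and dim0_of.get(node_outputs[0]) == "batch_size")
```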

Therefore, we must first perform more precise shape inference, i.e., symbolic shape inference. Run the following command:

python ./tools/symbolic_shape_infer.py --input [input model path] --output [output model path]

Usage

  1. Clone the onnxruntime project from https://github.com/microsoft/onnxruntime, apply our patch, and then build it from source:

    git clone https://github.com/microsoft/onnxruntime.git
    cd onnxruntime
    git apply ./runtime/ort/changes.patches
  2. Install the Python package:

    pip install -e .

Examples

We currently provide custom CPU ops (Merge and Route) for onnxruntime.

Microbenchmark

The ./example/micro directory contains the microbenchmark scripts. Follow these instructions to test the functionality:

cd example/micro
python generate.py
./convert.sh

python fuse.py --num 2
python fuse.py
python test_runtime.py

Transformer Example

In the ./example/transformer directory, follow these instructions to test the functionality. We use two decoder layers of the LLaMA model and its LoRA variant as our test models:

cd example/transformer
python generate.py
./convert.sh

python fuse.py
python test_runtime.py

TODO

  • Generalize the input assumptions to handle multiple inputs.
  • Refactor the single Route op into multiple specialized Route ops.
  • Fix height = 256 and width = 256 to observe the effect.