Assumptions

For an operator to be eligible for fusion, it must meet the following conditions:

  1. It has exactly one input, excluding Constant and initializer tensors.
  2. It has exactly one output.
  3. The first dimension of both the input and output shapes is annotated as "batch_size".
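
These three conditions can be sketched as a small predicate. This is an illustrative check, not the project's actual API; the function and argument names (`is_fusable`, `dim0_of`, etc.) are hypothetical:

```python
# Hypothetical sketch of the fusion-eligibility check described above.
def is_fusable(node_inputs, node_outputs, initializers, dim0_of):
    """Return True if a node meets the three fusion conditions.

    node_inputs / node_outputs: lists of tensor names
    initializers: set of names that are Constant/initializer tensors
    dim0_of: dict mapping tensor name -> first shape dimension (int or str)
    """
    # 1. Exactly one input once Constant/initializer tensors are excluded.
    data_inputs = [i for i in node_inputs if i not in initializers]
    if len(data_inputs) != 1:
        return False
    # 2. Exactly one output.
    if len(node_outputs) != 1:
        return False
    # 3. First dim of both input and output is the symbolic "batch_size".
    return (dim0_of.get(data_inputs[0]) == "batch_size"
            and dim0_of.get(node_outputs[0]) == "batch_size")
```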

Therefore, we must first perform more precise shape inference, i.e., symbolic shape inference. Run the following command:

python ./tools/symbolic_shape_infer.py --input [input model path] --output [output model path]

Usage

  1. Clone the onnxruntime project from https://github.com/microsoft/onnxruntime, apply our patch, and then build it from source:

    git clone https://github.com/microsoft/onnxruntime.git
    cd onnxruntime
    git apply ./runtime/ort/changes.patches
  2. Install the Python package:

    pip install -e .

Examples

We currently provide custom CPU ops (Merge and Route) for onnxruntime.

Microbenchmark

The ./example/micro directory contains the microbenchmark scripts. Follow these instructions to test the functionality:

cd example/micro
python generate.py
./convert.sh

python fuse.py --num 2
python fuse.py
python test_runtime.py

Transformer Example

In the ./example/transformer directory, follow these instructions to test the functionality. We use two decoder layers of the LLaMA model and its LoRA variant as our test models:

cd example/transformer
python generate.py
./convert.sh

python fuse.py
python test_runtime.py

TODO

  • Generalize the input assumptions to handle multiple inputs.
  • Refactor the single Route op into multiple specialized Route ops.
  • Fix height = 256 and width = 256 to observe the effect.