xwhzz/model_fuse

Assumptions

An operator is eligible for fusion only if it meets all of the following conditions:

  1. It has exactly one input, not counting Constant and initializer tensors.
  2. It has exactly one output.
  3. The first dimension of both its input shape and its output shape is annotated with "batch_size".

Condition 3 can only be checked when shapes carry symbolic dimension names, so first run symbolic shape inference on the model:

python ./tools/symbolic_shape_infer.py --input [input model path] --output [output model path]
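
Once shapes carry symbolic dimensions, the three conditions can be checked mechanically. The following is a minimal sketch, not the repository's implementation: `Node`, `is_fusable`, the `shapes` mapping, and the `constants` set are hypothetical stand-ins for the ONNX graph structures, chosen so the example runs without any dependencies.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Union

Dim = Union[int, str]  # a symbolic dimension is a string such as "batch_size"


@dataclass
class Node:
    """Hypothetical stand-in for an ONNX node: only tensor names are kept."""
    op_type: str
    inputs: List[str]
    outputs: List[str]


def is_fusable(node: Node, shapes: Dict[str, List[Dim]], constants: Set[str]) -> bool:
    # Condition 1: exactly one input once Constant/initializer tensors are ignored.
    dynamic_inputs = [name for name in node.inputs if name not in constants]
    if len(dynamic_inputs) != 1:
        return False
    # Condition 2: exactly one output.
    if len(node.outputs) != 1:
        return False
    # Condition 3: input and output shapes both start with "batch_size".
    for name in (dynamic_inputs[0], node.outputs[0]):
        shape = shapes.get(name)
        if not shape or shape[0] != "batch_size":
            return False
    return True


# Assumed example data: "W" plays the role of an initializer (weight) tensor.
shapes = {
    "x": ["batch_size", 16],
    "y": ["batch_size", 32],
    "a": ["batch_size", 16],
    "b": ["batch_size", 16],
    "s": ["batch_size", 16],
}
constants = {"W"}

matmul = Node("MatMul", ["x", "W"], ["y"])  # one dynamic input plus a weight
add = Node("Add", ["a", "b"], ["s"])        # two dynamic inputs

print(is_fusable(matmul, shapes, constants))  # True
print(is_fusable(add, shapes, constants))     # False
```

The `Add` node fails condition 1 because neither of its inputs is a constant, which is the case the first TODO item below aims to cover.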

Usage

  1. Clone the onnxruntime project from https://github.com/microsoft/onnxruntime, apply this repository's patch, and build it from source. The commands below cover the clone and patch steps:

    git clone https://github.com/microsoft/onnxruntime.git
    cd onnxruntime
    git apply ./runtime/ort/changes.patches
  2. Install the Python package:

    pip install -e .

Examples

Two custom CPU ops, Merge and Route, are currently implemented for onnxruntime.
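
This README does not define what Merge and Route compute. The sketch below assumes one plausible reading, purely for illustration: Route splits a batch into per-model groups according to a per-sample model index, and Merge restores the original sample order after each group has been processed. The functions `route` and `merge`, the `model_ids` argument, and the scaling factors are all hypothetical and do not reflect the actual op signatures.

```python
from typing import Dict, List, Tuple

Row = List[float]
Group = Tuple[List[Row], List[int]]  # (rows, original positions in the batch)


def route(batch: List[Row], model_ids: List[int]) -> Dict[int, Group]:
    """Assumed Route behavior: group rows by the model that should process them."""
    groups: Dict[int, Group] = {}
    for position, (row, model_id) in enumerate(zip(batch, model_ids)):
        rows, positions = groups.setdefault(model_id, ([], []))
        rows.append(row)
        positions.append(position)
    return groups


def merge(groups: Dict[int, Group]) -> List[Row]:
    """Assumed Merge behavior: put processed rows back in their original order."""
    total = sum(len(positions) for _, positions in groups.values())
    merged: List[Row] = [[] for _ in range(total)]
    for rows, positions in groups.values():
        for row, position in zip(rows, positions):
            merged[position] = row
    return merged


batch = [[1.0], [2.0], [3.0]]
model_ids = [0, 1, 0]            # samples 0 and 2 go to model 0, sample 1 to model 1
scales = {0: 10.0, 1: 100.0}     # stand-in for each model's own operator

routed = route(batch, model_ids)
processed = {
    model_id: ([[value * scales[model_id] for value in row] for row in rows], positions)
    for model_id, (rows, positions) in routed.items()
}
print(merge(processed))  # [[10.0], [200.0], [30.0]]
```

Under this assumption, operators shared by all models run once on the full batch, and only the model-specific operators sit between a Route and a Merge.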

Microbenchmark

The ./example/micro directory contains the microbenchmark scripts. Run them as follows:

cd example/micro
python generate.py
./convert.sh

python fuse.py --num 2
python fuse.py
python test_runtime.py

Transformer Example

The ./example/transformer directory contains the transformer example. The test models are two decoder layers of the LLaMA model and the LoRA variant of those layers. Run the example as follows:

cd example/transformer
python generate.py
./convert.sh

python fuse.py
python test_runtime.py

TODO

  • Generalize the input assumptions to handle operators with multiple inputs.
  • Refactor the single Route op into multiple specialized Route ops.
  • Fix height = 256 and width = 256 to observe the effect.
