
Support for pipeline decoder model #729

Merged: 19 commits, Sep 19, 2024

Conversation

@baijumeswani (Contributor) commented on Jul 29, 2024

This pull request adds support for scenarios where the decoder-only model is split into multiple smaller ONNX models.

onnxruntime-genai will execute these models in a pipeline fashion. The pipeline is expected to be defined in the genai config.

Update to the GenAI config

The outputs of the previous model in the pipeline can be fed into the inputs of the next model in the pipeline. Configurations exposed to the user (a combined sketch of these fields follows the list):

  • filename: file name of the ONNX model in the pipeline. Required field.
  • session_options: each pipeline model can define its own SessionOptions; that pipeline model's session is created and run with these options. If not provided, the default session options are used.
  • run_on_first_token_generation: whether that pipeline model should run on the first token generation. Default: true.
  • run_on_nth_token_generation: whether that pipeline model should run on the nth token generation (n != 1). Default: true.
  • output_names_forwarder: a mapping that can be defined in the config when output names from the previous model do not align with input names of the following model.
  • inputs: input names of the pipeline model. Required field.
  • outputs: output names of the pipeline model. Required field.
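
A minimal sketch of a single pipeline model entry that combines these fields. The model name, file name, option values, and the exact shape of the output_names_forwarder and session_options entries are illustrative assumptions, not taken verbatim from this pull request:

"example_model": {
    "filename": "model-part.onnx",
    "session_options": {
        "intra_op_num_threads": 4
    },
    "run_on_first_token_generation": true,
    "run_on_nth_token_generation": false,
    "output_names_forwarder": {
        "model_output": "next_model_input"
    },
    "inputs": [
        "previous_model_output"
    ],
    "outputs": [
        "model_output"
    ]
}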

Example pipeline

Here we have split the decoder model into 3 parts:

  1. Embeddings
  2. Transformer
  3. Model head
"pipeline": [
    {
        "embedding": {
            "filename": "phi-3-embedding.onnx",
            "inputs": [
                "input_ids"
            ],
            "outputs": [
                "inputs_embeds"
            ]
        },
        "transformer_model": {
            "filename": "phi-3-transformer.onnx",
            "inputs": [
                "inputs_embeds",
                "past_keys_0",
                "past_values_0",
                "..."
            ],
            "outputs": [
                "transformer_output",
                "present_keys_0",
                "present_values_0",
                "..."
            ]
        },
        "transformer_head": {
            "filename": "phi-3-transformer-head.onnx",
            "inputs": [
                "transformer_output"
            ],
            "outputs": [
                "logits"
            ]
        }
    }
]

In the above example, the outputs of the embedding pipeline model are fed into the inputs of the transformer model. Similarly, the outputs of the transformer model are fed into the inputs of the transformer head model.

Assumptions and limitations

  • The final model's inputs and outputs are expected to be the same as those currently supported for decoder-only models. No other inputs/outputs are managed by the pipeline. Inputs/outputs managed by the pipeline (a config sketch follows this list):
    • input_ids (input)
    • kv cache (input)
    • attention_mask (input)
    • logits (output)
    • kv cache (output)
  • The managed inputs and outputs (listed above) must be allocated on the device where the search is expected to take place, i.e. the pipeline does not move/copy data from one device to another after a session runs.
  • The intermediate (unmanaged) inputs and outputs must reside on CPU, and the ORT session is responsible for making copies to and from the device (host) that its session options are registered for.
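
For orientation, a rough sketch of how the managed, model-level inputs/outputs might be declared alongside the pipeline in the genai config. The key names and nesting below are assumptions for illustration, not taken from this pull request:

"model": {
    "decoder": {
        "inputs": {
            "input_ids": "input_ids",
            "attention_mask": "attention_mask",
            "past_key_names": "past_key_values.%d.key",
            "past_value_names": "past_key_values.%d.value"
        },
        "outputs": {
            "logits": "logits",
            "present_key_names": "present.%d.key",
            "present_value_names": "present.%d.value"
        },
        "pipeline": [
            "..."
        ]
    }
}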

Where can this feature be used

  • In scenarios where the user would like to run different parts of the model using different session options (see the sketch after this list).
  • In scenarios where it is hard to combine multiple smaller models into one big model (due to limitations of the model or device being used to run the model).
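
As a sketch of the first point, two models in the same pipeline could use different session options, for example one running on CPU and one using a GPU execution provider. The provider names and session_options keys below are illustrative assumptions, not taken from this pull request:

"pipeline": [
    {
        "embedding": {
            "filename": "phi-3-embedding.onnx",
            "session_options": {
                "provider_options": []
            },
            "inputs": ["input_ids"],
            "outputs": ["inputs_embeds"]
        },
        "transformer_model": {
            "filename": "phi-3-transformer.onnx",
            "session_options": {
                "provider_options": [
                    { "cuda": {} }
                ]
            },
            "inputs": ["inputs_embeds", "..."],
            "outputs": ["transformer_output", "..."]
        }
    }
]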

Co-authors: @edgchen1 @ajindal1

@wangyems (Contributor)

qq: is this for pipeline parallelism?

@baijumeswani (Contributor, Author)

qq: is this for pipeline parallelism?

No, the work here is not intended for pipeline parallelism. However, it could potentially be useful in pipeline parallelism. Sorry for the late response.

@yufenglee (Member)

Could you please add a unit test to cover the pipeline?

@yufenglee (Member) left a comment


:shipit:

@baijumeswani merged commit f81b6eb into main on Sep 19, 2024
13 checks passed
@baijumeswani deleted the baijumeswani/phi3-pipeline branch on September 19, 2024 at 00:04
@baijumeswani (Contributor, Author)

Thank you all for the review. :)
