Support for pipeline decoder model #729
Conversation
qq: is this for pipeline parallelism?
No, the work here is not intended for pipeline parallelism. However, it could potentially be useful in pipeline parallelism. Sorry for the late response.
Could you please add a unit test to cover the pipeline?
Thank you all for the review. :)
This pull request adds support for scenarios where a decoder-only model is split into multiple smaller ONNX models. onnxruntime-genai will execute these models in a pipelined fashion. The pipeline is expected to be defined in the genai config.
Update to the GenAI config
The outputs of the previous model in the pipeline can be fed into the inputs of the next model in the pipeline. Configurations exposed to the user (see the example pipeline below):

- `filename`: file name of the ONNX model in the pipeline. Required field.
- `session_options`: each `PipelineModel` can define its own `SessionOptions`; that pipeline model's session will use these session options for execution. If not provided, the default session options are used.
- `run_on_first_token_generation`: whether that pipeline model should be run on the first token generation. Default: `true`.
- `run_on_nth_token_generation`: whether that pipeline model should be run on the nth token generation (`n != 1`). Default: `true`.
- `output_names_forwarder`: in case the output names of the previous model do not align with the input names of the following model, this mapping can be defined in the config.
- `inputs`: input names of the pipeline model. Required field.
- `outputs`: output names of the pipeline model. Required field.

Example pipeline
Here we have split the decoder model into 3 parts: an embedding model, a transformer model, and a transformer head model.
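A sketch of what such a pipeline might look like in `genai_config.json`, using the fields listed above. The surrounding nesting, the model/file names, and the session-option values here are illustrative assumptions, not the exact schema:

```json
{
  "model": {
    "decoder": {
      "pipeline": [
        {
          "embedding_model": {
            "filename": "embedding.onnx",
            "inputs": [ "input_ids" ],
            "outputs": [ "inputs_embeds" ],
            "run_on_first_token_generation": true,
            "run_on_nth_token_generation": true
          }
        },
        {
          "transformer_model": {
            "filename": "transformer.onnx",
            "session_options": { "intra_op_num_threads": 4 },
            "inputs": [ "inputs_embeds", "attention_mask" ],
            "outputs": [ "hidden_states" ],
            "output_names_forwarder": { "hidden_states": "transformer_outputs" }
          }
        },
        {
          "transformer_head_model": {
            "filename": "transformer_head.onnx",
            "inputs": [ "transformer_outputs" ],
            "outputs": [ "logits" ]
          }
        }
      ]
    }
  }
}
```

Note how `output_names_forwarder` on the transformer model renames its `hidden_states` output to `transformer_outputs`, matching the input name the head model expects.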
In the above example, the outputs of the embedding pipeline model are fed into the inputs of the transformer pipeline model. Similarly, the outputs of the transformer pipeline model are fed into the inputs of the transformer head pipeline model.
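Running such a pipelined model is expected to look the same as running a single-file decoder. A minimal generation-loop sketch, assuming the onnxruntime-genai Python API of this period (the model path and prompt are placeholders):

```python
import onnxruntime_genai as og

# The model directory contains genai_config.json plus the pipeline's ONNX files.
model = og.Model("path/to/pipelined-model")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=128)
params.input_ids = tokenizer.encode("What is pipelining?")

# For each decoding step, onnxruntime-genai runs the embedding, transformer,
# and transformer head models in sequence, per the pipeline config.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```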
Assumptions and limitations
Where can this feature be used
Co-authors: @edgchen1 @ajindal1