-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Static llm pipeline dynamic shape model #1240
base: master
Are you sure you want to change the base?
Static llm pipeline dynamic shape model #1240
Conversation
src/cpp/src/llm_pipeline_static.cpp
Outdated
int64_t position_ids_data = prompt_len -1; | ||
std::vector<int64_t> attention_mask_data(1, prompt_len); | ||
int64_t position_ids_data = prompt_len - 1; | ||
std::vector<int64_t> attention_mask_data(prompt_len - 1, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOL, @TolyaTalamanov !!
6cdd518
to
cc34616
Compare
306ab4a
to
7c8ff06
Compare
### Details: - *item1* - *...* ### Related PRs: - GenAI: *openvinotoolkit/openvino.genai#1240 ### Tickets: - *ticket-id* --------- Co-authored-by: TolyaTalamanov <[email protected]>
src/cpp/src/llm_pipeline_static.hpp
Outdated
const ov::AnyMap& config); | ||
}; | ||
|
||
class SMStaticLLMPipeline : public LLMPipelineImplBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SMStaticLLMPipeline sounds misleading, let's not focus on the point that it is a "single model" pipeline (as people got used to do a different thing here).
The CPU/GPU's pipeline is called Stateful* if I get it right.
So as this one is still static, let's call it StaticStatefulLLMPipeline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, sure!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
src/cpp/src/llm_pipeline_static.hpp
Outdated
namespace genai { | ||
|
||
struct StaticLLMPipelineFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there could be a better namespace job done, clearly statik::
could be a namespace here, so we'd get statik::StatelessLLMPipeline
(the old class) and statik::StatefulLLMPipeline
(the new one).
statik::
is picked to avoid the clash with the keyword, it could be Static::
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmatveev @TolyaTalamanov , please help to disambiguate: we allow user to pass dynamic stateful OpenVINO model into our new pipeline, where we are hiding things like converting of model to static and making it stateless. Should we this way still name the pipeline as static_llm::StatefulLLMPipeline
, as it still works with the static and stateless models inside? Or it can really be named as Stateful
because now, LLMCompiledModel
, which it creates, doesn't expose to user additional inputs and outputs, that correspond to states. (However, I don't know if by this logic the pipeline is still static)
src/cpp/src/llm_pipeline_static.cpp
Outdated
|
||
update_config(properties, {"NPU_USE_NPUW", "YES"}); | ||
update_config(properties, {"NPUW_LLM", "YES"}); | ||
update_config(properties, {"NPUW_LLM_MODEL_DESC", model_desc_to_string(model_desc)}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since it is C++, can we use a C++ structure directly here? Or the option is exposed as string only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A downside - change in the option structure when it is a string will never break the build, but a change in the structure type will.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can, but it is not obvious where to define this structure, since OpenVINO NPUW code is not exposed as Public API. However, we can pass it as map<std::string, std::string>
or map<std::string, ov::Any>
, if you think it is better. Current implementation via std::string
ensures that compiled_model.get_property("NPUW_LLM_MODEL_DESC")
will print something meaningful (but it is not a requirement).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think?
src/cpp/src/llm_pipeline_static.cpp
Outdated
const uint32_t kMaxPromptLen = pop_int_and_cast(properties, "MAX_PROMPT_LEN").value_or(1024u); | ||
const uint32_t kMinResponseLen = pop_int_and_cast(properties, "MIN_RESPONSE_LEN").value_or(128u); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see the above code, it was aligned to 64. Probably it makes sense to unify how these options are handled between the two classes (without overdesign)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I did the alignment inside of the LLMCompiledModel
constructor in the NPUW. Do you think I need to remove it there and instead do alignment here? I did it in LLMCompiledModel
, since I thought it might be of implementation detail..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think?
auto decode_start_time = std::chrono::steady_clock::now(); | ||
DecodedResults decoded_results = {m_tokenizer.decode(encoded_results.tokens), encoded_results.scores}; | ||
auto decode_stop_time = std::chrono::steady_clock::now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd highly recommend to use smt like https://github.com/openvinotoolkit/openvino/blob/master/src/plugins/intel_npu/src/plugin/npuw/perf.cpp#L9 - but it is clearly not for this PR (cc: @TolyaTalamanov )
m_request.set_tensor("input_ids", ov::Tensor(ov::element::i64, ov::Shape{1,1}, (void*)&input_ids_data)); | ||
m_request.set_tensor("position_ids", ov::Tensor(ov::element::i64, ov::Shape{1,1}, (void*)&position_ids_data)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a point in having these two set all the time - the data could've been updated in-place for these tensors - for the future @TolyaTalamanov
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is true, I think we discussed something like that already. And the conclusion was to check contracts of GenAI pipeline to understand what should be passed into the request. I don't remember the outcome, might be GenAI dynamic pipeline always pass some input, I need to check it.
src/cpp/src/llm_pipeline_static.cpp
Outdated
|
||
// TODO: How to check that KV-Cache is full? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still an open? I believe the llm_infer_request reports it via throw now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is true! However, I don't know if this should be exception or just end of the generation stage with the warning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
src/cpp/src/llm_pipeline_static.cpp
Outdated
const std::string& device, | ||
const ov::AnyMap& config) { | ||
auto properties = config; | ||
const auto use_sm_pipeline = pop_or_default(properties, "USE_SM_PIPELINE", false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't it be false
or NO
? Or maybe it shouldn't a binary option either?
we also need to be careful about the option name choice here.. And tbh I don't have a good name here in mind.
It shouldn't be a public-looking option, that's for sure. Maybe it shouldn't be a configurable option at all but an env var, like we did for memory allocation - but that'd complicate testing in the existing environments.
NPU_PIPELINE = STATEFUL
(as opposed to the today's STATELESS
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Will fix this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User can set false
and NO
, OpenVINO ov::Any
can parse it to the bool
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
74ceb28
to
b00d987
Compare
…genai into at/static-llm-pipeline-dynamic-shape-model
ec8c27f
to
34c3fc0
Compare
34c3fc0
to
c52bd12
Compare
Related PRs: