Update streaming in LM Encoding & CB #1377
src/cpp/src/lm_encoding.cpp
@@ -126,7 +126,7 @@ std::pair<EncodedResults, int32_t> get_lm_encoded_results(
             get_active_sequence_groups),
     active_sequence_groups.end());

-    while (active_sequence_groups.size() > 0) {
+    do {
         size_t total_num_tokens = 0;

         for (auto& sequence_group : active_sequence_groups) {
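The hunk only shows the loop header changing from `while (active_sequence_groups.size() > 0) {` to `do {`. The new closing condition lies outside the diff. The general language-level difference is that a `do { } while` body runs at least once, even when no groups are active. The helpers below (`count_while`, `count_do_while`) are hypothetical and illustrate only that rule. They say nothing about why the PR made the change.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts body executions of a `while` loop that
// drains a list of "active groups" one element per iteration.
std::size_t count_while(std::vector<int> active) {
    std::size_t iterations = 0;
    while (active.size() > 0) {
        ++iterations;
        active.pop_back();  // stand-in for a group finishing
    }
    return iterations;
}

// Same loop written as do { } while: the body runs before the first check.
std::size_t count_do_while(std::vector<int> active) {
    std::size_t iterations = 0;
    do {
        ++iterations;
        if (!active.empty()) active.pop_back();  // guard: list may start empty
    } while (active.size() > 0);
    return iterations;
}
```

With a non-empty list, both forms run the same number of iterations. They differ only when the list is empty on entry.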
BTW @sbalandi, should we use `active_sequence_groups` instead of `sequence_groups` when computing `beam_offets`?
It looks like yes, we can use `active_sequence_groups` here. The current variant is also fine, though, because `num_running_seqs` is 0 for inactive groups. I'll double-check and fix it in #1215.
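The reply's argument can be checked with a small sketch. It assumes, as the reply implies, that each group's offset is the running sum of `num_running_seqs` over the groups before it. `GroupInfo`, `compute_offsets`, and `only_active` are hypothetical names, not the code in lm_encoding.cpp. Inactive groups add 0 to the running total, so active groups get the same offsets whichever list is iterated. Iterating all groups only adds extra entries for the inactive ones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a sequence group, reduced to the two fields used here.
struct GroupInfo {
    std::uint64_t request_id;
    std::size_t num_running_seqs;  // 0 once a group is no longer active
};

// Exclusive prefix sum: each group's offset is the number of running
// sequences in all groups that precede it.
std::map<std::uint64_t, std::size_t> compute_offsets(const std::vector<GroupInfo>& groups) {
    std::map<std::uint64_t, std::size_t> offsets;
    std::size_t running_total = 0;
    for (const auto& group : groups) {
        offsets[group.request_id] = running_total;
        running_total += group.num_running_seqs;
    }
    return offsets;
}

// Keeps only groups that still have running sequences.
std::vector<GroupInfo> only_active(const std::vector<GroupInfo>& groups) {
    std::vector<GroupInfo> active;
    for (const auto& group : groups) {
        if (group.num_running_seqs > 0) active.push_back(group);
    }
    return active;
}
```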
…nai into streaming_lm_encoding
if (streamer_ptr && generations.at(0).get()->can_read()) {
    std::unordered_map<uint64_t, GenerationOutput> token = generations.at(0).get()->back();
    for (const auto& gen_token : token.begin()->second.generated_ids) {
        if (!streamer_ptr->put(gen_token)) {
Copy-pasted incorrect code? According to the documentation in openvino.genai/src/cpp/include/openvino/genai/streamer_base.hpp, lines 18 to 20 in 9bcadf7:
/// @brief put is called every time new token is decoded,
/// @return bool flag to indicate whether generation should be stopped, if return true generation stops
virtual bool put(int64_t token) = 0;
We need to break when `put` returns true.
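Here is a sketch of a token loop that follows the documented `put` contract, where a `true` return means "stop generation". `StopAfterStreamer` and `stream_tokens` are hypothetical test doubles, not openvino.genai classes. Only the return-value convention comes from streamer_base.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical streamer double: asks to stop once it has received `limit` tokens.
// Mirrors the documented convention that put() returning true stops generation.
struct StopAfterStreamer {
    std::size_t limit;
    std::vector<std::int64_t> received;

    bool put(std::int64_t token) {
        received.push_back(token);
        return received.size() >= limit;  // true => request stop
    }
};

// Feeds tokens to the streamer and returns whether generation should continue.
bool stream_tokens(StopAfterStreamer& streamer, const std::vector<std::int64_t>& generated_ids) {
    for (std::int64_t token : generated_ids) {
        if (streamer.put(token)) {
            return false;  // put() returned true: stop, as the doc comment requires
        }
    }
    return true;  // streamer never asked to stop
}
```

Returning the flag instead of breaking silently lets the caller record it, for example in `continue_generation`.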
OPENVINO_ASSERT(1 == token.size());
OPENVINO_ASSERT(1 == token.begin()->second.generated_ids.size());
continue_generation = !streamer_ptr->put(token.begin()->second.generated_ids.at(0));
for (const auto& gen_token : token.begin()->second.generated_ids) {
The `continue_generation` assignment is dropped here. As a result, we can end up with abandoned requests that still hold allocated block tables, because `drop_requests()` is not called below.
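Here is a minimal model of the leak described above. It assumes that the stop path exits the loop and that `drop_requests()` is guarded by `!continue_generation`. `BlockTables` and `generate` are hypothetical and do not reproduce the continuous-batching pipeline. Normal completion, which would free blocks elsewhere, is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for per-request block tables held by a scheduler.
struct BlockTables {
    std::map<std::uint64_t, std::size_t> blocks_per_request;

    void allocate(std::uint64_t request_id, std::size_t n) { blocks_per_request[request_id] += n; }

    // Releases every block still held by the remaining requests.
    void drop_requests() { blocks_per_request.clear(); }

    std::size_t total_blocks() const {
        std::size_t total = 0;
        for (const auto& entry : blocks_per_request) total += entry.second;
        return total;
    }
};

// Simulates decoding until the streamer asks to stop at `stop_at_step`.
// `set_flag_on_stop == false` models the dropped continue_generation assignment.
void generate(BlockTables& tables, std::size_t max_steps, std::size_t stop_at_step, bool set_flag_on_stop) {
    bool continue_generation = true;
    for (std::size_t step = 0; step < max_steps; ++step) {
        tables.allocate(/*request_id=*/0, /*n=*/1);    // each step grows the cache
        bool stop_requested = (step == stop_at_step);  // simulated put() returning true
        if (stop_requested) {
            if (set_flag_on_stop) continue_generation = false;
            break;  // generation stops either way
        }
    }
    // Assumed cleanup path: the only place interrupted requests are freed.
    if (!continue_generation) tables.drop_requests();
}
```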