
Fix streaming hieroglyphs #1492

Conversation

@pavel-esir (Contributor) commented Jan 7, 2025

  • When only the last n characters are held back before printing, multi-byte UTF-8 sequences can be cut in the middle; the resulting invalid UTF-8 byte sequences are corrupted to �.
  • Hold back the last n tokens instead of the last n characters. This fixes the issue.
visual_language_chat.py ./tiny-random-minicpmv-2_6 ./images <<< $'Describe the images?'

Before the fix

��������������������룅 encouraging룅 encouraging룅 encouraging룅 encouraging룅 encouraging룅 encouraging룅 encouraging

After the fix

룅튜룅튜룅튜룅튜룅튜룅 encouraging룅 encouraging룅 encouraging룅 encouraging룅 encouraging룅 encouraging

CVS-159227
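To make the characters-versus-tokens distinction concrete, here is a minimal sketch of the hold-back idea. It is not the actual openvino.genai TextStreamer code; the `detokenize` callable and the hold-back size are assumptions for illustration only.

def stream_by_tokens(token_ids, detokenize, hold_back=3):
    """Yield text chunks without splitting multi-byte UTF-8 characters.

    Holding back the last `hold_back` characters can cut a code point in
    half and produce U+FFFD; holding back the last `hold_back` tokens and
    skipping any text that still ends in U+FFFD keeps every emitted chunk
    valid UTF-8.
    """
    cache = []        # all token ids received so far
    printed = 0       # number of characters already emitted

    for token_id in token_ids:
        cache.append(token_id)
        if len(cache) <= hold_back:
            continue
        # Text of all but the last few tokens is stable and safe to print.
        stable = detokenize(cache[:-hold_back])
        if len(stable) > printed and not stable.endswith("\ufffd"):
            yield stable[printed:]
            printed = len(stable)

    # Flush whatever remains once generation is finished.
    final = detokenize(cache)
    if len(final) > printed:
        yield final[printed:]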

@pavel-esir added the bug label Jan 7, 2025
@pavel-esir added this to the 2024.6 milestone Jan 7, 2025
@ilya-lavrenov (Contributor) commented Jan 7, 2025

Is it possible to add tests for the streamer?
E.g., without a model, feed some tokens and check how they are processed via the callback (i.e., that we don't receive broken symbols).

@github-actions bot added the category: visual language, category: LLM, and category: samples labels Jan 13, 2025
@pavel-esir (Contributor Author) replied
Is it possible to add tests for the streamer? E.g., without a model, feed some tokens and check how they are processed via the callback (i.e., that we don't receive broken symbols).

Unfortunately, it turned out to be quite difficult to call Python from C++ and to make sure the environment has the packages needed to convert a model with optimum-cli to openvino_detokenizer.xml.

I upgraded the existing streamer tests and ensured that the streamed results are the same as the output of generate.
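The shape of that check is roughly the following sketch; the model path, prompt, and max_new_tokens value are placeholders rather than the exact test from this PR (see the diff below for the real assertion).

import openvino_genai as ov_genai

# Placeholder path and prompt for illustration only.
pipe = ov_genai.LLMPipeline("path/to/exported/model", "CPU")
config = ov_genai.GenerationConfig(max_new_tokens=20)

result_from_streamer = []
def streamer(subword):
    # Each streamed chunk should already be valid UTF-8 text (no broken symbols).
    result_from_streamer.append(subword)

res = pipe.generate("What is OpenVINO?", generation_config=config, streamer=streamer)

# The concatenation of streamed chunks must equal the final generate() output;
# any � produced by a split code point would break this equality.
assert res.texts[0] == ''.join(result_from_streamer)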

@pavel-esir requested a review from Wovchena January 13, 2025 08:54
@@ -361,25 +361,47 @@ def test_callback_batch_fail(callback):
pipe.generate(['1', '2'], ov_genai.GenerationConfig(), callback)


@pytest.mark.parametrize("callback", [print, user_defined_callback, lambda subword: print(subword)])
class StremerWithResults:
@ilya-lavrenov (Contributor) commented Jan 13, 2025

I think we need to move this to

def run_llm_pipeline(
    models_path : Path,
    prompts: List[str],
    generation_config : GenerationConfig,
    use_cb : bool = False
) -> List[GenerationResult]:
    properties = get_default_properties()
    if use_cb:
        properties['scheduler_config'] = SchedulerConfig()
    ov_pipe = LLMPipeline(models_path, device='CPU', **properties)
    generate_outputs : DecodedResults = ov_pipe.generate(inputs=prompts, generation_config=generation_config)
    index = 0
    generation_results = []
    for _ in prompts:
        generation_result = GenerationResult()
        generation_result.m_generation_ids = generate_outputs.texts[index : index + generation_config.num_return_sequences]
        # sequences_scores are available only for beam search case
        if generation_config.is_beam_search():
            generation_result.m_scores = generate_outputs.scores[index : index + generation_config.num_return_sequences]
        generation_results.append(generation_result)
        index += generation_config.num_return_sequences
    del ov_pipe
    shutil.rmtree(models_path)
    return generation_results

because it covers significantly more cases (but limit the streamer check to cases with a single batch and num_return_sequences equal to 1)

@pavel-esir (Contributor Author) replied

I agree, but that helper is present only in master, and this PR targets the release branch.
I will do that when cherry-picking to master.
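For reference, one possible shape of that follow-up in master, sketched under the assumption that the streamer check is attached only for the single-prompt, num_return_sequences == 1 case; the wrapper name is hypothetical, while LLMPipeline and get_default_properties are taken from the quoted helper above.

def run_llm_pipeline_with_streamer_check(models_path, prompts, generation_config):
    """Hypothetical wrapper: run the pipeline and, where well defined,
    check that the streamed chunks reproduce the generate() output."""
    use_streamer = len(prompts) == 1 and generation_config.num_return_sequences == 1
    streamed = []

    ov_pipe = LLMPipeline(models_path, device='CPU', **get_default_properties())
    kwargs = {}
    if use_streamer:
        # Collect every streamed chunk for comparison with the final text.
        kwargs['streamer'] = lambda subword: streamed.append(subword)
    outputs = ov_pipe.generate(inputs=prompts, generation_config=generation_config, **kwargs)

    if use_streamer:
        # Broken UTF-8 symbols in the stream would make the strings differ.
        assert outputs.texts[0] == ''.join(streamed)
    return outputs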

pipe.generate(prompt, generation_config=get_greedy(), streamer=streamer)
result_from_streamer = []
res = pipe.generate(prompt, generation_config=get_greedy(), streamer=streamer)
assert res.texts[0] == ''.join(result_from_streamer)
A reviewer (Contributor) commented
I think we don't need to duplicate it in the VLM pipeline, because internally both VLM and LLM use the same TextStreamer.

@pavel-esir (Contributor Author) replied
This VLM model is a good example of where the streamer was broken in the first place, but since we were not comparing here, we didn't catch it from the beginning. This assert comparison costs us almost nothing, yet it can prevent mistakes.

…10-llamafied_ov and all existing UTF8 problems
@andrei-kochin modified the milestones: 2024.6, 2025.0 Jan 13, 2025
@pavel-esir added the port to master label Jan 13, 2025
@ilya-lavrenov added this pull request to the merge queue Jan 13, 2025
Merged via the queue into openvinotoolkit:releases/2024/6 with commit 9cf1601 Jan 13, 2025
51 of 52 checks passed
Labels
  • bug - Something isn't working
  • category: LLM - LLM pipeline (stateful, static)
  • category: samples - GenAI samples
  • category: visual language - Visual language pipeline
  • no-match-files
  • port to master - PR needs to be ported to master from release branch