Sampling method for LLM output tokens does not sample from random distribution #1341

Closed
e87tn95h opened this issue Dec 9, 2024 · 1 comment · Fixed by #1347
Labels
bug Something isn't working category: LLM LLM pipeline (stateful, static)

e87tn95h commented Dec 9, 2024

Describe the bug

In OpenVINO GenAI 2024.5.0, even when do_sample is set to True, either as an argument to LLMPipeline.generate() or in GenerationConfig, the generated text is identical from run to run. The 2024.4.0 behavior matches my expectation.

Test model: a converted TinyLlama, downloaded with the command below.

huggingface-cli download "OpenVINO/TinyLlama-1.1B-Chat-v1.0-int8-ov" --local-dir "TinyLlama-1.1B-Chat-v1.0-int8-ov"

Reproducer Python script (generates text 3 times):

import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0-int8-ov", "CPU")

if __name__ == "__main__":
    print(ov_genai.__version__)
    prompt = "The Sun is yellow because"
    print(f"prompt:  {prompt}")
    for i in range(3):
        print(f"--- response {i:02} ---")
        print(pipe.generate(prompt, do_sample=True, max_new_tokens=32))

Outputs with OpenVINO GenAI 2024.5.0 (openvino-genai==2024.5.0)

2024.5.0.0
prompt:  The Sun is yellow because
--- response 00 ---
of its orange hue.
YELLOW2 A miniature tulip, made from clay.
YELLOW3 My Red
--- response 01 ---
of its orange hue.
YELLOW2 A miniature tulip, made from clay.
YELLOW3 My Red
--- response 02 ---
of its orange hue.
YELLOW2 A miniature tulip, made from clay.
YELLOW3 My Red

Outputs with OpenVINO GenAI 2024.4.0 (openvino-genai==2024.4.0)

2024.4.0.0
prompt:  The Sun is yellow because
--- response 00 ---
the concentration of magnesium ion in it is high than others. This results in the excess chloride ion in the salt pile could leach mag
--- response 01 ---
its spectrum    (Visible lights): Sunlight falls on the surface of the earth, turning it orange. AcdeThe Sun is orange because sunlight caused suf
--- response 02 ---
it is hotter than most magista using this teenager. To stick to synechocystis. Anaerobic conditions cannot be called

Note

I may be mistaken, but the root of the issue may be here:

ov::genai::EncodedResults result;
if (config.is_beam_search() && is_chat_conversation) {
    std::tie(result, m_selected_beam) = beam_search(m_model_runner, input_ids, concatenated_attention_mask,
                                                    config, position_ids, m_selected_beam);
} else {
    std::vector<SequenceGroup::Ptr> requests;
    size_t block_size = 1;
    bool enable_prefix_caching = false;
    for (size_t request_id = 0; request_id < batch_size; request_id++) {
        SequenceGroup::Ptr sequence_group;
        if (is_chat_conversation && !m_is_cache_empty) {
            sequence_group = std::make_shared<SequenceGroup>(request_id, m_tokenized_chat_history.input_ids, config, block_size, enable_prefix_caching);
        } else {
            size_t seq_len = input_ids.get_shape().at(1);
            size_t batch_offset = request_id * seq_len;
            const int64_t* prompt_start = input_ids.data<const int64_t>() + batch_offset;
            std::vector<int64_t> tokenized_prompt(prompt_start, prompt_start + seq_len);
            sequence_group = std::make_shared<SequenceGroup>(request_id, tokenized_prompt, config, block_size, enable_prefix_caching);
        }
        sequence_group->set_sequence_group_ptr(sequence_group);
        requests.push_back(sequence_group);
    }
    Sampler sampler = Sampler(m_tokenizer);
    std::tie(result, m_selected_beam) = ov::genai::get_lm_encoded_results(m_model_runner, input_ids, concatenated_attention_mask, streamer_ptr,
                                                                          sampler, requests, position_ids, std::nullopt, m_selected_beam);
}

In this code, a Sampler (L284) is created as a local instance on every generate() call, and each instance has its own RNG (random number generator). However, a C++ RNG object produces the same sequence every time if it is not given a seed.
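To illustrate the suspected mechanism in isolation, here is a minimal sketch. It assumes the RNG is a default-constructed std::mt19937; ToySampler and generate_with_local_sampler are hypothetical names made up for this example and are not the OpenVINO GenAI classes.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a sampler that owns its RNG (not the real Sampler class).
struct ToySampler {
    // Default-constructed engine: it always starts from the same fixed default seed.
    std::mt19937 rng;

    // Draw n raw values from the engine.
    std::vector<std::mt19937::result_type> draw(std::size_t n) {
        std::vector<std::mt19937::result_type> values;
        values.reserve(n);
        for (std::size_t i = 0; i < n; ++i) {
            values.push_back(rng());
        }
        return values;
    }
};

// Mimics a generate() call that builds a fresh local sampler every time.
std::vector<std::mt19937::result_type> generate_with_local_sampler(std::size_t n) {
    ToySampler sampler;  // new RNG state on every call
    return sampler.draw(n);
}
```

Calling generate_with_local_sampler() twice returns two identical vectors, because each call restarts the engine from the same state. That would be consistent with the three identical responses shown above.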

Thank you for reading and regards,

@ilya-lavrenov ilya-lavrenov added bug Something isn't working category: LLM LLM pipeline (stateful, static) labels Dec 9, 2024

ilya-lavrenov commented Dec 9, 2024

@e87tn95h thank you for reporting the issue!

Since you have already investigated the issue, could you please create a PR with the fix? It looks like the sampler needs to be made a class field.
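A rough sketch of what making the sampler a class field could look like, using toy types only. DemoSampler, DemoPipeline, and m_sampler are hypothetical names for illustration; this is not the actual patch, and the real pipeline may need additional handling.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical toy sampler that owns a default-constructed RNG.
struct DemoSampler {
    std::mt19937 rng;
    std::mt19937::result_type next() { return rng(); }
};

// Hypothetical toy pipeline: the sampler is a member, so it outlives each generate() call.
class DemoPipeline {
public:
    std::vector<std::mt19937::result_type> generate(std::size_t n) {
        std::vector<std::mt19937::result_type> values;
        values.reserve(n);
        for (std::size_t i = 0; i < n; ++i) {
            values.push_back(m_sampler.next());  // RNG state advances across calls
        }
        return values;
    }

private:
    DemoSampler m_sampler;  // constructed once, together with the pipeline
};
```

With this layout, successive generate() calls on the same DemoPipeline object continue the RNG sequence instead of restarting it, so their outputs differ. In this sketch a newly constructed pipeline still starts from the same default seed, so the first call is reproducible across objects and program runs unless a seed is supplied.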

ilya-lavrenov pushed a commit that referenced this issue Dec 10, 2024
For fix wrong behavior in case of the random sampling
#1341
@ilya-lavrenov ilya-lavrenov added this to the 2024.6 milestone Jan 4, 2025