
Refactor async engine & turbomind IO #2968

Merged: 44 commits into InternLM:main on Jan 10, 2025

Conversation

@lzhangzz (Collaborator) commented Dec 27, 2024

TODO

  • output logits
  • output logprobs
  • output hidden states
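
For reference, a minimal sketch of how these outputs surface through the pipeline API. The logprobs knob on GenerationConfig and the logprobs/logits/last_hidden_state fields on Response already exist (see the report further down in this thread); the exact switches for logits and hidden states are still the TODO items above, so this is a sketch under those assumptions, not a confirmed interface:

from lmdeploy import pipeline, GenerationConfig

# Model path reused from the repro reports below; any chat model works.
pipe = pipeline('internlm2_5-7b-chat')

# logprobs=5 asks for the top-5 log probabilities of each generated token.
# Response.logits / Response.last_hidden_state remain None until the
# corresponding TODO items above are implemented.
gen_config = GenerationConfig(max_new_tokens=16, logprobs=5)
response = pipe('hi', gen_config=gen_config)
print(response.logprobs)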

@zhyncs (Collaborator) commented Dec 27, 2024

It’s amazing!

@lvhan028 (Collaborator) commented

FINALLY!!

@lvhan028 (Collaborator) commented Jan 6, 2025

When the backend is the PyTorch engine, the pipeline cannot be destroyed successfully.

from lmdeploy import pipeline, PytorchEngineConfig

model_path = 'internlm2_5-7b-chat'
engine_config = PytorchEngineConfig()
pipe = pipeline(model_path, backend_config=engine_config, log_level='INFO')
response = pipe('hi')
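
For context, a plain-Python sketch of the teardown path the repro exercises when the script ends; with the bug above, the process may still fail to exit, so this only illustrates the expected cleanup, not a confirmed workaround:

import gc

# Dropping the last reference should run the engine's destructor and stop
# its background worker loops; the report says this hangs with the pt engine.
del pipe
gc.collect()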

@lvhan028 (Collaborator) commented Jan 7, 2025

I will fix get_ppl and the CSV saving in profile_throughput in another PR.

@lvhan028 removed the WIP label Jan 9, 2025
@zhulinJulia24 (Collaborator) commented

@lzhangzz the number of output tokens doubles when ignore_eos is true.

from lmdeploy import pipeline
from lmdeploy.messages import GenerationConfig

# pipeline construction was omitted in the report; the model here is an
# assumption inferred from the InternLM output below
pipe = pipeline('internlm2_5-7b-chat')

gen_config = GenerationConfig(ignore_eos=True, max_new_tokens=10)
response = pipe(['Hi, pls intro yourself', 'Shanghai is'], gen_config=gen_config)
print(response)

I suppose the generate_token_len should be 10, but it's actually 20. The output is:

[Response(text='你好!我是书生·浦语,由你好!我是书生·浦语,由', generate_token_len=20, input_token_len=108, finish_reason='length', token_ids=[77230, 60477, 68734, 60628, 60384, 60721, 62442, 60752, 60353, 60620, 77230, 60477, 68734, 60628, 60384, 60721, 62442, 60752, 60353, 60620], logprobs=None, logits=None, last_hidden_state=None, index=0), Response(text='上海,作为中国最大的城市之一,不仅是中国上海,作为中国最大的城市之一,不仅是中国', generate_token_len=20, input_token_len=105, finish_reason='length', token_ids=[68589, 60353, 68429, 68277, 69410, 68494, 68538, 60353, 68710, 70543, 68589, 60353, 68429, 68277, 69410, 68494, 68538, 60353, 68710, 70543], logprobs=None, logits=None, last_hidden_state=None, index=1)]
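
A quick sketch of the invariant the report expects, runnable against the response object from the snippet above:

# With ignore_eos=True and max_new_tokens=10, each Response should carry
# exactly 10 generated tokens; the output above shows 20, with the text
# and token_ids duplicated back to back.
for r in response:
    assert r.generate_token_len == gen_config.max_new_tokens, (
        f'expected {gen_config.max_new_tokens} tokens, got {r.generate_token_len}')
    assert len(r.token_ids) == r.generate_token_len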

@lvhan028 mentioned this pull request Jan 9, 2025
@lvhan028 merged commit c25381f into InternLM:main Jan 10, 2025
9 checks passed