
djl-inference:0.29.0-tensorrtllm0.11.0-cu124 regression: has no attribute 'to_word_list_format' #2293

Open · lxning opened this issue Aug 7, 2024 · 4 comments
Labels: bug (Something isn't working)

lxning commented Aug 7, 2024

Description


The two LMI TRT-LLM containers behave differently when evaluating the gsm8k dataset via lm_eval_harness on the llama-2-7b model:

  • djl-inference:0.28.0-tensorrtllm0.9.0-cu122: lm_eval_harness generates the report successfully
  • djl-inference:0.29.0-tensorrtllm0.11.0-cu124: lm_eval_harness fails due to an error from the container

Expected Behavior

lm_eval_harness should be able to generate the report when the djl-inference:0.29.0-tensorrtllm0.11.0-cu124 image is used.

Error Message

Error log in djl-inference:0.29.0-tensorrtllm0.11.0-cu124

[INFO ] 2024-08-07 17:45:57 ModelServer - Initialize BOTH server with: EpollServerSocketChannel.
[INFO ] 2024-08-07 17:45:57 ModelServer - BOTH API bind to: http://0.0.0.0:8080
[WARN ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stderr: [1,0]<stderr>:No chat template is set for this tokenizer, falling back to a default class-level template. This is very error-prone, because models are often trained with templates different from the class default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which point any code depending on them will stop working. We recommend setting a valid chat template before then to ensure that this model continues working without issues.
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:Rolling batch inference error
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:Traceback (most recent call last):
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:  File "/tmp/.djl.ai/python/0.29.0/djl_python/rolling_batch/rolling_batch.py", line 48, in try_catch_handling
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:    return func(self, *args, **kwargs)
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:  File "/tmp/.djl.ai/python/0.29.0/djl_python/rolling_batch/trtllm_rolling_batch.py", line 108, in inference
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:    response = self.model.generate(request.input_text, **param)
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/trtllmmodel/modelbuilder.py", line 268, in generate
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:    final_kwargs = self._prepare_inputs_for_generation(inputs, **parameters)
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/trtllmmodel/modelbuilder.py", line 341, in _prepare_inputs_for_generation
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:    parameters["stop_words_list"] = tensorrt_llm.runtime.to_word_list_format(stop_sequences, self.tokenizer)
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:AttributeError: module 'tensorrt_llm.runtime' has no attribute 'to_word_list_format'
[The identical traceback repeats at 18:01:49 and 18:01:50 as the request is retried.]
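For context on what broke: to_word_list_format converts stop strings into the [batch_size, 2, max_len] int32 tensor the TRT-LLM runtime expects (concatenated token ids in row 0, cumulative end offsets padded with -1 in row 1). The helper was reachable as tensorrt_llm.runtime.to_word_list_format in the 0.9.0 container but is gone from that module in 0.11.0, hence the AttributeError. A rough, self-contained sketch of the format (toy tokenizer for illustration; this is not the actual TRT-LLM implementation):

```python
import numpy as np


def to_word_list_format(stop_words_batch, tokenizer):
    """stop_words_batch: one list of stop strings per request in the batch."""
    flat_ids, offsets = [], []
    for stop_words in stop_words_batch:
        item_ids, item_lens = [], []
        for word in stop_words:
            ids = tokenizer.encode(word)
            item_ids.extend(ids)          # row 0: all token ids concatenated
            item_lens.append(len(ids))
        flat_ids.append(np.array(item_ids, dtype=np.int32))
        offsets.append(np.cumsum(item_lens).astype(np.int32))  # row 1: end offsets

    # Pad every request to the same width: ids with 0, offsets with -1.
    pad_to = max(1, max(len(ids) for ids in flat_ids))
    for i in range(len(flat_ids)):
        flat_ids[i] = np.pad(flat_ids[i], (0, pad_to - len(flat_ids[i])),
                             constant_values=0)
        offsets[i] = np.pad(offsets[i], (0, pad_to - len(offsets[i])),
                            constant_values=-1)
    return np.stack([np.stack(flat_ids), np.stack(offsets)], axis=1)


class ToyTokenizer:
    """Stand-in tokenizer: one token per character (its code point)."""
    def encode(self, s):
        return [ord(c) for c in s]


out = to_word_list_format([["ab", "c"]], ToyTokenizer())
# shape (1, 2, 3): ids [97, 98, 99], offsets [2, 3, -1]
```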

How to Reproduce?


Steps to reproduce


  1. aws s3 sync s3://djl-llm/llama-2-7b-hf/ llama-2-7b-hf/

  2. docker run -it --gpus all --shm-size 20g -v /home/ubuntu/trtllm/llama-2-7b:/opt/ml/model -p 8080:8080 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.29.0-tensorrtllm0.11.0-cu124

  3. lm_eval --model local-chat-completions --tasks gsm8k_cot_zeroshot --model_args model=meta-llama/Meta-Llama-2-7B,base_url=http://localhost:8080/v1/chat/completions/model,tokenized_requests=True --limit 10 --apply_chat_template --write_out --log_samples --output_path ~/trtllm/lm_eval/output_llama-2-7b-gsm8k_cot_zeroshot_v11

2024-08-07:18:01:48,442 INFO     [evaluator_utils.py:200] Request: Instance(request_type='generate_until', doc={'question': "Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?", 'answer': 'Janet sells 16 - 3 - 4 = <<16-3-4=9>>9 duck eggs a day.\nShe makes 9 * 2 = $<<9*2=18>>18 every day at the farmer’s market.\n#### 18'}, arguments=(JsonChatStr(prompt='[{"role": "user", "content": "Q: Janet\\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers\' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers\' market?\\nA: Let\'s think step by step."}]'), {'until': ['Q:', '</s>', '<|im_end|>'], 'do_sample': False}), idx=0, metadata=('gsm8k_cot_zeroshot', 0, 1), resps=[], filtered_resps={}, task_name='gsm8k_cot_zeroshot', doc_id=0, repeats=1)
2024-08-07:18:01:48,442 INFO     [evaluator.py:457] Running generate_until requests
Requesting API:   0%|                                                           | 0/10 [00:00<?, ?it/s]2024-08-07:18:01:48,539 WARNING  [api_models.py:342] API request failed with error message: {"error":"module \u0027tensorrt_llm.runtime\u0027 has no attribute \u0027to_word_list_format\u0027","code":424}. Retrying...
2024-08-07:18:01:49,550 WARNING  [api_models.py:342] API request failed with error message: {"error":"module \u0027tensorrt_llm.runtime\u0027 has no attribute \u0027to_word_list_format\u0027","code":424}. Retrying...
2024-08-07:18:01:50,558 WARNING  [api_models.py:342] API request failed with error message: {"error":"module \u0027tensorrt_llm.runtime\u0027 has no attribute \u0027to_word_list_format\u0027","code":424}. Retrying...
Traceback (most recent call last):
  File "/opt/conda/envs/py310/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/home/ubuntu/lm-evaluation-harness/lm_eval/__main__.py", line 382, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/home/ubuntu/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
  File "/home/ubuntu/lm-evaluation-harness/lm_eval/evaluator.py", line 296, in simple_evaluate
    results = evaluate(
  File "/home/ubuntu/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
  File "/home/ubuntu/lm-evaluation-harness/lm_eval/evaluator.py", line 468, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
  File "/home/ubuntu/lm-evaluation-harness/lm_eval/models/api_models.py", line 562, in generate_until
    outputs = retry(
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 336, in wrapped_f
    return copy(f, *args, **kw)
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 475, in __call__
    do = self.iter(retry_state=retry_state)
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 376, in iter
    result = action(retry_state)
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 418, in exc_check
    raise retry_exc.reraise()
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 185, in reraise
    raise self.last_attempt.result()
  File "/opt/conda/envs/py310/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/opt/conda/envs/py310/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 478, in __call__
    result = fn(*args, **kwargs)
  File "/home/ubuntu/lm-evaluation-harness/lm_eval/models/api_models.py", line 345, in model_call
    response.raise_for_status()
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 424 Client Error: module 'tensorrt_llm.runtime' has no attribute 'to_word_list_format' for url: http://localhost:8080/v1/chat/completions/model

What have you tried to solve it?

@lxning lxning added the bug Something isn't working label Aug 7, 2024
@sindhuvahinis sindhuvahinis self-assigned this Aug 12, 2024
@sindhuvahinis (Contributor)

Thanks for reporting this. Will take a look at it today.


pdtgct commented Aug 18, 2024

I can confirm seeing this issue in djl-inference:0.29.0-tensorrtllm0.11.0-cu124.

Steps to reproduce:

Send a POST request with the stop parameter:

{
  "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\nYou are rolling a 12-sided dice twice.\n\nQuestion: Can I win more than once?\n<|eot_id|>\n\n<|start_header_id|>assistant<|end_header_id|> Answer:",
  "parameters": {
    "do_sample": false,
    "details": false,
    "temperature": 0.7,
    "top_p": 0.92,
    "max_new_tokens": 220,
    "stop": ["<|eot_id|>"]
  }
}

Note: the model does not stop on "<|eot_id|>" so the stop parameter is needed.

@sindhuvahinis (Contributor)

We fixed the issue and released a patched image. @lxning, please try it now.

@pdtgct Could you try with stop_sequences instead of just stop?
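For anyone hitting this before picking up the patched image, the suggested workaround amounts to renaming the parameter in the request body. A minimal sketch of the request pdtgct posted above, rewritten with stop_sequences (whether the server accepts this name, and the exact endpoint path, depend on the LMI/DJL Serving version — both are assumptions here, not confirmed API):

```python
import json

# Same request as pdtgct's example, but with "stop_sequences" (as suggested)
# instead of "stop". Parameter name per the maintainer's suggestion above.
payload = {
    "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>"
              "\nQuestion: Can I win more than once?\n<|eot_id|>",
    "parameters": {
        "do_sample": False,
        "temperature": 0.7,
        "top_p": 0.92,
        "max_new_tokens": 220,
        "stop_sequences": ["<|eot_id|>"],  # was: "stop"
    },
}
body = json.dumps(payload)
# Then POST it to the container from step 2 of the reproduction, e.g.
# (endpoint path is an assumption):
#   requests.post("http://localhost:8080/invocations", data=body,
#                 headers={"Content-Type": "application/json"})
```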


pdtgct commented Sep 11, 2024

Thanks, @sindhuvahinis - will try to find some time to confirm.
