
djl-inference:0.29.0-tensorrtllm0.11.0-cu124 regression: has no attribute 'to_word_list_format' #2293

Open · lxning opened this issue Aug 7, 2024 · 4 comments
Labels: bug (Something isn't working)

lxning commented Aug 7, 2024

Description


The two LMI TRT-LLM containers behave differently when evaluating the gsm8k dataset via lm_eval_harness on the llama-2-7b model:

  • djl-inference:0.28.0-tensorrtllm0.9.0-cu122: lm_eval_harness generates the report successfully
  • djl-inference:0.29.0-tensorrtllm0.11.0-cu124: lm_eval_harness fails due to an error from the container

Expected Behavior

lm_eval_harness should be able to generate the report when the djl-inference:0.29.0-tensorrtllm0.11.0-cu124 image is used.

Error Message

Error log in djl-inference:0.29.0-tensorrtllm0.11.0-cu124

[INFO ] 2024-08-07 17:45:57 ModelServer - Initialize BOTH server with: EpollServerSocketChannel.
[INFO ] 2024-08-07 17:45:57 ModelServer - BOTH API bind to: http://0.0.0.0:8080
[WARN ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stderr: [1,0]<stderr>:No chat template is set for this tokenizer, falling back to a default class-level template. This is very error-prone, because models are often trained with templates different from the class default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which point any code depending on them will stop working. We recommend setting a valid chat template before then to ensure that this model continues working without issues.
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:Rolling batch inference error
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:Traceback (most recent call last):
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:  File "/tmp/.djl.ai/python/0.29.0/djl_python/rolling_batch/rolling_batch.py", line 48, in try_catch_handling
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:    return func(self, *args, **kwargs)
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:  File "/tmp/.djl.ai/python/0.29.0/djl_python/rolling_batch/trtllm_rolling_batch.py", line 108, in inference
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:    response = self.model.generate(request.input_text, **param)
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/trtllmmodel/modelbuilder.py", line 268, in generate
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:    final_kwargs = self._prepare_inputs_for_generation(inputs, **parameters)
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/trtllmmodel/modelbuilder.py", line 341, in _prepare_inputs_for_generation
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:    parameters["stop_words_list"] = tensorrt_llm.runtime.to_word_list_format(stop_sequences, self.tokenizer)
[INFO ] 2024-08-07 18:01:48 PyProcess - W-20055-model-stdout: [1,0]<stdout>:AttributeError: module 'tensorrt_llm.runtime' has no attribute 'to_word_list_format'
[The identical traceback repeats at 18:01:49 and 18:01:50 as the request is retried.]
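For context on what broke: to_word_list_format converts stop strings into the [batch_size, 2, max_len] int32 tensor the TRT-LLM runtime expects (concatenated token ids in row 0, cumulative end offsets padded with -1 in row 1). The helper was reachable as tensorrt_llm.runtime.to_word_list_format in the 0.9.0 container but is gone from that module in 0.11.0, hence the AttributeError. A rough, self-contained sketch of the format (toy tokenizer for illustration; this is not the actual TRT-LLM implementation):

```python
import numpy as np


def to_word_list_format(stop_words_batch, tokenizer):
    """stop_words_batch: one list of stop strings per request in the batch."""
    flat_ids, offsets = [], []
    for stop_words in stop_words_batch:
        item_ids, item_lens = [], []
        for word in stop_words:
            ids = tokenizer.encode(word)
            item_ids.extend(ids)          # row 0: all token ids concatenated
            item_lens.append(len(ids))
        flat_ids.append(np.array(item_ids, dtype=np.int32))
        offsets.append(np.cumsum(item_lens).astype(np.int32))  # row 1: end offsets

    # Pad every request to the same width: ids with 0, offsets with -1.
    pad_to = max(1, max(len(ids) for ids in flat_ids))
    for i in range(len(flat_ids)):
        flat_ids[i] = np.pad(flat_ids[i], (0, pad_to - len(flat_ids[i])),
                             constant_values=0)
        offsets[i] = np.pad(offsets[i], (0, pad_to - len(offsets[i])),
                            constant_values=-1)
    return np.stack([np.stack(flat_ids), np.stack(offsets)], axis=1)


class ToyTokenizer:
    """Stand-in tokenizer: one token per character (its code point)."""
    def encode(self, s):
        return [ord(c) for c in s]


out = to_word_list_format([["ab", "c"]], ToyTokenizer())
# shape (1, 2, 3): ids [97, 98, 99], offsets [2, 3, -1]
```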

How to Reproduce?


Steps to reproduce


  1. aws s3 sync s3://djl-llm/llama-2-7b-hf/ llama-2-7b-hf/

  2. docker run -it --gpus all --shm-size 20g -v /home/ubuntu/trtllm/llama-2-7b:/opt/ml/model -p 8080:8080 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.29.0-tensorrtllm0.11.0-cu124

  3. lm_eval --model local-chat-completions --tasks gsm8k_cot_zeroshot --model_args model=meta-llama/Meta-Llama-2-7B,base_url=http://localhost:8080/v1/chat/completions/model,tokenized_requests=True --limit 10 --apply_chat_template --write_out --log_samples --output_path ~/trtllm/lm_eval/output_llama-2-7b-gsm8k_cot_zeroshot_v11

2024-08-07:18:01:48,442 INFO     [evaluator_utils.py:200] Request: Instance(request_type='generate_until', doc={'question': "Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?", 'answer': 'Janet sells 16 - 3 - 4 = <<16-3-4=9>>9 duck eggs a day.\nShe makes 9 * 2 = $<<9*2=18>>18 every day at the farmer’s market.\n#### 18'}, arguments=(JsonChatStr(prompt='[{"role": "user", "content": "Q: Janet\\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers\' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers\' market?\\nA: Let\'s think step by step."}]'), {'until': ['Q:', '</s>', '<|im_end|>'], 'do_sample': False}), idx=0, metadata=('gsm8k_cot_zeroshot', 0, 1), resps=[], filtered_resps={}, task_name='gsm8k_cot_zeroshot', doc_id=0, repeats=1)
2024-08-07:18:01:48,442 INFO     [evaluator.py:457] Running generate_until requests
Requesting API:   0%|                                                           | 0/10 [00:00<?, ?it/s]2024-08-07:18:01:48,539 WARNING  [api_models.py:342] API request failed with error message: {"error":"module \u0027tensorrt_llm.runtime\u0027 has no attribute \u0027to_word_list_format\u0027","code":424}. Retrying...
2024-08-07:18:01:49,550 WARNING  [api_models.py:342] API request failed with error message: {"error":"module \u0027tensorrt_llm.runtime\u0027 has no attribute \u0027to_word_list_format\u0027","code":424}. Retrying...
2024-08-07:18:01:50,558 WARNING  [api_models.py:342] API request failed with error message: {"error":"module \u0027tensorrt_llm.runtime\u0027 has no attribute \u0027to_word_list_format\u0027","code":424}. Retrying...
Traceback (most recent call last):
  File "/opt/conda/envs/py310/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/home/ubuntu/lm-evaluation-harness/lm_eval/__main__.py", line 382, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/home/ubuntu/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
  File "/home/ubuntu/lm-evaluation-harness/lm_eval/evaluator.py", line 296, in simple_evaluate
    results = evaluate(
  File "/home/ubuntu/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
    return fn(*args, **kwargs)
  File "/home/ubuntu/lm-evaluation-harness/lm_eval/evaluator.py", line 468, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
  File "/home/ubuntu/lm-evaluation-harness/lm_eval/models/api_models.py", line 562, in generate_until
    outputs = retry(
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 336, in wrapped_f
    return copy(f, *args, **kw)
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 475, in __call__
    do = self.iter(retry_state=retry_state)
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 376, in iter
    result = action(retry_state)
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 418, in exc_check
    raise retry_exc.reraise()
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 185, in reraise
    raise self.last_attempt.result()
  File "/opt/conda/envs/py310/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/opt/conda/envs/py310/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tenacity/__init__.py", line 478, in __call__
    result = fn(*args, **kwargs)
  File "/home/ubuntu/lm-evaluation-harness/lm_eval/models/api_models.py", line 345, in model_call
    response.raise_for_status()
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 424 Client Error: module 'tensorrt_llm.runtime' has no attribute 'to_word_list_format' for url: http://localhost:8080/v1/chat/completions/model

What have you tried to solve it?

@lxning lxning added the bug Something isn't working label Aug 7, 2024
@sindhuvahinis sindhuvahinis self-assigned this Aug 12, 2024
@sindhuvahinis (Contributor)

Thanks for reporting this. Will take a look at it today.


pdtgct commented Aug 18, 2024

I can confirm seeing this issue in djl-inference:0.29.0-tensorrtllm0.11.0-cu124.

Steps to reproduce:

Send a POST request with the stop parameter:

{
  "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\nYou are rolling a 12-sided dice twice.\n\nQuestion: Can I win more than once?\n<|eot_id|>\n\n<|start_header_id|>assistant<|end_header_id|> Answer:",
  "parameters": {
    "do_sample": false,
    "details": false,
    "temperature": 0.7,
    "top_p": 0.92,
    "max_new_tokens": 220,
    "stop": ["<|eot_id|>"]
  }
}

Note: the model does not stop on "<|eot_id|>" so the stop parameter is needed.

@sindhuvahinis (Contributor)

We fixed the issue and released a patched image. @lxning, please try it now.

@pdtgct Could you try with stop_sequences instead of just stop?
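For anyone hitting this before picking up the patched image, the suggested workaround amounts to renaming the parameter in the request body. A minimal sketch of the request pdtgct posted above, rewritten with stop_sequences (whether the server accepts this name, and the exact endpoint path, depend on the LMI/DJL Serving version — both are assumptions here, not confirmed API):

```python
import json

# Same request as pdtgct's example, but with "stop_sequences" (as suggested)
# instead of "stop". Parameter name per the maintainer's suggestion above.
payload = {
    "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>"
              "\nQuestion: Can I win more than once?\n<|eot_id|>",
    "parameters": {
        "do_sample": False,
        "temperature": 0.7,
        "top_p": 0.92,
        "max_new_tokens": 220,
        "stop_sequences": ["<|eot_id|>"],  # was: "stop"
    },
}
body = json.dumps(payload)
# Then POST it to the container from step 2 of the reproduction, e.g.
# (endpoint path is an assumption):
#   requests.post("http://localhost:8080/invocations", data=body,
#                 headers={"Content-Type": "application/json"})
```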


pdtgct commented Sep 11, 2024

Thanks, @sindhuvahinis - will try to find some time to confirm.
