
'numpy.ndarray' object is not callable in gpt2/1/lib/triton_decoder.py, line 160, in convert_triton_request #702

@freedom-168

Description

System Info

Docker Image: nvcr.io/nvidia/tritonserver:24.12-trtllm-python-py3
CPU: x86_64
GPU: H100
The container also includes the following:
Ubuntu 24.04 (including Python 3.12)
NVIDIA CUDA 12.6.3
NVIDIA cuBLAS 12.6.4.1
cuDNN 9.6.0.74
NVIDIA NCCL 2.23.4
NVIDIA TensorRT 10.7.0.23
OpenUCX 1.15.0
GDRCopy 2.4.1
NVIDIA HPC-X 2.21
OpenMPI 4.1.7
nvImageCodec 0.2.0.7
ONNX Runtime 1.20.1
Intel OpenVINO
DCGM 3.3.6
TensorRT-LLM release/0.16.0
vLLM 0.5.5

Who can help?

After the Triton server launched successfully, I checked its status with triton status, which reported that the server was running and ready.

I then sent the following two requests:

1. triton infer -m gpt2 --prompt hello -i grpc -u localhost -p 8001
2. genai-perf profile -m gpt2 --service-kind triton --backend tensorrtllm --num-prompts 1000 --random-seed 123 --synthetic-input-tokens-mean 1000 --synthetic-input-tokens-stddev 0 --output-tokens-mean 512 --output-tokens-stddev 0 --output-tokens-mean-deterministic --tokenizer /root/models/gpt2/tokenizer --concurrency 16 --measurement-interval 8000 --profile-export-file my_profile_export.json --url localhost:8001

Both requests always returned the error message shown under "Actual behavior" below.
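In case it helps with reproduction, the first request can also be issued directly from Python with tritonclient; this is only a minimal sketch, and the tensor names text_input / text_output follow the usual TensorRT-LLM ensemble convention, which I am assuming here rather than taking from my model config:

```python
import numpy as np
import tritonclient.grpc as grpcclient

# Connect to the gRPC endpoint the server was launched on.
client = grpcclient.InferenceServerClient(url="localhost:8001")
assert client.is_server_ready()

# "text_input" is the conventional TensorRT-LLM ensemble input name
# (an assumption); the prompt is sent as a single BYTES element.
inp = grpcclient.InferInput("text_input", [1], "BYTES")
inp.set_data_from_numpy(np.array(["hello".encode("utf-8")], dtype=object))

result = client.infer(model_name="gpt2", inputs=[inp])
print(result.as_numpy("text_output"))  # assumed output tensor name
```

If the ensemble uses different tensor names, they can be listed with client.get_model_metadata("gpt2").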

Can anyone help with this?

Thanks/Gavin

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Send the following request:

```
genai-perf profile -m gpt2 --service-kind triton --backend tensorrtllm --num-prompts 1000 --random-seed 123 --synthetic-input-tokens-mean 1000 --synthetic-input-tokens-stddev 0 --output-tokens-mean 512 --output-tokens-stddev 0 --output-tokens-mean-deterministic --tokenizer /root/models/gpt2/tokenizer --concurrency 16 --measurement-interval 8000 --profile-export-file my_profile_export.json --url localhost:8001
```
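Once a run completes, the export can be sanity-checked from Python; the file name comes from --profile-export-file above, while the JSON layout itself is an assumption here:

```python
import json

# Quick sanity check of the export after a successful run. The file name
# matches --profile-export-file above; the JSON layout is an assumption,
# so adapt the keys to whatever genai-perf actually writes.
with open("my_profile_export.json") as f:
    profile = json.load(f)

print(list(profile.keys()))
```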

Expected behavior

The output should look like this:

```
                                            LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Statistic              ┃         avg ┃        min ┃         max ┃         p99 ┃         p90 ┃         p75 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Request latency (ns)   │ 296,990,497 │ 43,312,449 │ 332,788,242 │ 327,475,292 │ 317,392,767 │ 310,343,333 │
│ Output sequence length │         109 │         11 │         158 │         142 │         118 │         113 │
│ Input sequence length  │           1 │          1 │           1 │           1 │           1 │           1 │
└────────────────────────┴─────────────┴────────────┴─────────────┴─────────────┴─────────────┴─────────────┘
Output token throughput (per sec): 366.78
Request throughput (per sec): 3.37
```

Actual behavior

```
E0212 21:46:42.323909 655 model.py:120] Traceback (most recent call last):
  File "/root/models/gpt2/1/model.py", line 88, in execute
    req = self.decoder.convert_triton_request(request)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/models/gpt2/1/lib/triton_decoder.py", line 160, in convert_triton_request
    request = Request()
              ^^^^^^^^^
  File "<string>", line 3, in __init__
TypeError: 'numpy.ndarray' object is not callable

triton - ERROR - Traceback (most recent call last):
  File "/root/models/gpt2/1/model.py", line 88, in execute
    req = self.decoder.convert_triton_request(request)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/models/gpt2/1/lib/triton_decoder.py", line 160, in convert_triton_request
    request = Request()
              ^^^^^^^^^
  File "<string>", line 3, in __init__
TypeError: 'numpy.ndarray' object is not callable

triton - ERROR - Unexpected error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/triton_cli/main.py", line 51, in main
    run()
  File "/usr/local/lib/python3.12/dist-packages/triton_cli/main.py", line 45, in run
    args.func(args)
  File "/usr/local/lib/python3.12/dist-packages/triton_cli/parser.py", line 363, in handle_infer
    client.infer(model=args.model, prompt=args.prompt)
  File "/usr/local/lib/python3.12/dist-packages/triton_cli/client/client.py", line 217, in infer
    self.__async_infer(model, inputs)
  File "/usr/local/lib/python3.12/dist-packages/triton_cli/client/client.py", line 221, in __async_infer
    self.__grpc_async_infer(model, inputs)
  File "/usr/local/lib/python3.12/dist-packages/triton_cli/client/client.py", line 273, in __grpc_async_infer
    raise result
tritonclient.utils.InferenceServerException: Traceback (most recent call last):
  File "/root/models/gpt2/1/model.py", line 88, in execute
    req = self.decoder.convert_triton_request(request)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/models/gpt2/1/lib/triton_decoder.py", line 160, in convert_triton_request
    request = Request()
              ^^^^^^^^^
  File "<string>", line 3, in __init__
TypeError: 'numpy.ndarray' object is not callable
```
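The innermost frame is a dataclass-generated __init__ (the File "<string>" frame), which points at the defaults of the Request dataclass rather than at the caller. As a hypothesis only, not a confirmed diagnosis of triton_decoder.py: handing default_factory an ndarray instance instead of a callable reproduces the exact error shape:

```python
from dataclasses import dataclass, field

import numpy as np

# Hypothetical reproduction: default_factory must be a zero-argument
# callable, but here it is handed an ndarray instance instead.
@dataclass
class Request:
    text_input: np.ndarray = field(default_factory=np.array([]))

# The generated __init__ invokes the "factory"; an array is not callable:
Request()
#   File "<string>", line 3, in __init__
# TypeError: 'numpy.ndarray' object is not callable
```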

Additional notes

I checked triton_decoder.py in tensorrtllm_backend/inflight_batcher_llm; it has the same code as the gpt2 model repository.
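If the hypothesis above holds, the usual fix is to wrap the array default in a lambda so that default_factory receives a real callable; this is a sketch of that pattern with an illustrative field name, not a verified patch for triton_decoder.py:

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class Request:
    # Wrapping the array in a lambda makes the factory callable and gives
    # each instance a fresh array instead of a shared mutable default.
    text_input: np.ndarray = field(default_factory=lambda: np.array([]))

req = Request()        # no TypeError
print(req.text_input)  # []
```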
