
awscurl: Missing token metrics when -t option specified #2340

Open
CoolFish88 opened this issue Aug 25, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@CoolFish88

Description

When requesting token metrics from an endpoint running an LMI container with a vLLM engine, non-zero values are returned for tokenThroughput, totalTokens, and tokenPerRequest (as expected).

When requesting token metrics from an endpoint running a Triton Inference Server, zero values are returned for tokenThroughput, totalTokens, and tokenPerRequest (unexpected). The Triton endpoint was verified to respond correctly to both individual and concurrent requests (it produces the expected output for the given inputs).

One difference between the two setups is the schema of the input requests and of the output response. Specifically, the Triton endpoint uses a different input schema and produces an output structured differently from the LMI endpoint's. Do you suspect this might be the reason the token metrics are not being computed?

Expected Behavior

Return token metrics when the -t option is specified

Error Message

Zero-valued token metrics

How to Reproduce?

TOKENIZER=<path_to_tokenizer> ./awscurl -c 1 -N 10 -X POST -n sagemaker <triton_endpoint> --dataset <path_to_dataset> -H 'Content-Type: application/json' -P --connect-timeout 60

Triton input schema:

{
  "inputs": [
    {
      "name": str,
      "shape": [int],
      "datatype": str,
      "data": [str]
    }
  ]
}

Triton output schema:

{
  "model_name": str,
  "model_version": str,
  "outputs": [
    {
      "name": str,
      "shape": [int],
      "datatype": str,
      "data": [str]
    }
  ]
}
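For reference, a request body matching the input schema above can be sketched as follows. The tensor name and payload are illustrative assumptions, not values taken from the actual endpoint:

```python
import json

# Sketch of a request body matching the Triton input schema above.
# "text_input" and the prompt are assumed placeholders.
request = {
    "inputs": [
        {
            "name": "text_input",  # assumed tensor name
            "shape": [1],
            "datatype": "BYTES",
            "data": ["What orbits the Earth?"],
        }
    ]
}

# Serialize to the JSON string that would be POSTed to the endpoint.
body = json.dumps(request)
print(body)
```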

@CoolFish88 CoolFish88 added the bug Something isn't working label Aug 25, 2024
@frankfliu
Contributor

You can use the "-j" parameter to define a JSON query for the Triton server output; see this test code: https://github.com/deepjavalibrary/djl-serving/blob/master/awscurl/src/test/java/ai/djl/awscurl/AwsCurlTest.java#L455-L456

@CoolFish88
Author

Hello Frank,

Thank you for the valuable suggestion.
The dictionary storing the generated text (within the output array) looks like this:

{'name': 'generated_text',
'datatype': 'BYTES',
'shape': [1],
'data': ['{"response": "moon"}']},

Should a JSON query like "$.outputs[?(@.name=='generated_text')].data[0]": "$.response" work?

Note that data holds a JSON string, and I was wondering whether the execution code can process it accordingly by extracting the value of the response field (so that the tokens associated with the key are not counted)
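Concretely, extracting the generated text would take two parsing passes (a Python sketch, using the sample output element above):

```python
import json

# Sample output element from the Triton response, as shown above.
output = {
    "name": "generated_text",
    "datatype": "BYTES",
    "shape": [1],
    "data": ['{"response": "moon"}'],
}

# Pass 1: pick the data string out of the matching output element
# (this is the part a JSON query like $.outputs[...].data[0] covers).
raw = output["data"][0]

# Pass 2: the string itself is JSON, so a second parse is needed
# before the "response" field can be reached.
text = json.loads(raw)["response"]
print(text)  # moon
```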

All the best

@CoolFish88
Author

I noticed that tokenThroughput = (totalTokens * 1000000000d / totalTime * clients). Could you please tell me the meaning behind the 1000000000d constant and what tokenThroughput ends up measuring?

@frankfliu
Contributor

The totalTime is in nanoseconds, so it needs to be converted to seconds.
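A quick check with made-up numbers (a Python sketch; the figures are assumptions, not real measurements):

```python
# totalTime is measured in nanoseconds, so multiplying by 1e9 (the
# 1000000000d constant) converts the rate to tokens per second.
total_tokens = 500              # assumed token count
total_time_ns = 2_000_000_000   # assumed total time: 2 seconds, in ns
clients = 4                     # assumed number of concurrent clients

# tokens/second, aggregated across all concurrent clients
token_throughput = total_tokens * 1_000_000_000 / total_time_ns * clients
print(token_throughput)  # 1000.0
```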

@frankfliu
Contributor

> Hello Frank,
>
> Thank you for the valuable suggestion. The dictionary storing the generated text (within the output array) looks like this:
>
> {'name': 'generated_text', 'datatype': 'BYTES', 'shape': [1], 'data': ['{"response": "moon"}']},
>
> Should a JSON query like "$.outputs[?(@.name=='generated_text')].data[0]": "$.response" work?
>
> Notice that data is a JSON string and I was wondering if the execution code can process it accordingly by extracting the data contained in the response field (so the tokens associated with the key are not counted)
>
> All the best

I don't think we can parse the nested JSON string.

@CoolFish88
Author

Thanks for the reply.

I also opened another awscurl bug ticket related to the tokenizer behavior. I would be very grateful if you could take a look.

All the best!

@CoolFish88
Author

Hello Frank,

Token metrics are no longer computed when specifying a JSON query. Has something changed in the latest release?
