[Feature Request] Support for llm-emit-token-metric in AIProjectClient ChatCompletionsClient for Azure OpenAI Token Tracking #39385
Comments
Thank you for your feedback. Tagging and routing to the team member best able to assist.
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @achauhan-scc @kingernupur @luigiw @needuv @paulshealy1 @singankit.
Hi @achauhan-scc @kingernupur @luigiw @needuv @paulshealy1 @singankit, I wanted to check in on this request. It has been two weeks since it was opened, and I haven't seen any updates. Could you provide any timeline or status on this? Thanks for your time!
I second this; I have been looking for a way to track token usage with APIM and the new chat completion clients.
Tagging @dargilco
Is the ask specific to APIM in some way? Azure AI Services report token usage metrics on the server side by default; see https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/microsoft-cognitiveservices-accounts-metrics. Is there a case where something else is necessary?
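To make the server-side option above concrete, here is a minimal sketch of pulling those metrics with the azure-monitor-query package. The metric names used below are assumptions to be checked against the linked metric reference, and the resource ID is a placeholder:

```python
# Sketch: query server-side token metrics for a Cognitive Services / Azure
# OpenAI resource. Metric names are assumptions; verify against the linked
# supported-metrics reference for your resource type.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

client = MetricsQueryClient(DefaultAzureCredential())
response = client.query_resource(
    "<full-ARM-resource-id-of-your-account>",  # placeholder
    metric_names=["ProcessedPromptTokens", "GeneratedTokens"],  # assumed names
    timespan=timedelta(days=1),
    granularity=timedelta(hours=1),
    aggregations=["Total"],
)
for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(metric.name, point.timestamp, point.total)
```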
@O-EAI @o1100 In addition to @lmolkova's question above: if you got it "to work" with REST API calls (per the spec at https://learn.microsoft.com/en-us/azure/ai-foundry/model-inference/reference/reference-model-inference-api?tabs=rest) and are asking how to do the equivalent with the SDK, send me the details of your REST call and I can look at how to configure the Python ChatCompletionsClient to make a similar REST API call. It may be possible to do it right now, as you can always add additional HTTP request headers and JSON elements to the request payload.
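As a sketch of that last point, assuming the azure-ai-inference package: extra HTTP headers can be passed per call, and `model_extras` adds extra JSON elements to the request payload. The endpoint, key, header name, and extra field below are placeholders, not a defined contract:

```python
# Sketch: add an extra request header and an extra JSON payload element
# to a chat completions call with azure-ai-inference.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-endpoint>",           # placeholder
    credential=AzureKeyCredential("<your-api-key>"),  # placeholder
)
response = client.complete(
    messages=[UserMessage(content="Hi")],
    headers={"billingID": "12345"},        # additional HTTP request header
    model_extras={"custom_field": "abc"},  # additional JSON element in the payload
)
```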
For my reference, adding this public doc: "Overview of generative AI gateway capabilities in Azure API Management" https://learn.microsoft.com/en-us/azure/api-management/genai-gateway-capabilities
Note: @lmolkova Azure AI Services can emit token metrics per deployed model/resource, but where I have a shared endpoint for multiple customers/departments, I can't distinguish how many tokens each department uses; I can only see aggregate token metrics.

@dargilco I am able to make a REST API call to the APIM endpoint directly and communicate with the AI Services endpoint (extracting token metrics and logging results using APIM policies); however, the ChatCompletionsClient only lets you communicate with predetermined Azure AI connections. What I am trying to emulate with ChatCompletionsClient:

```python
import requests  # apim_endpoint and api_key are defined elsewhere

headers = {
    "api-key": api_key,
    "Content-Type": "application/json",
    "billingID": "12345"
}
payload = {
    "messages": [
        {"role": "user", "content": "Hi"}
    ],
    "max_tokens": 50
}
response = requests.post(apim_endpoint, headers=headers, json=payload)
print(response)
```

Returns 200 OK, and I can see the token metrics (screenshot omitted).
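For reference, a minimal sketch of reading the token counts back out of the REST response above; the `usage` block is part of the standard chat completions response schema:

```python
# Sketch: continues from the requests.post() snippet above.
data = response.json()
usage = data["usage"]  # standard chat completions usage block
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])
```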
@o1100 You can create an instance of the ChatCompletionsClient from azure-ai-inference, provide the inference URL endpoint, provide the api-key, and add any additional headers you want using the code below:

```python
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="your-inference-endpoint",
    credential=AzureKeyCredential("your-api-key"),
    headers={"billingID": "12345"}  # A dict of additional HTTP request headers
)
```

You can turn on client console logging to see the actual request payload by doing the following:

```python
# Put this at the top of your Python source file:
import sys
import logging

logger = logging.getLogger("azure")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(stream=sys.stdout))

# And add this as an additional input argument to the ChatCompletionsClient constructor:
logging_enable=True
```

If you are using AIProjectClient, you can pass the headers in the call that creates the chat completions client, as the keyword arguments entered in the call to `get_chat_completions_client` are passed through to the client it constructs:

```python
project_client.inference.get_chat_completions_client(headers={"billingID": "12345"})
```

Note that you also have the option to explicitly specify the connection name if the default does not work for you:

```python
project_client.inference.get_chat_completions_client(connection_name="my-connection-name", headers={"billingID": "12345"})
```
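Putting the answer above together, a minimal end-to-end sketch, assuming the preview azure-ai-projects package from this era; the connection string, deployment name, and billingID value are placeholders:

```python
# Sketch: obtain a ChatCompletionsClient from AIProjectClient with an extra
# header, then make a call; the header rides along on every request, so an
# APIM policy in front of the endpoint can read it.
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

project_client = AIProjectClient.from_connection_string(
    conn_str="<your-project-connection-string>",  # placeholder
    credential=DefaultAzureCredential(),
)
chat_client = project_client.inference.get_chat_completions_client(
    headers={"billingID": "12345"},  # placeholder header for APIM to log
)
response = chat_client.complete(
    model="<your-model-deployment-name>",  # placeholder
    messages=[{"role": "user", "content": "Hi"}],
)
print(response.usage)  # token counts reported back by the service
```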
Feature Request: Enable `llm-emit-token-metric` or `azure-openai-emit-token-metric` in AIProjectClient's ChatCompletionsClient

**Is your feature request related to a problem? Please describe.**
I am looking to use the `llm-emit-token-metric` or `azure-openai-emit-token-metric` headers within Azure API Management (APIM) to capture token usage. My setup uses Azure Foundry alongside the new `azure.ai.projects` AIProjectClient and its corresponding ChatCompletionsClient.

While I understand how to enable token metric headers when performing regular POST requests to Azure OpenAI endpoints, I have been unable to find functionality to do this within the ChatCompletionsClient. The goal is to configure an APIM instance for the endpoint `https://<deployment>.openai.azure.com/` and log token metrics to Azure Log Analytics.

**Describe the solution you'd like**
I would like the ChatCompletionsClient (from `azure.ai.projects`) to include functionality for enabling and utilizing the `llm-emit-token-metric` or `azure-openai-emit-token-metric` headers when interacting with Azure OpenAI services. This would allow token usage tracking to seamlessly integrate with APIM and Log Analytics while leveraging the new client library.
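To make the ask concrete, a purely hypothetical sketch of the requested ergonomics; the `emit_token_metric` parameter below does not exist in `azure.ai.projects` and only illustrates the kind of switch this request describes:

```python
# Hypothetical only: emit_token_metric is NOT a real parameter; it sketches
# the feature being requested.
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

project_client = AIProjectClient.from_connection_string(
    conn_str="<your-project-connection-string>",  # placeholder
    credential=DefaultAzureCredential(),
)
chat_client = project_client.inference.get_chat_completions_client(
    emit_token_metric=True,  # hypothetical flag to surface APIM token metrics
)
```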
**Describe alternatives you've considered**
- Manually adding the `llm-emit-token-metric` or `azure-openai-emit-token-metric` headers (for example, on direct POST requests).
- Making the calls outside of `azure.ai.projects`.
**Additional context**
The request concerns the AIProjectClient or ChatCompletionsClient from the `azure.ai.projects` library.