
[Feature Request] Support for llm-emit-token-metric in AIProjectClient ChatCompletionsClient for Azure OpenAI Token Tracking #39385

Open
o1100 opened this issue Jan 24, 2025 · 11 comments
Assignees: dargilco
Labels: AI Projects · Client · customer-reported · feature-request · needs-team-attention · Service Attention

Comments


o1100 commented Jan 24, 2025

Feature Request: Enable llm-emit-token-metric or azure-openai-emit-token-metric in AIProjectClient's ChatCompletionsClient


Is your feature request related to a problem? Please describe.

I am looking to use the llm-emit-token-metric or azure-openai-emit-token-metric policies within Azure API Management (APIM) to capture token usage. My setup uses Azure AI Foundry alongside the new azure.ai.projects AIProjectClient and its corresponding ChatCompletionsClient.

While I understand how to emit token metrics for regular POST requests to Azure OpenAI endpoints, I have been unable to find equivalent functionality within the ChatCompletionsClient.

The goal is to configure an APIM instance for the endpoint https://<deployment>.openai.azure.com/ and log token metrics to Azure Log Analytics.


Describe the solution you'd like

I would like the ChatCompletionsClient (from azure.ai.projects) to support the llm-emit-token-metric or azure-openai-emit-token-metric policies when interacting with Azure OpenAI services, for example by allowing requests to be routed through APIM with custom headers.

This would allow token usage tracking to seamlessly integrate with APIM and Log Analytics while leveraging the new client library.
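
For illustration, this is roughly the shape such an API could take (a hypothetical sketch only; the endpoint parameter shown here is illustrative of the request, not an existing or committed surface):

# Hypothetical sketch of the requested capability, not an existing API:
client = project_client.inference.get_chat_completions_client(
    endpoint="<apim-gateway-endpoint>",  # route requests through APIM
    headers={"billingID": "12345"}       # custom headers for APIM policies to read
)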


Describe alternatives you've considered

  • I have been able to use regular POST requests to Azure OpenAI endpoints and successfully capture token metrics with the llm-emit-token-metric or azure-openai-emit-token-metric policies.
  • However, I cannot find similar functionality within the ChatCompletionsClient of azure.ai.projects.

Additional context

  • I have reviewed the Journey of the Geek blog post for guidance on this matter. While that solution effectively captures token metrics, it does not use the new AIProjectClient or ChatCompletionsClient from the azure.ai.projects library.
o1100 changed the title from "Support for llm-emit-token-metric in AIProjectClient ChatCompletionsClient for Azure OpenAI Token Tracking" to "[Feature Request] Support for llm-emit-token-metric in AIProjectClient ChatCompletionsClient for Azure OpenAI Token Tracking" on Jan 24, 2025
github-actions bot added the Client, customer-reported, needs-team-attention, OpenAI, and question labels on Jan 24, 2025

Thank you for your feedback. Tagging and routing to the team member best able to assist.

kristapratico added the feature-request, Service Attention, and AI labels and removed the question and OpenAI labels on Jan 24, 2025
kristapratico removed their assignment on Jan 24, 2025

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @achauhan-scc @kingernupur @luigiw @needuv @paulshealy1 @singankit.


o1100 commented Feb 4, 2025

Hi @achauhan-scc @kingernupur @luigiw @needuv @paulshealy1 @singankit,

I wanted to check in on this request. It has been two weeks since it was opened, and I haven't seen any updates. Could you provide any timeline or status on this?

Thanks for your time!


O-EAI commented Feb 4, 2025

I second this; I have been looking for a way to track token usage with APIM and the new chat completions clients.

kingernupur (Member) commented:

Tagging @dargilco

dargilco self-assigned this on Feb 4, 2025
dargilco added the AI Projects label and removed the AI label on Feb 4, 2025

dargilco commented Feb 4, 2025

@O-EAI @o1100 the issue has just been assigned to the right team. I'll look at it today and respond here. If/when you open another GitHub issue, please mention the package name (azure-ai-projects in this case) for fast triage.


lmolkova commented Feb 5, 2025

Is the ask specific to APIM in some way? Azure AI Services report token usage metrics on the server side by default: https://learn.microsoft.com/en-us/azure/azure-monitor/reference/supported-metrics/microsoft-cognitiveservices-accounts-metrics. Is there a case where something else is necessary?
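
For context, those server-side metrics can also be read programmatically, e.g. with azure-monitor-query (a sketch; the resource ID and the metric name are placeholders, see the linked reference for the actual metric names):

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

# Query server-side token metrics for an AI Services / Cognitive Services resource
client = MetricsQueryClient(credential=DefaultAzureCredential())
response = client.query_resource(
    resource_uri="<your-ai-services-resource-id>",
    metric_names=["TokenTransaction"],  # metric name is an assumption; check the linked reference
)
for metric in response.metrics:
    print(metric.name, [ts.data for ts in metric.timeseries])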


dargilco commented Feb 5, 2025

@O-EAI @o1100 in addition to @lmolkova's question above: if you got it to work with REST API calls (per the spec at https://learn.microsoft.com/en-us/azure/ai-foundry/model-inference/reference/reference-model-inference-api?tabs=rest) and are asking how to do the equivalent with the SDK, send me the details of your REST call and I can look at how to configure the Python ChatCompletionsClient to make a similar request. It may already be possible, since you can always add extra HTTP request headers and JSON elements to the request payload.
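
To illustrate, a minimal sketch of how extra headers and extra JSON body elements can be attached with the azure-ai-inference ChatCompletionsClient (the endpoint, key, and the x-department-id header name are placeholders for illustration):

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="<your-inference-endpoint>",
    credential=AzureKeyCredential("<your-api-key>"),
)

response = client.complete(
    messages=[UserMessage(content="Hi")],
    max_tokens=50,
    headers={"x-department-id": "finance"},  # extra HTTP request header (hypothetical name)
    model_extras={"custom_field": "value"},  # extra elements merged into the JSON request payload
)
print(response.choices[0].message.content)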


dargilco commented Feb 5, 2025

For my reference, adding this public doc: "Overview of generative AI gateway capabilities in Azure API Management" https://learn.microsoft.com/en-us/azure/api-management/genai-gateway-capabilities


o1100 commented Feb 6, 2025

Note:
I think my original post could have been clearer. I want to use the ChatCompletionsClient but forward requests through an Azure API Management instance so that the azure-openai-emit-token-metric policy can capture token usage. I can easily do this with direct POST requests, but project_client.inference.get_chat_completions_client() only lets me use predefined connections.

@lmolkova Azure AI Services can emit token metrics per deployed model/resource, but in cases where I have a shared endpoint for multiple customers/departments, I can't distinguish how many tokens each department uses; I can only see aggregate token metrics.

@dargilco I am able to make a REST API call to the APIM endpoint directly and communicate with the AI Services endpoint (extracting token metrics and logging results using APIM policies); however, the ChatCompletionsClient only lets you communicate with predetermined Azure AI connections.

What I am trying to emulate with ChatCompletionsClient:

import requests

# Endpoint and key values were elided in the original post; placeholders shown here
apim_endpoint = "<your-apim-endpoint>"
api_key = "<your-api-key>"

headers = {
    "api-key": api_key,
    "Content-Type": "application/json",
    "billingID": "12345"
}

payload = {
    "messages": [
        {"role": "user", "content": "Hi"}
    ],
    "max_tokens": 50
}

response = requests.post(apim_endpoint, headers=headers, json=payload)
print(response)  # the response body's "usage" object carries the token counts

Returns 200 OK, and I can see token metrics:

[Image: screenshot of APIM logs showing the emitted token metrics]


dargilco commented Feb 6, 2025

@o1100 You can create an instance of the ChatCompletionsClient from azure-ai-inference, provide the inference endpoint URL and the api-key, and add any additional headers you want, as in the code below:

from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="your-inference-endpoint",
    credential=AzureKeyCredential("your-api-key"),
    headers={"billingID": "12345"}  # A dict of additional HTTP request headers
)
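
For completeness, a short usage sketch with that client; token consumption is also reported back on the response's usage object:

from azure.ai.inference.models import UserMessage

response = client.complete(
    messages=[UserMessage(content="Hi")],
    max_tokens=50,
)
print(response.choices[0].message.content)
print(response.usage)  # prompt_tokens, completion_tokens, total_tokens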

You can turn on client console logging to see the actual request payload by doing the following:

# Put this at the top of your Python source file:
import sys
import logging
logger = logging.getLogger("azure")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler(stream=sys.stdout))

# And add this as an additional input argument to the ChatCompletionsClient constructor:
logging_enable=True

If you are using AIProjectClient, you can use the .connections operations to get the properties of the relevant connection, including the endpoint to be used for inference, and create an instance of the ChatCompletionsClient yourself.
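
A sketch of that pattern, assuming the preview connections API in azure-ai-projects (get_default, endpoint_url, and key follow the 1.0.0b samples; verify the names against your installed version):

from azure.ai.inference import ChatCompletionsClient
from azure.ai.projects.models import ConnectionType
from azure.core.credentials import AzureKeyCredential

# Retrieve the default Azure AI Services connection, including its key
connection = project_client.connections.get_default(
    connection_type=ConnectionType.AZURE_AI_SERVICES,
    include_credentials=True,
)

# Build the inference client yourself; the endpoint could instead be an
# APIM gateway URL fronting this connection
client = ChatCompletionsClient(
    endpoint=connection.endpoint_url,
    credential=AzureKeyCredential(connection.key),
    headers={"billingID": "12345"},
)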

If you want to use the .inference.get_chat_completions_client() to get your authenticated ChatCompletionsClient for the default AIServices connection, you can call:

project_client.inference.get_chat_completions_client(headers={"billingID": "12345"})

The keyword arguments passed to get_chat_completions_client are forwarded directly to the ChatCompletionsClient constructor.

Note that you also have the option to explicitly specify the connection name if the default does not work for you:

project_client.inference.get_chat_completions_client(connection_name="my-connection-name", headers={"billingID": "12345"})
