
[Feature Request]: Add observability to model input and output for app testing / evaluation #2119

Open
dooriya opened this issue Oct 16, 2024 · 2 comments
Labels
enhancement (New feature or request) · JS & dotnet & Python (Change or fix must apply to all three programming languages) · P1

Comments

dooriya (Member) commented Oct 16, 2024

Scenario

Observability is important for an AI bot built with the teams-ai SDK, since the AI may behave non-deterministically. Today it is hard to evaluate such a bot because developers cannot access the input and output of the AI model. Ideally, the AI components in this library would emit this information as structured logs and traces following the OpenTelemetry specification.

For example, we have a very simple bot app that responds to user questions like an AI assistant:
https://github.com/OfficeDev/teams-toolkit/blob/dev/templates/python/custom-copilot-basic/src/bot.py.tpl#L48-L55

As a developer, I want to capture the model's input/output pairs, along with other metadata such as the prompt, token count, and latency, in a structured way. This data can then be used for evaluation to understand how well the app performs.
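
For instance, one captured record per model invocation might look like the sketch below. Every field name here is illustrative, not a proposed schema:

```python
# Illustrative shape of a single captured record; all field names are
# assumptions for the sake of the example, not an existing teams-ai schema.
record = {
    "timestamp": "2024-10-16T08:30:00Z",
    "model": "gpt-4",
    "input_messages": [{"role": "user", "content": "How do I reset my password?"}],
    "output": "You can reset your password by ...",
    "prompt_template": "chat",
    "input_tokens": 42,
    "output_tokens": 87,
    "latency_ms": 1200,
}
```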

Solution

Have the AI components in the SDK emit each model invocation's input/output pair, together with metadata such as the prompt, token counts, and latency, as structured logs and traces following the OpenTelemetry specification, so the data can be consumed by evaluation tooling.
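
To make this concrete, here is a minimal sketch of what such instrumentation could look like using the OpenTelemetry Python SDK. The `model` object, its `complete` method, and the response shape are hypothetical stand-ins rather than the teams-ai SDK's actual API; the `gen_ai.*` attribute keys follow the OpenTelemetry GenAI semantic conventions:

```python
from opentelemetry import trace

tracer = trace.get_tracer("teams_ai.models")

async def complete_with_tracing(model, prompt, messages):
    # One span per model invocation, carrying the input/output pair and
    # the metadata needed for offline evaluation. `model`, `prompt`, and
    # the response shape are hypothetical, not the SDK's real interfaces.
    with tracer.start_as_current_span("model.complete") as span:
        span.set_attribute("gen_ai.request.model", model.name)
        span.set_attribute("gen_ai.prompt", str(messages))
        response = await model.complete(prompt, messages)
        span.set_attribute("gen_ai.completion", response.content)
        span.set_attribute("gen_ai.usage.input_tokens", response.usage.prompt_tokens)
        span.set_attribute("gen_ai.usage.output_tokens", response.usage.completion_tokens)
        return response
```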

Additional Context

No response

@dooriya added the enhancement label Oct 16, 2024
@singhk97 added the JS & dotnet & Python and P1 labels Oct 29, 2024
singhk97 (Collaborator) commented

Thanks for the feature request; we'll add this to our backlog.

Benjiiim (Contributor) commented

I would like to support this request.
From my perspective, app testing / evaluation are not the only use cases that would benefit from a better observability story.
Gathering this data in production can also be valuable.
One of the projects I'm working on involves a multi-tenant custom engine agent deployed to the Microsoft 365 Store by a B2B SaaS vendor (ISV). They are interested in building a pricing model based on their customers' actual usage, tenant by tenant and potentially even user by user.
Having a structured and accurate way to measure token counts through out-of-the-box telemetry would be extremely helpful.

This may be related to #2062.
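
To illustrate the token-metering scenario above, here is a minimal sketch using an OpenTelemetry counter. The metric name, attribute keys, and the `usage` object are assumptions for the example, not an existing teams-ai API:

```python
from opentelemetry import metrics

meter = metrics.get_meter("teams_ai.usage")

# Counter for tokens consumed, dimensioned by tenant and user so an ISV
# can aggregate usage per customer. The metric and attribute names below
# are illustrative assumptions.
token_counter = meter.create_counter(
    "gen_ai.client.token.usage",
    unit="{token}",
    description="Tokens consumed per model call",
)

def record_usage(tenant_id: str, user_id: str, usage) -> None:
    # `usage` is assumed to expose prompt/completion token counts.
    attrs = {"tenant.id": tenant_id, "user.id": user_id}
    token_counter.add(usage.prompt_tokens, {**attrs, "token.type": "input"})
    token_counter.add(usage.completion_tokens, {**attrs, "token.type": "output"})
```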
