This sample introduces an AI-powered Computer Use Agent (CUA) that integrates with Microsoft Teams to automate tasks such as booking flights, searching for products, and filling out forms. Powered by OpenAI, the agent interacts with computer interfaces (clicking, typing, scrolling, and more) while providing real-time updates through Microsoft Teams. It can handle a variety of computer-based tasks, including web browsing and terminal operations, and can take user input when needed.
It leverages OpenAI's Computer Use capabilities.
Check out the LinkedIn post for a video of the sample in action.
- 🖥️ Computer Control: Uses AI to understand and execute computer tasks including terminal operations and web browsing
- 📸 Real-time Visual Feedback: Shows screenshots of agent actions via adaptive cards in Teams
- ✨ Responses API: Uses the OpenAI Responses API to track the state of the agent
- ⏸️ Pausable: Allows users to pause and resume agent operations at any time
- 🔒 Safety First: Uses adaptive cards for user approval of actions if the model wants it
- 🌐 Browser Mode: Can use Playwright browser for web interactions
- 🐳 Dockerized VNC: Includes a Dockerfile that sets up a sandboxed environment with VNC enabled
In this example, the agent uses a VNC connection to control a computer. The screenshots and progress are sent as adaptive cards and displayed in Teams.
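The pause and resume behaviour listed above can be pictured with a small sketch. This is not the sample's actual implementation: `PausableRunner` and its methods are hypothetical names, and each "step" is just a callable standing in for one agent action.

```python
import threading
from typing import Callable, Iterable, List


class PausableRunner:
    """Hypothetical helper: runs steps one at a time and waits while paused."""

    def __init__(self) -> None:
        self._resume = threading.Event()
        self._resume.set()  # start in the running state

    def pause(self) -> None:
        """Ask the loop to stop before the next step."""
        self._resume.clear()

    def resume(self) -> None:
        """Let the loop continue."""
        self._resume.set()

    @property
    def paused(self) -> bool:
        return not self._resume.is_set()

    def run(self, steps: Iterable[Callable[[], str]]) -> List[str]:
        results: List[str] = []
        for step in steps:
            self._resume.wait()  # blocks here for as long as the runner is paused
            results.append(step())
        return results
```

In a setup like this, `pause()` and `resume()` would be called from another thread (for example, a message handler) while `run()` is in progress; the step that is already executing finishes before the pause takes effect.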
- Teams Toolkit CLI
- uv
- OpenAI or Azure OpenAI keys
  - Make sure the model you use has computer-use capabilities
- Docker
- Run `uv sync` in this folder.
- Activate the virtual environment (run `source .venv/bin/activate` in the root folder, or `.venv\Scripts\activate` in the root folder if on Windows).
- Copy the samples.env file and rename it to .env
- Update the .env file with your own values:
  - Set either Azure OpenAI or OpenAI credentials
    - For Azure OpenAI, make sure your model has computer-use capabilities
- Set up Local computer or Local browser
- Use Teams Toolkit to run the app locally. See the Teams Toolkit documentation for how to run the sample.
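As a rough illustration of the "either Azure OpenAI or OpenAI" choice in the .env step, the sketch below picks a provider from whichever credentials are present. The variable names `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, and `OPENAI_API_KEY` are assumptions for illustration; check samples.env for the names the sample actually reads. The function `pick_provider` is likewise hypothetical.

```python
import os
from typing import Mapping


def pick_provider(env: Mapping[str, str] = os.environ) -> str:
    """Return "azure" or "openai" depending on which credentials are set.

    The variable names used here are assumed, not taken from the sample.
    """
    if env.get("AZURE_OPENAI_API_KEY") and env.get("AZURE_OPENAI_ENDPOINT"):
        return "azure"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    raise ValueError("Set either Azure OpenAI or OpenAI credentials in .env")
```

Failing early with a clear message is a design choice here: a missing key otherwise tends to surface later as an opaque authentication error.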
This is the default mode for the agent. It sets up a sandboxed dockerized container with VNC enabled. Then the local agent will connect to this container and control it.
There is a Dockerfile in the root of this repo that sets up a sandboxed environment with VNC enabled. You can use this to build a container and deploy it to Azure App Service or run it locally.
- Build the Docker image:
docker build -t cua_mode_image .
- Some commands for the container:
# Run the container
docker run -d --name cua_mode_container -p 5900:5900 -p 6080:6080 cua_mode_image
# Stop the container
docker stop cua_mode_container
# Restart the container
docker restart cua_mode_container
# Remove the container
docker rm cua_mode_container
# Remove the image
docker rmi cua_mode_image
- By default, we have noVNC enabled for the container. You can access the VNC server at http://localhost:6080/vnc.html. Use password "secret" to log in and view the desktop of your container!
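Before pointing the agent at the container, it can help to confirm that the two published ports are reachable. The following is a minimal sketch using only the standard library; `port_open` is a hypothetical helper, and the port numbers are the ones from the `docker run` command above (5900 and 6080).

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 5900 and 6080 are the ports published by the docker run command.
    for name, port in (("VNC", 5900), ("noVNC", 6080)):
        state = "reachable" if port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

A successful TCP connection only shows that something is listening; it does not verify the VNC password or that the desktop has finished starting.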
This mode uses Playwright to open a browser and interact with it. This is not the default mode and needs to be enabled by setting `USE_BROWSER=true` in the .env file.
- Run `playwright install` to install the browsers on your machine
- Set `USE_BROWSER=true` in the .env file
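To make the mode switch concrete, here is one way a `USE_BROWSER` flag could be interpreted. This is a sketch, not the sample's own parsing: `use_browser_mode` is a hypothetical function, and accepting `1` and `yes` in addition to `true` is an assumption.

```python
from typing import Mapping


def use_browser_mode(env: Mapping[str, str]) -> bool:
    """Return True when USE_BROWSER is set to a truthy value.

    Unset or any other value falls back to the default (VNC container) mode.
    """
    value = env.get("USE_BROWSER", "false")
    return value.strip().lower() in {"1", "true", "yes"}
```

Passing the environment in as a mapping keeps the function easy to check without touching the real process environment, for example `use_browser_mode({"USE_BROWSER": "true"})`.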
- Open a 1:1 chat with the agent or include it in a group chat.
- Send the agent a query, e.g. "What is the weather in Tokyo?" or "Create a new directory called 'test'"
- The agent will perform the requested actions, showing screenshots and asking for approval when needed
- You can pause the agent at any time during its operation
- The agent will display results in adaptive cards
- Make sure devtunnel is installed.
- Run `devtunnel create <tunnel-name>` to create a new tunnel.
- Run `devtunnel port create <tunnel-name> -p <port-number>` to create a new port for the tunnel.
- Run `devtunnel access create <tunnel-name> -p <port-number> --anonymous` to set up anonymous access to the tunnel.
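If you set up tunnels often, the three commands above can be generated from a tunnel name and port. The sketch below only builds and prints the commands; it does not run them. `devtunnel_commands` is a hypothetical helper, and `my-tunnel` and `3978` are placeholder values, not values taken from the sample.

```python
import shlex
from typing import List


def devtunnel_commands(tunnel_name: str, port: int) -> List[List[str]]:
    """Build the three devtunnel commands from the steps above as argv lists."""
    port_arg = str(port)
    return [
        ["devtunnel", "create", tunnel_name],
        ["devtunnel", "port", "create", tunnel_name, "-p", port_arg],
        ["devtunnel", "access", "create", tunnel_name, "-p", port_arg, "--anonymous"],
    ]


if __name__ == "__main__":
    # Placeholder tunnel name and port; substitute your own values.
    for argv in devtunnel_commands("my-tunnel", 3978):
        print(shlex.join(argv))
```

Each argv list could be passed to `subprocess.run(argv, check=True)` if you prefer to execute the commands from Python instead of copying them into a shell.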