diff --git a/openhands/agenthub/codeact_agent/README.md b/openhands/agenthub/codeact_agent/README.md
index 357a723b650c..45ccb42ba085 100644
--- a/openhands/agenthub/codeact_agent/README.md
+++ b/openhands/agenthub/codeact_agent/README.md
@@ -10,3 +10,57 @@ The conceptual idea is illustrated below. At each turn, the agent can:
 - Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through `bash` command, see plugin system below for more details.
 
 ![image](https://github.com/All-Hands-AI/OpenHands/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3)
+
+## Adding New Tools
+
+The CodeAct agent uses a function calling interface to define tools that the agent can use. Tools are defined in `function_calling.py` using the `ChatCompletionToolParam` class from `litellm`. Each tool consists of:
+
+1. A description string that explains what the tool does and how to use it
+2. A tool definition using `ChatCompletionToolParam` that specifies:
+   - The tool's name
+   - The tool's parameters and their types
+   - Required vs optional parameters
+
+Here's an example of how a tool is defined:
+
+```python
+MyTool = ChatCompletionToolParam(
+    type='function',
+    function=ChatCompletionToolParamFunctionChunk(
+        name='my_tool',
+        description='Description of what the tool does and how to use it',
+        parameters={
+            'type': 'object',
+            'properties': {
+                'param1': {
+                    'type': 'string',
+                    'description': 'Description of parameter 1',
+                },
+                'param2': {
+                    'type': 'integer',
+                    'description': 'Description of parameter 2',
+                },
+            },
+            'required': ['param1'],  # List required parameters here
+        },
+    ),
+)
+```
+
+To add a new tool:
+
+1. Define your tool in `function_calling.py` following the pattern above
+2. Add your tool to the `get_tools()` function in `function_calling.py`
+3. Implement the corresponding action handler in the agent to process the tool's invocation
+
+The agent currently supports several built-in tools:
+- `execute_bash`: Execute bash commands
+- `execute_ipython_cell`: Run Python code in IPython
+- `browser`: Interact with a web browser
+- `str_replace_editor`: Edit files using string replacement
+- `edit_file`: Edit files using LLM-based editing
+
+Tools can be enabled/disabled through configuration parameters:
+- `codeact_enable_browsing`: Enable browser interaction
+- `codeact_enable_jupyter`: Enable IPython code execution
+- `codeact_enable_llm_editor`: Enable LLM-based file editing (if disabled, uses string replacement editor instead)
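
The three steps for adding a tool (define it, register it in `get_tools()`, handle its invocation) could look roughly like the sketch below. It is not the code in `function_calling.py`: plain dicts stand in for `ChatCompletionToolParam` so the example runs without `litellm`, and the helper `make_tool`, the flag `codeact_enable_my_tool`, the `get_tools` signature, and the handler `handle_tool_call` are all hypothetical names chosen for illustration.

```python
import json


def make_tool(name, description, properties, required):
    """Build a tool definition as a plain dict (hypothetical helper).

    The dict mirrors the shape used in the README example:
    type, function.name, function.description, function.parameters.
    """
    return {
        'type': 'function',
        'function': {
            'name': name,
            'description': description,
            'parameters': {
                'type': 'object',
                'properties': properties,
                'required': required,
            },
        },
    }


# Step 1: define the tools.
BashTool = make_tool(
    'execute_bash',
    'Execute a bash command.',
    {'command': {'type': 'string', 'description': 'The command to run'}},
    ['command'],
)
MyTool = make_tool(
    'my_tool',
    'Description of what the tool does and how to use it',
    {
        'param1': {'type': 'string', 'description': 'Description of parameter 1'},
        'param2': {'type': 'integer', 'description': 'Description of parameter 2'},
    },
    ['param1'],
)


# Step 2: register the tool, gated by a configuration flag.
# The flag name and this signature are assumptions, not the real API.
def get_tools(codeact_enable_my_tool=False):
    tools = [BashTool]
    if codeact_enable_my_tool:
        tools.append(MyTool)
    return tools


# Step 3: handle an invocation. A real handler would turn the parsed
# arguments into an agent action; this sketch only validates and parses.
def handle_tool_call(name, arguments_json, tools):
    by_name = {tool['function']['name']: tool for tool in tools}
    if name not in by_name:
        raise ValueError(f'Unknown or disabled tool: {name}')
    args = json.loads(arguments_json)
    required = by_name[name]['function']['parameters']['required']
    missing = [param for param in required if param not in args]
    if missing:
        raise ValueError(f'Missing required parameters for {name}: {missing}')
    return args
```

Gating registration on a flag means a disabled tool is never advertised to the model, and the handler rejects any call that names it anyway. Checking the `required` list before acting gives an early, readable error when the model omits an argument.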