Allow us to use MCP servers to extend OpenHand's functionality #5781

orangejon · 2024-12-24T12:01:21Z

What problem or use case are you trying to solve?

OpenHands' functionality is currently fairly limited, but Anthropic's MCP standard provides a way for LLMs to interact with many additional services and use them as "tools". This could allow for much more complex workflows, e.g. to use Puppeteer or Playwright to test the code in the browser, then if it fails use OpenAI o1 (via MCP) to debug/rewrite it, etc.

Describe the UX of the solution you'd like

I guess the ideal would be to be able to install MCP servers in one click or a prompt. The implementation in Cline is neat:

... but the main thing is to be able to access them. Perhaps a list of the installed servers could be good to verify they have been recognised, like in Claude Desktop:

Do you have thoughts on the technical implementation?

I don't know OpenHands' architecture, but please be sure to add clear documentation with step-by-step instructions so I know any setup that's required to use this functionality.

Describe alternatives you've considered

Using Claude Desktop instead of OpenHands, because I can probably replicate a lot of the same functionality by just combining MCP servers. But the UI probably wouldn't be as good and I'm not sure if it would work as effectively.

Additional context

orangejon · 2024-12-24T12:04:45Z

If it helps anyone, I could offer a small "bounty" payment for implementing this?

UltraInstinct0x · 2024-12-24T17:36:02Z

I must first deal with my GUI & CLI issue; however, the next thing I'm planning is this one, if no one else is interested.

motin · 2024-12-25T11:38:08Z

Agree that MCP servers in OpenHands seem like necessary table stakes in the near future :)
@orangejon Since you mention Cline, have you considered using it as an alternative to OpenHands, and if so, where, if anywhere, does it fall short?

UltraInstinct0x · 2024-12-26T14:43:58Z

I figured out the GUI & CLI thing, and I am working on this right now. @orangejon do you think users should add and remove tools themselves, or should OpenHands figure out which tools it might use and install them?
I think the latter option is better; however, we might need to add steps to approve or reject tool installation and usage, just like the Claude app.
Any thoughts? Also, @motin asked a great question; can you elaborate on that, please?

orangejon · 2024-12-26T16:08:43Z

I think it could be either, or even both. The way Cline does it, describing a tool by its capability, seems ideal, because then I don't have to search online for a suitable tool first. I agree that in this case a confirmation step is probably worthwhile, especially if there are multiple tools that match. I suppose if I have a particular tool in mind then it would be good to be able to just give the name or URL, though I guess that could also be via a prompt. Removing a server might be easier with a button on a list of installed servers, though? But any way is fine for me really; so long as there's a reasonably easy and clearly documented way to use MCP servers, I'll figure it out :)

I've not used Cline much yet, but I've got it installed and will be experimenting with it over the next few days. I'll report back!

orangejon · 2024-12-29T12:36:17Z

@motin I've had a chance to use Cline for a few days now, so I can report back my initial experience. So far I have used it to create a simple Ruby on Rails web app. Because macOS really creates headaches with Ruby versions (which Cline tried to find solutions to for over an hour, but only succeeded temporarily), I decided to use a GitHub codespace (basically a VPS running Ubuntu) and connect VS Code to that, which works really well: effortless setup, fast, reliable, and it automatically configures port forwarding so you can see your web app as if it was running on your local machine. This had the nice side effects that it runs faster, as the VPS has more resources than my laptop, and that I don't have to worry about the terminal commands that Cline runs, as the worst-case scenario would be wasting a few minutes rebuilding the virtual machine if it really screwed it up (which, so far, it didn't).

The code generated is pretty decent when using Claude Sonnet 3.5 (via OpenRouter) but my attempt to use Gemini was pretty unsuccessful, hitting various errors regardless of which model I selected. Claude can use MCP tools but it doesn't seem to do so unless you directly tell it to in the user prompt; e.g. I added to the "system" prompt that when encountering an error it should use the search1api MCP client to read the documentation, but it never did. At least it (usually) listened to my instruction to run all unit tests and a Playwright browser-based integration test, so it does usually catch its own errors and fix them before asking for user input. I just swear it would be faster and burn fewer tokens if it Googled for a solution or documentation rather than just randomly changing the code, sometimes even calling functions that don't exist.

Also, although the documentation sounds like you can just prompt Cline with "Add a tool that..." and it will install the correct tool, that's not what it does. Typing that prompt seems to create a new MCP client from scratch which, seeing as it doesn't read the API documentation, is very unlikely to actually work! Instead you have to search for the "configure MCP servers" dialog, which then makes you manually edit the JSON configuration file to insert the MCP client definition. Then it works fine (and displays a nice "status" thing on the dialog with the various functions you can call, like in Claude Desktop) but it's a bit of a faff. I'd rather just paste in the URL of an MCP client definition and it adds it for me.
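
For reference, the manual step described above amounts to merging one entry into a JSON file. Here is a minimal sketch in Python, assuming the `mcpServers` layout used by Claude Desktop's configuration file (Cline's file appears to follow the same shape). The `add_server` helper, the server name, the package name, and the environment variable are all hypothetical placeholders.

```python
import json


def add_server(config, name, command, args=None, env=None):
    """Return a copy of `config` with one MCP server entry added.

    Assumes the {"mcpServers": {name: {...}}} layout; raises if the
    name is already taken so an existing entry is never overwritten.
    """
    servers = dict(config.get("mcpServers", {}))
    if name in servers:
        raise ValueError(f"server {name!r} is already configured")
    entry = {"command": command, "args": list(args or [])}
    if env:
        entry["env"] = dict(env)
    servers[name] = entry
    return {**config, "mcpServers": servers}


# Hypothetical server: the package name and API-key variable are placeholders.
config = add_server(
    {},
    "example-search",
    "npx",
    args=["-y", "example-search-mcp"],
    env={"EXAMPLE_API_KEY": "<your key>"},
)
print(json.dumps(config, indent=2))
```

A "paste a URL and it adds it for me" feature would presumably fetch a definition like this and perform the same merge on the user's behalf.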

Still, I have to say, my initial impression of Cline is generally pretty positive. The areas where it falls short currently are:

- speed, because it often takes multiple attempts to fix a bug and it seems to be entirely single-threaded; I realise this helps avoid code conflicts but surely there could be background agents (e.g. crawling the latest documentation from the web) and multiple Cline instances that I could task to work on different areas of the codebase, just as I do with human developers. (Though maybe I can just open duplicate VS Code workspaces? I'll try that...)
- it asks too frequently for human confirmation before running commands; it has never proposed a command I rejected, so now I just instinctively hit "accept" without reading it... and as this is a VPS anyway, I don't really care
- the UIs it creates are ugly. I tried to get it to use the Material Design library and search online for templates but without much success so far. I'll keep trying, but I suspect that this will be one area where it's currently easier for a human to just edit the output until it looks okay, and/or set up a CSS template that Cline can use in future. Or maybe I can find another AI tool that's better at this.
- the image upload feature seems to be broken. I'll submit a bug report. I guess if this worked I might be able to upload a screenshot of a site and ask it to copy the styling.

If you've got any questions, let me know. I'm still a fan of the OpenHands approach, and FOSS in general, so I'm happy to help if I can. It's just that Cline is working well for me so I will stick to using it for the time being.

AlexanderDorofeev · 2025-01-09T21:45:48Z

It would be great to have MCP/RAG built-in options for popular knowledge bases like wikis, Confluence, PDFs, and web links.

Key benefits of this feature:

1. Enhanced knowledge retrieval capabilities
2. Improved integration with common information sources
3. Increased efficiency in accessing relevant data

Potential implementation ideas:

1. Develop connectors for popular wiki platforms and Confluence
2. Implement PDF parsing and indexing functionality
3. Create a system for crawling and updating web link content

This enhancement would significantly expand OpenHands' ability to leverage existing knowledge repositories, making it more versatile and powerful for users working with various information sources.

Would love to hear your thoughts on this suggestion!

orangejon · 2025-01-10T11:05:59Z

I think the point of MCP is not that these types of functionality are "built in"; the idea is that you can add MCP clients for whatever you need, then OpenHands will access whatever it needs. (At least in theory; I've been using Cline + Claude Sonnet 3.5 recently, which supports MCP, and it rarely ever uses any MCP clients, no matter how much I prompt it to!)

RPirruccio · 2025-01-12T18:13:40Z

Agreed, integrating MCP is essential. However, how can we design MCP usage to be model-agnostic? It doesn’t seem like a good approach to develop a feature that only works with Claude 3.5/Sonnet.

orangejon · 2025-01-13T07:51:12Z

MCP is (at least theoretically) an open standard that other LLMs can implement. As far as I know, only Anthropic's models have implemented it so far, though.

Sucran · 2025-01-14T09:50:11Z

I think one way would be to implement middleware like MCP-Bridge (https://github.com/SecretiveShell/MCP-Bridge), whose main idea is to provide an OpenAI-compatible endpoint that can call MCP tools, @orangejon. However, whether it is appropriate still requires in-depth evaluation.

UltraInstinct0x · 2025-01-16T07:53:35Z

I've been looking into this and I think we can implement this pretty easily. We already use LiteLLM, which lets us do tool/function calling with any model, just like how LibreChat handles tools globally (they do it with LangChain) regardless of which provider or model you're using.
@RPirruccio - good point about model-agnostic design. That's exactly why we don't need MCP-Bridge here - LiteLLM already handles the compatibility layer for us.
Before I start implementing this, I'd like to hear from everyone:

- How do you prefer to add tools - should users manually configure them, or should OpenHands try to discover and suggest relevant tools?
- Should we add an approval step for tool installation like Claude Desktop does?

I'm leaning towards automated discovery with approval prompt since it would make things easier for users while keeping them in control. But let me know what you think would work best for your use cases.
We may also need to let the headless version configure its own tools, but IMO the tools available to it should be limited to prevent any "Skynet becomes self-aware" moment.
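
To illustrate the model-agnostic point: an MCP server describes each tool with a `name`, a `description`, and a JSON Schema under `inputSchema`, while OpenAI-style function calling (the format LiteLLM accepts through its `tools` parameter) wraps the same schema under `function.parameters`. Below is a minimal sketch of that mapping, not OpenHands' actual implementation. The `mcp_tool_to_openai` helper and the sample tool are hypothetical.

```python
def mcp_tool_to_openai(tool):
    """Convert one MCP tool description into an OpenAI-style tool entry."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP's inputSchema is already JSON Schema, so it maps directly.
            "parameters": tool.get(
                "inputSchema", {"type": "object", "properties": {}}
            ),
        },
    }


# Hypothetical tool, shaped like an entry from an MCP `tools/list` response.
search_tool = {
    "name": "web_search",
    "description": "Search the web and return the top results.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

openai_tools = [mcp_tool_to_openai(search_tool)]
print(openai_tools[0]["function"]["name"])
```

The resulting list could then be passed as `tools=openai_tools` to `litellm.completion(...)`, leaving provider differences to LiteLLM.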

orangejon · 2025-01-16T08:30:30Z

Automated discovery sounds great if it works well, because then if I'm coding something and realise I need a tool (or the LLM realises?) then I don't have to go off to search the web for a solution. However, if there are multiple MCP tools then it might be preferable to select one manually, not necessarily for fear of Skynet situations but more because some of the MCP tools are pretty flaky! Also there's the case that's been more common for me so far: I'm browsing the web looking for tools that can improve my workflow, and want to add one that I've found. So it's not necessarily something that is essential in that moment or that OpenHands (or Cline) can't operate without, but it's something that seems generally useful (e.g. web search and scraping). Also, most tools need me to create an account, add payment details and get an API key, so unless OpenHands will do that automatically, there's not a significant benefit in the discovery step happening automatically. In short, being able to install (and uninstall) tools manually would certainly be useful, and presumably it's easier to implement, so it might make sense to add that first.
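
The "manual install and uninstall first, with approval" idea could be sketched as a small registry. This is only an illustration under stated assumptions: the `ToolRegistry` class, its method names, and the example URLs are hypothetical, and the approval callback stands in for whatever prompt the UI would show.

```python
from typing import Callable, Dict, List


class ToolRegistry:
    """Tracks installed MCP servers; every install goes through `approve`."""

    def __init__(self, approve: Callable[[str], bool]):
        self._approve = approve
        self._servers: Dict[str, str] = {}  # name -> source (URL or command)

    def install(self, name: str, source: str) -> bool:
        """Install a server if the user approves; return whether it was added."""
        if name in self._servers or not self._approve(name):
            return False
        self._servers[name] = source
        return True

    def uninstall(self, name: str) -> bool:
        """Remove a server; return False if it was not installed."""
        return self._servers.pop(name, None) is not None

    def installed(self) -> List[str]:
        """Names of installed servers, for display in a list view."""
        return sorted(self._servers)


# Stand-in for a UI prompt: approve only names on an allow-list.
registry = ToolRegistry(approve=lambda name: name in {"web-search"})
registry.install("web-search", "https://example.com/web-search-mcp")
registry.install("flaky-tool", "https://example.com/flaky-mcp")
print(registry.installed())
```

Automated discovery could later sit on top of the same `install` call, so that suggested tools pass through the same approval step as manually added ones.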

UltraInstinct0x · 2025-01-16T21:02:19Z

Ok I'm working on it.

anzax · 2025-02-04T19:50:06Z

Interesting conversation here! I just want to add a few ideas based on my experience crafting coding agents using MCP with Claude Desktop.

There’s a belief that adding more tools makes LLMs smarter, but more often, it just creates confusion and fills the context with noise. In my opinion, additional tools should be part of a message and include extra in-context learning materials. Since I can’t do this with Claude Desktop, I started looking for alternatives.

A few ideas worth exploring:

- Augmenting microagents with tools (MCP), so a tool is only exposed when a relevant microagent is triggered.
- Using a separate reasoning thread where the LLM first determines if any tools can assist with the user’s request, and if so, dynamically adds them to the chat (like RAG for tools).
- Leveraging CLI-based tools instead of complex integrations. Claude is very effective at exploring codebases with rg (ripgrep) and doing API testing with httpie. I initially used [argc](https://github.com/sigoden/argc) to wrap APIs into CLIs and expose them when needed as MCP servers with [llm-functions](https://github.com/sigoden/llm-functions/tree/main/mcp/server). However, I now realize that All-Hands already provides built-in shell access and the ability to create bespoke sandboxes with custom CLIs. Microagents seem like a perfect fit for adding extra in-context learning. After writing this, I’m keen to try it myself; everything needed is already there!
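
The first two ideas share one mechanism: pick a small subset of tools per message instead of exposing everything. Here is a minimal sketch using plain keyword triggers; a real system would more likely use embeddings or a separate LLM call for the selection step. The `select_tools` helper, the tool names, and the trigger keywords are hypothetical.

```python
def select_tools(message, tools):
    """Return names of tools whose trigger keywords occur in the message.

    Matching is a naive case-insensitive substring test, so short
    keywords can match inside longer words.
    """
    text = message.lower()
    return [
        tool["name"]
        for tool in tools
        if any(keyword in text for keyword in tool["triggers"])
    ]


# Hypothetical catalogue: tool names and trigger keywords are placeholders.
catalogue = [
    {"name": "browser_test", "triggers": ["playwright", "browser"]},
    {"name": "web_search", "triggers": ["search", "documentation"]},
    {"name": "http_client", "triggers": ["api", "endpoint"]},
]

print(select_tools("Check the documentation for this endpoint", catalogue))
```

Only the selected tools would then be added to the request, which keeps the context free of tool descriptions that are irrelevant to the current message.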

This also makes me think: the shell is an underutilized platform for LLM tools. It’s straightforward to provide RAG, web search, and many other functions via CLIs. If an LLM can call native tools, why wouldn't it use shell-based tools with the same efficiency?
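
As a rough illustration of the shell-as-tool-platform idea, a single generic wrapper can turn any CLI into something an agent can call and read back. This is a sketch under stated assumptions, not how OpenHands runs commands: the `run_cli_tool` helper and its return shape are hypothetical.

```python
import subprocess
import sys


def run_cli_tool(argv, timeout=30):
    """Run a command without a shell and return its exit code and output."""
    result = subprocess.run(
        argv,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {
        "exit_code": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }


# Demo with the Python interpreter so the example runs anywhere; in practice
# argv might be something like ["rg", "--line-number", "TODO", "src/"].
outcome = run_cli_tool([sys.executable, "-c", "print('hello from a CLI tool')"])
print(outcome["stdout"].strip())
```

Passing `argv` as a list avoids shell quoting issues, and the timeout keeps a hung command from stalling the agent.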

I’d love to hear critical opinions on this approach. What am I missing? Are there hidden downsides?

enyst · 2025-02-04T20:27:00Z

I think MCP is cool and we can benefit from adding it to OpenHands.

Just to note quickly, @anzax I do agree.

- Re: Augmenting microagents with tools (MCP) - we need MCP first IMO; once MCP is integrated, we can already use this, I think
- Re: Using a separate reasoning thread - underlying support for reasoning LLMs and workflows is coming (#6189)
- Re: Leveraging CLI-based tools - we had what we call [agent skills](https://github.com/All-Hands-AI/OpenHands/tree/main/openhands/runtime/plugins/agent_skills), implemented in Python and run via a Jupyter server in the runtime. We consider some of them deprecated right now. I don't know what new tools would look like, but if you want to try it, please feel free to!

orangejon · 2025-02-05T08:36:50Z

I switched to using Cline (with Claude Sonnet 3.5) mostly because it supports MCP tools, but I have been disappointed by how infrequently it uses them. I suspect it's because the LLM was mainly trained on content like StackOverflow, where people offer solutions to the problem the developer is currently facing. These solutions don't often say "Now Google for the latest API documentation and check your API calls are correct" because the solutions offered were correct at the time of writing. I guess having a separate reasoning thread might help because, if prompted appropriately, it could encourage the LLM to first plan out how to approach a problem methodically, as a good developer would, instead of just randomly making changes that are just as likely to cause new problems as to fix the bug!

orangejon added the enhancement New feature or request label Dec 24, 2024

mamoodi mentioned this issue Jan 9, 2025

Will this will support MCP and can be used as MCP client? #6174

Closed

mamoodi mentioned this issue Jan 11, 2025

An MCP Server would be cool #5760

Open
