Allow us to use MCP servers to extend OpenHands' functionality #5781

Open
orangejon opened this issue Dec 24, 2024 · 17 comments
Labels
enhancement New feature or request

Comments

@orangejon

orangejon commented Dec 24, 2024

What problem or use case are you trying to solve?

OpenHands' functionality is currently fairly limited, but Anthropic's MCP standard provides a way for LLMs to interact with many additional services and use them as "tools". This could allow for much more complex workflows, e.g. to use Puppeteer or Playwright to test the code in the browser, then if it fails use OpenAI o1 (via MCP) to debug/rewrite it, etc.

Describe the UX of the solution you'd like

I guess the ideal would be to be able to install MCP servers in one click or a prompt. The implementation in Cline is neat:

[Image: Screenshot 2024-12-24 at 17 56 56]

... but the main thing is to be able to access them. Perhaps a list of the installed servers could be good to verify they have been recognised, like in Claude Desktop:

[Image: Screenshot 2024-12-24 at 13 53 01]

Do you have thoughts on the technical implementation?

I don't know OpenHands' architecture, but please be sure to add clear documentation with step-by-step instructions so I know any setup that's required to use this functionality.

Describe alternatives you've considered

Using Claude Desktop instead of OpenHands, because I can probably replicate a lot of the same functionality by just combining MCP servers. But the UI probably wouldn't be as good and I'm not sure if it would work as effectively.

Additional context

@orangejon orangejon added the enhancement New feature or request label Dec 24, 2024
@orangejon
Author

If it helps anyone, I could offer a small "bounty" payment for implementing this?

@UltraInstinct0x

I must first deal with my GUI & CLI issue, but this is the next thing I'm planning to work on if no one else is interested.

@motin
Contributor

motin commented Dec 25, 2024

Agree that MCP servers in OpenHands seem like necessary table stakes in the near future :)
@orangejon Since you mention Cline, have you considered using it as an alternative to OpenHands, and if so, where, if anywhere, does it fall short?

@UltraInstinct0x

I figured out the GUI & CLI thing and am working on this right now. @orangejon do you think users should add and remove tools themselves, or should OpenHands figure out what kinds of tools it might use and install them?
I think the latter option is better, but we might need to add steps to approve or reject tool installation and usage, just like the Claude app does.
Any thoughts? Also, @motin asked a great question; can you elaborate on that, please?
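
The approve/reject step mentioned above could be sketched roughly as follows. This is only an illustration, not OpenHands code: `ApprovalGate`, `call_tool`, and the `ask_user` callback are hypothetical names, and the gate remembers each decision so the user is asked once per tool rather than on every call.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ApprovalGate:
    """Asks the user before a tool is used for the first time."""

    # Hypothetical callback: receives a prompt, returns True to approve.
    ask_user: Callable[[str], bool]
    approved: set[str] = field(default_factory=set)
    rejected: set[str] = field(default_factory=set)

    def is_allowed(self, tool_name: str) -> bool:
        # Reuse an earlier decision so the user is not asked repeatedly.
        if tool_name in self.approved:
            return True
        if tool_name in self.rejected:
            return False
        if self.ask_user(f"Allow the agent to use tool '{tool_name}'?"):
            self.approved.add(tool_name)
            return True
        self.rejected.add(tool_name)
        return False


def call_tool(
    gate: ApprovalGate, tool_name: str, handler: Callable[..., Any], **kwargs: Any
) -> dict[str, Any]:
    """Run the handler only if the user has approved the tool."""
    if not gate.is_allowed(tool_name):
        return {"ok": False, "error": f"tool '{tool_name}' was rejected by the user"}
    return {"ok": True, "result": handler(**kwargs)}
```

In an interactive session `ask_user` might wrap `input()`; in a headless run it could be a fixed allow-list check instead.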

@orangejon
Author

orangejon commented Dec 26, 2024

I think it could be either, or even both. The way Cline does it, describing a tool by its capability, seems ideal, because then I don't have to search online for a suitable tool first. I agree that in this case a confirmation step is probably worthwhile, especially if there are multiple tools that match. I suppose if I have a particular tool in mind then it would be good to be able to just give the name or URL, though I guess that could also be via a prompt. Removing a server might be easier with a button on a list of installed servers, though? But any way is fine for me really; so long as there's a reasonably easy and clearly documented way to use MCP servers, I'll figure it out :)

I've not used Cline much yet, but I've got it installed and will be experimenting with it over the next few days. I'll report back!

@orangejon
Author

@motin I've had a chance to use Cline for a few days now, so I can report back on my initial experience. So far I've used it to create a simple Ruby on Rails web app. Because macOS really creates headaches with Ruby versions (which Cline tried to solve for over an hour, but only succeeded temporarily), I decided to use a GitHub Codespace (basically a VPS running Ubuntu) and connect VS Code to that, which works really well: effortless setup, fast, reliable, and it automatically configures port forwarding so you can see your web app as if it were running on your local machine. This had the nice side effects that it runs faster, as the VPS has more resources than my laptop, and that I don't have to worry about the terminal commands that Cline runs, as the worst-case scenario would be wasting a few minutes rebuilding the virtual machine if it really screwed it up (which, so far, it hasn't).

The code generated is pretty decent when using Claude Sonnet 3.5 (via OpenRouter), but my attempt to use Gemini was pretty unsuccessful, hitting various errors regardless of which model I selected. Claude can use MCP tools, but it doesn't seem to do so unless you directly tell it to in the user prompt; e.g. I added to the "system" prompt that when encountering an error it should use the search1api MCP client to read the documentation, but it never did. At least it (usually) listened to my instruction to run all unit tests and a Playwright browser-based integration test, so it does usually catch its own errors and fix them before asking for user input. I just swear it would be faster and burn fewer tokens if it Googled for a solution or documentation rather than just randomly changing the code, sometimes even calling functions that don't exist.

Also, although the documentation sounds like you can just prompt Cline with "Add a tool that..." and it will install the correct tool, that's not what it does. Typing that prompt seems to create a new MCP client from scratch which, seeing as it doesn't read the API documentation, is very unlikely to actually work! Instead you have to search for the "configure MCP servers" dialog, which then makes you manually edit the JSON configuration file to insert the MCP client definition. Then it works fine (and displays a nice "status" thing on the dialog with the various functions you can call, like in Claude Desktop) but it's a bit of a faff. I'd rather just paste in the URL of an MCP client definition and it adds it for me.
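
The manual JSON edit described above could in principle be automated with a small helper like the one below. It is a sketch under assumptions, not Cline's or OpenHands' actual behaviour: it assumes a settings file with a top-level `mcpServers` object keyed by server name, as in Claude Desktop-style configs, and `add_mcp_server` is a hypothetical name.

```python
import json
from pathlib import Path


def add_mcp_server(config_path, name, command, args=None, env=None):
    """Add one server entry to an MCP settings file and return the new config."""
    path = Path(config_path)
    # Start from the existing file if there is one, otherwise from an empty config.
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})  # assumed layout
    if name in servers:
        raise ValueError(f"server '{name}' is already configured")
    entry = {"command": command, "args": list(args or [])}
    if env:
        entry["env"] = dict(env)
    servers[name] = entry
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `add_mcp_server("mcp_settings.json", "example", "npx", ["-y", "example-mcp-server"])` would register a server started via `npx`; both the file name and the package name here are placeholders.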

Still, I have to say, my initial impression of Cline is generally pretty positive. The areas where it falls short currently are:

  • speed, because it often takes multiple attempts to fix a bug and it seems to be entirely single-threaded; I realise this helps avoid code conflicts but surely there could be background agents (e.g. crawling the latest documentation from the web) and multiple Cline instances that I could task to work on different areas of the codebase, just as I do with human developers. (Though maybe I can just open duplicate VS Code workspaces? I'll try that...)
  • asks too frequently for human confirmation before running commands; it has never proposed a command I rejected, so now I just instinctively hit "accept" without reading it... and as this is a VPS anyway, I don't really care
  • the UIs it creates are ugly. I tried to get it to use the Material Design library and search online for templates but without much success so far. I'll keep trying, but I suspect that this will be one area where it's currently easier for a human to just edit the output until it looks okay, and/or set up a CSS template that Cline can use in future. Or maybe I can find another AI tool that's better at this.
  • the image upload feature seems to be broken. I'll submit a bug report. I guess if this worked I might be able to upload a screenshot of a site and ask it to copy the styling.

If you've got any questions, let me know. I'm still a fan of the OpenHands approach, and FOSS in general, so I'm happy to help if I can. It's just that Cline is working well for me, so I will stick to using it for the time being.

@AlexanderDorofeev

It would be great to have MCP/RAG built-in options for popular knowledge bases like wikis, Confluence, PDFs, and web links.

Key benefits of this feature:

  1. Enhanced knowledge retrieval capabilities
  2. Improved integration with common information sources
  3. Increased efficiency in accessing relevant data

Potential implementation ideas:

  1. Develop connectors for popular wiki platforms and Confluence
  2. Implement PDF parsing and indexing functionality
  3. Create a system for crawling and updating web link content

This enhancement would significantly expand OpenHands' ability to leverage existing knowledge repositories, making it more versatile and powerful for users working with various information sources.

Would love to hear your thoughts on this suggestion!

@orangejon
Author

orangejon commented Jan 10, 2025 via email

@RPirruccio

Agreed, integrating MCP is essential. However, how can we design MCP usage to be model-agnostic? It doesn’t seem like a good approach to develop a feature that only works with Claude 3.5 Sonnet.

@orangejon
Author

MCP is (at least theoretically) an open standard that other LLMs can implement. As far as I know, it's only Anthropic's models that have implemented it so far, though.

@Sucran

Sucran commented Jan 14, 2025

@orangejon I think one option is to implement a middleware like MCP-Bridge (https://github.com/SecretiveShell/MCP-Bridge), whose main idea is to provide an OpenAI-compatible endpoint that can call MCP tools. However, whether it is appropriate still requires in-depth evaluation.

@UltraInstinct0x

I've been looking into this and I think we can implement it pretty easily. We already use LiteLLM, which lets us do tool/function calling with any model, just like how LibreChat handles tools globally (they do it with LangChain) regardless of which provider or model you're using.
@RPirruccio - good point about model-agnostic design. That's exactly why we don't need MCP-Bridge here: LiteLLM already handles the compatibility layer for us.
Before I start implementing this, I'd like to hear from everyone:

  • How do you prefer to add tools: should users manually configure them, or should OpenHands try to discover and suggest relevant tools?
  • Should we add an approval step for tool installation like Claude Desktop does?

I'm leaning towards automated discovery with an approval prompt, since it would make things easier for users while keeping them in control. But let me know what you think would work best for your use cases.
We may also need to let the headless version configure its own tools, but IMO the tools available to it should be limited, to prevent any "Skynet becomes self-aware" moment.
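
To make the model-agnostic idea above more concrete, here is a rough sketch of translating an MCP tool description into the OpenAI-style function-calling format that LiteLLM accepts. It assumes MCP tools are described by `name`, `description`, and a JSON Schema under `inputSchema`; `mcp_tool_to_openai` and the `search_docs` tool are hypothetical names for illustration only.

```python
from typing import Any


def mcp_tool_to_openai(tool: dict[str, Any]) -> dict[str, Any]:
    """Convert one MCP tool description into an OpenAI-style tool entry."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP describes arguments with a JSON Schema under "inputSchema".
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }


# Hypothetical tool, shaped like an entry from an MCP server's tool listing.
search_tool = {
    "name": "search_docs",
    "description": "Search documentation for a query string.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

openai_tools = [mcp_tool_to_openai(search_tool)]
```

A list like `openai_tools` is the kind of value one would pass as the `tools` argument of a LiteLLM completion call; the call itself is left out because it needs a model and an API key.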

@orangejon
Author

orangejon commented Jan 16, 2025 via email

@UltraInstinct0x

Ok I'm working on it.

@anzax

anzax commented Feb 4, 2025

Interesting conversation here! I just want to add a few ideas based on my experience crafting coding agents using MCP with Claude Desktop.

There’s a belief that adding more tools makes LLMs smarter, but more often, it just creates confusion and fills the context with noise. In my opinion, additional tools should be part of a message and include extra in-context learning materials. Since I can’t do this with Claude Desktop, I started looking for alternatives.

A few ideas worth exploring:

  • Augmenting microagents with tools (MCP), so a tool is only exposed when a relevant microagent is triggered.
  • Using a separate reasoning thread where the LLM first determines if any tools can assist with the user’s request, and if so, dynamically adds them to the chat (like RAG for tools).
  • Leveraging CLI-based tools instead of complex integrations. Claude is very effective at exploring codebases with rg (ripgrep) and doing API testing with httpie. I initially used [argc](https://github.com/sigoden/argc) to wrap APIs into CLIs and expose them when needed as MCP servers with [llm-functions](https://github.com/sigoden/llm-functions/tree/main/mcp/server). However, I now realize that All-Hands already provides built-in shell access and the ability to create bespoke sandboxes with custom CLIs. Microagents seem like a perfect fit for adding extra in-context learning. After writing this, I’m keen to try it myself—everything needed is already there!
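
The first idea in the list above, exposing a tool only when a relevant microagent is triggered, might look something like this in its simplest form. It is a sketch that assumes keyword-based triggering with plain substring matching; `ScopedTool`, `select_tools`, and the registry entries are hypothetical and not part of OpenHands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedTool:
    name: str
    triggers: tuple[str, ...]  # keywords that make the tool relevant


def select_tools(message: str, registry: list[ScopedTool]) -> list[str]:
    """Return names of tools whose trigger keywords occur in the message."""
    text = message.lower()
    # Case-insensitive substring match; results keep the registry order.
    return [t.name for t in registry if any(k.lower() in text for k in t.triggers)]


# Hypothetical registry for illustration.
registry = [
    ScopedTool("browser_test", ("playwright", "browser")),
    ScopedTool("db_query", ("sql", "database")),
]
```

Only the selected tools would then be added to the request, which keeps unrelated tool descriptions out of the context. The "RAG for tools" variant would replace the substring check with a retrieval or reasoning step.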

This also makes me think: the shell is an underutilized platform for LLM tools. It’s straightforward to provide RAG, web search, and many other functions via CLIs. If an LLM can call native tools, why wouldn't it use shell-based tools with the same efficiency?
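
As a rough illustration of treating a CLI as a tool, the sketch below runs a command without a shell and returns its output in a structured form that an agent could read. `run_cli_tool` is a hypothetical helper, and the example invokes the current Python interpreter rather than `rg` or `httpie` so that it does not depend on those being installed.

```python
import subprocess
import sys


def run_cli_tool(argv: list[str], timeout: float = 30.0) -> dict:
    """Run a command without a shell and return its output as a tool result."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return {"exit_code": None, "stdout": "", "stderr": f"command not found: {argv[0]}"}
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "stdout": "", "stderr": f"timed out after {timeout}s"}
    return {"exit_code": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}


# Stand-in command; a real setup might pass ["rg", "pattern", "src/"] instead.
result = run_cli_tool([sys.executable, "-c", "print('hello from a CLI tool')"])
```

Passing the arguments as a list avoids shell interpolation of model-generated strings, and the timeout keeps a hung command from blocking the agent.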

I’d love to hear critical opinions on this approach. What am I missing? Are there hidden downsides?

@enyst
Collaborator

enyst commented Feb 4, 2025

I think MCP is cool and we can benefit from adding it to OpenHands.

Just to note quickly, @anzax I do agree.

  • Re: Augmenting microagents with tools (MCP) - we need MCP first IMO; once MCP is integrated, I think we can already use this
  • Re: Using a separate reasoning thread - underlying support for reasoning LLMs and workflows is coming
  • Re: Leveraging CLI-based tools - we had what we call agent skills, implemented in Python and run via a Jupyter server in the runtime. We consider some of them deprecated right now. I don't know what new tools would look like, but if you want to try it, please feel free to!

@orangejon
Author

orangejon commented Feb 5, 2025 via email
