feat: support image understanding in debugging assistant for local model #440

Open
wants to merge 4 commits into base: development
2 changes: 1 addition & 1 deletion docs/debugging_assistant.md
@@ -22,7 +22,7 @@ The ROS 2 Debugging Assistant is an interactive tool that helps developers inspe

```sh
source setup_shell.sh
streamlit run src/rai/rai/tools/debugging_assistant.py
streamlit run src/rai_core/rai/tools/debugging_assistant.py
```

## Usage Examples
51 changes: 50 additions & 1 deletion docs/vendors.md
@@ -2,14 +2,63 @@

## Ollama

For installation see: https://ollama.com/
For installation see: https://ollama.com/. Then start the
[ollama server](https://github.com/ollama/ollama?tab=readme-ov-file#start-ollama) with
the `ollama serve` command.

```python
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model='llava')
```
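To send an image to a vision model like `llava`, the message content is a list mixing text and base64-encoded image parts. The helper below is a minimal sketch (not part of rai) that builds such a message as a plain OpenAI-style dict; LangChain's `HumanMessage(content=...)` accepts the same content-list shape:

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes) -> dict:
    """Build a multimodal chat message with a text part and a base64 image part.

    The returned dict follows the OpenAI-style content-list format, which
    LangChain chat models also accept as HumanMessage content.
    """
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # Placeholder bytes here; in practice read a real PNG file.
                "image_url": {"url": f"data:image/png;base64,{b64}"},
            },
        ],
    }


msg = build_image_message("What is in this image?", b"fake-image-bytes")
```

The resulting content list can be wrapped in a `HumanMessage` and passed to `llm.invoke([...])`.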

### Configure ollama with an OpenAI compatible API

Ollama supports OpenAI compatible APIs (see [details](https://ollama.com/blog/openai-compatibility)).

> [!TIP]
> Such a setup might be more convenient if you frequently switch between OpenAI API and
> local models.

To configure ollama through OpenAI API in `rai`:

1. Add `base_url` to [config.toml](../config.toml)

```toml
[openai]
simple_model = "llama3.2"
complex_model = "llama3.2"
...
base_url = "http://localhost:11434/v1"
```

### Example of setting up vision models with tool calling

In this example `llama3.2-vision` will be used.

1. Create a custom ollama `Modelfile` and load the model

> [!NOTE]
> Such a setup is not officially supported by Ollama and is not guaranteed to
> work in all cases.

```shell
ollama pull llama3.2
echo FROM llama3.2-vision > Modelfile
echo 'TEMPLATE """'"$(ollama show --template llama3.2)"'"""' >> Modelfile
ollama create llama3.2-vision-tools
```

2. Configure the model through an OpenAI compatible API in [config.toml](../config.toml)

```toml
[openai]
simple_model = "llama3.2-vision-tools"
complex_model = "llama3.2-vision-tools"
...
base_url = "http://localhost:11434/v1"
```
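With this configuration, tool calls reach the model through Ollama's OpenAI-compatible `/v1/chat/completions` endpoint. The snippet below sketches what such a request body looks like; the `ros2_topic_list` tool is a made-up example, not an actual rai tool definition:

```python
import json

# Illustrative request body for an OpenAI-compatible chat completion
# with one tool definition attached.
request = {
    "model": "llama3.2-vision-tools",
    "messages": [{"role": "user", "content": "What topics are active?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "ros2_topic_list",
                "description": "List active ROS 2 topics",
                # No parameters for this hypothetical tool.
                "parameters": {"type": "object", "properties": {}},
            },
        }
    ],
}

# Serialized body as it would be POSTed to http://localhost:11434/v1/chat/completions
body = json.dumps(request)
```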

## OpenAI

```bash
17 changes: 16 additions & 1 deletion src/rai_core/rai/tools/debugging_assistant.py
@@ -12,11 +12,13 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import rclpy
import streamlit as st
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage

from rai.agents.conversational_agent import create_conversational_agent
from rai.agents.integrations.streamlit import get_streamlit_cb, streamlit_invoke
from rai.communication.ros2.connectors import ROS2ARIConnector
from rai.tools.ros.cli import (
ros2_action,
ros2_interface,
@@ -25,15 +27,28 @@
ros2_service,
ros2_topic,
)
from rai.tools.ros2.topics import GetROS2ImageTool
from rai.utils.model_initialization import get_llm_model


@st.cache_resource
def initialize_graph():
rclpy.init()
llm = get_llm_model(model_type="complex_model", streaming=True)

connector = ROS2ARIConnector()

agent = create_conversational_agent(
llm,
[ros2_topic, ros2_interface, ros2_node, ros2_service, ros2_action, ros2_param],
[
ros2_topic,
ros2_interface,
ros2_node,
ros2_service,
ros2_action,
ros2_param,
GetROS2ImageTool(connector=connector),
],
system_prompt="""You are a ROS 2 expert helping a user with their ROS 2 questions. You have access to various tools that allow you to query the ROS 2 system.
Be proactive and use the tools to answer questions. Retrieve as much information from the ROS 2 system as possible.
""",