Feat Support image URLs in tool outputs for Langchain::Assistant #894

Open
wants to merge 17 commits into
base: main


@Eth3rnit3 Eth3rnit3 commented Dec 3, 2024

Tool Response Standardization

Overview

This PR introduces a standardized way to handle tool responses in the Langchain library, making it easier to manage both content and image responses while maintaining backward compatibility.

Key Changes

1. New Classes and Modules

ToolResponse Class

  • Encapsulates both content and image responses
  • Provides a standardized interface for tool outputs
  • Includes helper methods for string conversion and comparison
class ToolResponse
  attr_reader :content, :image_url

  def initialize(content: nil, image_url: nil)
    raise ArgumentError, "Either content or image_url must be provided" if content.nil? && image_url.nil?
    @content = content
    @image_url = image_url
  end
end
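
A minimal usage sketch of the constructor guard, assuming the class exactly as excerpted above (it is repeated here only so the snippet runs on its own; the URL is a made-up placeholder):

```ruby
# Standalone copy of the class from the PR description.
class ToolResponse
  attr_reader :content, :image_url

  def initialize(content: nil, image_url: nil)
    raise ArgumentError, "Either content or image_url must be provided" if content.nil? && image_url.nil?
    @content = content
    @image_url = image_url
  end
end

# Either field alone is enough to build a response.
text_only = ToolResponse.new(content: "4.0")
image_only = ToolResponse.new(image_url: "https://example.com/chart.png") # hypothetical URL

# Omitting both arguments trips the guard in the constructor.
error_message =
  begin
    ToolResponse.new
    nil
  rescue ArgumentError => e
    e.message
  end
```

A tool therefore cannot return an empty response by accident; it has to supply text, an image URL, or both.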

ToolHelpers Module

  • Provides a convenient method for creating tool responses
  • Simplifies the implementation in individual tools
module ToolHelpers
  def tool_response(content: nil, image_url: nil)
    Langchain::ToolResponse.new(content: content, image_url: image_url)
  end
end

2. Tool Updates

All tools have been updated to:

  • Include the ToolHelpers module
  • Return ToolResponse objects instead of raw values
  • Handle both success and error cases consistently

Example from the Calculator tool:

def execute(input:)
  result = Eqn::Calculator.calc(input)
  tool_response(content: result)
rescue Eqn::ParseError
  tool_response(content: "\"#{input}\" is an invalid mathematical expression")
end

3. Assistant Integration

The Assistant class now handles both new and legacy response formats:

def run_tool(tool_call)
  # ... tool execution code ...
  
  output = tool_instance.send(method_name, **tool_arguments)

  if output.is_a?(ToolResponse)
    add_message(
      role: @llm_adapter.tool_role,
      content: output.content,
      image_url: output.image_url,
      tool_call_id: tool_call_id
    )
  else
    submit_tool_output(tool_call_id: tool_call_id, output: output)
  end
end
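
The branching above can be pictured as a normalization step. The sketch below is not the PR's code: `message_fields_for` is a hypothetical helper, and the `Struct` is only a stand-in for `ToolResponse` so the snippet runs without the gem. It assumes a legacy raw value ends up as the message content.

```ruby
# Stand-in for the PR's ToolResponse; only the two readers matter here.
ToolResponse = Struct.new(:content, :image_url, keyword_init: true)

# Hypothetical helper: maps any tool return value to the fields a
# tool message would be built from.
def message_fields_for(output)
  if output.is_a?(ToolResponse)
    { content: output.content, image_url: output.image_url }
  else
    # Legacy path (assumption): the raw return value becomes the content.
    { content: output, image_url: nil }
  end
end

new_style = message_fields_for(
  ToolResponse.new(content: "Chart ready", image_url: "https://example.com/chart.png")
)
legacy = message_fields_for("4.0")
```

Both paths produce the same shape, which is why existing tools that return raw values keep working.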

4. Test Coverage

  • Added comprehensive tests for ToolResponse
  • Updated all tool tests to verify ToolResponse usage
  • Added assistant tests for both response formats

Future Improvements

In a future PR, we plan to:

  1. Move response formatting logic to LLM adapters
  2. Allow each adapter to handle ToolResponse objects differently
  3. Make the code more modular and easier to extend
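
One way the adapter-level formatting could look is sketched below. Everything here is an assumption for illustration: the adapter class names, the `format_tool_response` method, and the message hash shapes are invented and are not part of this PR or of any provider's documented API.

```ruby
# Stand-in for the PR's ToolResponse so the sketch runs without the gem.
ToolResponse = Struct.new(:content, :image_url, keyword_init: true)

# Hypothetical adapter for a provider that accepts images in tool messages.
class ImageCapableAdapter
  def format_tool_response(response, tool_call_id:)
    { role: "tool", tool_call_id: tool_call_id,
      content: response.content, image_url: response.image_url }
  end
end

# Hypothetical adapter for a provider that only accepts text in tool
# messages: the URL is appended to the text instead of sent as an image.
class TextOnlyAdapter
  def format_tool_response(response, tool_call_id:)
    text = [response.content, response.image_url].compact.join("\n")
    { role: "tool", tool_call_id: tool_call_id, content: text }
  end
end

response = ToolResponse.new(content: "Chart ready", image_url: "https://example.com/chart.png")
with_image = ImageCapableAdapter.new.format_tool_response(response, tool_call_id: "call_1")
text_only = TextOnlyAdapter.new.format_tool_response(response, tool_call_id: "call_1")
```

With this split, the assistant would hand the same `ToolResponse` to whichever adapter is active and let it decide how to serialize the image.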

Breaking Changes

None. The assistant maintains backward compatibility by supporting both:

  • New ToolResponse objects
  • Legacy raw value returns

Migration Guide

To update your custom tools to use the new format:

  1. Include the ToolHelpers module:
include Langchain::ToolHelpers
  2. Use tool_response in your methods:
def your_method
  result = compute_result # placeholder for your own computation
  tool_response(content: result)
end
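
Putting both steps together, a migrated tool might look like the sketch below. `ChartTool`, its `render` method and the URL scheme are hypothetical; the `Langchain` module is stubbed with the two definitions from this description so the snippet runs without the gem.

```ruby
# Minimal stand-ins for the classes this PR adds.
module Langchain
  class ToolResponse
    attr_reader :content, :image_url

    def initialize(content: nil, image_url: nil)
      raise ArgumentError, "Either content or image_url must be provided" if content.nil? && image_url.nil?
      @content = content
      @image_url = image_url
    end
  end

  module ToolHelpers
    def tool_response(content: nil, image_url: nil)
      Langchain::ToolResponse.new(content: content, image_url: image_url)
    end
  end
end

# Hypothetical custom tool; the name, method and URL scheme are invented.
class ChartTool
  include Langchain::ToolHelpers

  def render(title:)
    # Build a URL-safe slug from the title, e.g. "Q3 Revenue" -> "q3-revenue".
    slug = title.downcase.gsub(/[^a-z0-9]+/, "-")
    tool_response(
      content: "Rendered chart: #{title}",
      image_url: "https://example.com/charts/#{slug}.png"
    )
  end
end

result = ChartTool.new.render(title: "Q3 Revenue")
```

Here `result` carries both the text and the image URL, so the assistant can forward whichever parts the LLM provider accepts.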

@andreibondarev
Collaborator

andreibondarev commented Dec 4, 2024

@Eth3rnit3 Thank you for your PR. What are your thoughts on the suggestion below?

What if we modified this to:

content, image_url = tool_instance.send(method_name, **tool_arguments)

# Rename the method parameter from output: to content:
submit_tool_output(tool_call_id: tool_call_id, content: content, image_url: image_url)
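
For reference, this is how Ruby's multiple assignment treats different return values. The values below are hypothetical tool outputs, used only to show the destructuring behavior this suggestion relies on:

```ruby
# Hypothetical tool return values.
text_only = "4.0"
text_and_image = ["Chart ready", "https://example.com/chart.png"]
list_result = ["row 1", "row 2", "row 3"]

content_a, image_a = text_only       # a non-array value lands in the first variable
content_b, image_b = text_and_image  # a two-element array is split across both
content_c, image_c = list_result     # a longer array is split too; "row 3" is dropped
```

Existing tools that return a single string keep working, since `image_url` is simply `nil`. A tool whose content is itself an array would be split, though, so it would need to wrap or join its result first.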

Something else to think about is that we might want to support a base64-encoded image representation in the future, something like:

        {
          "type": "image",
          "source": {
            "type": "base64",
            "media_type": "image/jpeg",
            "data": "/9j/4AAQSkZJRg..."
          }
        }
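
A block of that shape could be built from raw bytes roughly as follows. `base64_image_block` is a hypothetical helper, not something in this PR, and the two-byte input is a placeholder rather than real JPEG data:

```ruby
# Hypothetical helper; the hash shape mirrors the example above.
def base64_image_block(bytes, media_type:)
  {
    type: "image",
    source: {
      type: "base64",
      media_type: media_type,
      data: [bytes].pack("m0") # "m0" is strict Base64 with no line breaks
    }
  }
end

# Placeholder bytes; a real caller would pass the binary contents of an image file.
block = base64_image_block("hi", media_type: "image/jpeg")
```

`Array#pack("m0")` is used instead of the `base64` library so the sketch has no dependencies.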

@Eth3rnit3
Author

Yes, you're right @andreibondarev, it's better this way: it's more implicit and avoids parsing the tool's output.

  • Works with Anthropic
  • Works with Mistral

Here's the error message from OpenAI, which doesn't support image_url in a message with role: "tool":

OpenAI HTTP Error (spotted in ruby-openai 7.3.1): {"error"=>{"message"=>"Invalid 'messages[3]'. Image URLs are only allowed for messages with role 'user', but this message with role 'tool' contains an image URL.", "type"=>"invalid_request_error", "param"=>"messages[3]", "code"=>"invalid_value"}}

content.to_s
end

def to_str
Collaborator


Do we need this method? Is it used anywhere?

to_s
end

def include?(other)
Collaborator


Is this one used?
