Multimodal (Vision) support #175

Open
liwii opened this issue Feb 7, 2025 · 1 comment
Comments

liwii commented Feb 7, 2025

Some OpenAI LLMs now support image inputs, so it would be great if we could support evaluation with images as well.

Ref: https://platform.openai.com/docs/guides/vision

The goal is to update the EvalClient interface and allow metrics that take image inputs, like the following:

import langcheck

prompts = ["What is in the image?", "What is in the image?", ...]
generated_outputs = ["A green parrot flying away from...", "A huge robot working for...", ...]
image_urls = [
  # Link to the image or a base64-encoded image
  "https://...",
  "data:image/jpeg;base64,b656...",
  ...
]

results = langcheck.metrics.answer_relevance_with_input_image(
  generated_outputs=generated_outputs,
  prompts=prompts,
  image_urls=image_urls,
  eval_model=eval_client
)

There are probably more things to discuss before actually shipping this feature, but we can prototype and run some metrics first.
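
For the prototype, here is a minimal sketch of how an eval client might send the evaluation prompt together with an image to OpenAI's vision-capable chat API, following the message format described in the vision guide linked above. The model name, helper function, and client setup are illustrative assumptions, not existing LangCheck or proposed APIs:

from openai import OpenAI

client = OpenAI()

def evaluate_with_image(eval_prompt: str, image_url: str) -> str:
    # Hypothetical helper: image_url can be an https:// link or a
    # data:image/...;base64,... string, matching the image_urls list above.
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed vision-capable model
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": eval_prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    )
    return response.choices[0].message.content

The EvalClient change would then presumably amount to attaching these image content parts to the evaluation prompts it already builds.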

liwii commented Feb 7, 2025

@matchcase

This is a bit abstract, but it would also be interesting to work on this!
