Multimodal (Vision) support #175

Open
liwii opened this issue Feb 7, 2025 · 1 comment
Comments

liwii commented Feb 7, 2025

Some OpenAI LLMs now support image inputs, so it would be great if we could support evaluation with images as well.

Ref: https://platform.openai.com/docs/guides/vision

The goal is to update the EvalClient interface and allow metrics that take image inputs, like the following:

import langcheck

prompts = ["What is in the image?", "What is in the image?", ...]
generated_outputs = ["A green parrot flying away from...", "A huge robot working for...", ...]
image_urls = [
  # Link to the image or a base64-encoded image
  "https://...",
  "data:image/jpeg;base64,b656...",
  ...
]

results = langcheck.metrics.answer_relevance_with_input_image(
  generated_outputs=generated_outputs,
  prompts=prompts,
  image_urls=image_urls,
  eval_model=eval_client
)

There are probably more things to discuss before actually shipping this feature, but we can prototype and run some metrics first.
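
For the prototype, here is a minimal sketch of how an eval client might send the evaluation prompt together with an image to OpenAI's vision-capable chat API, following the message format described in the vision guide linked above. The model name, helper function, and client setup are illustrative assumptions, not existing LangCheck or proposed APIs:

from openai import OpenAI

client = OpenAI()

def evaluate_with_image(eval_prompt: str, image_url: str) -> str:
    # Hypothetical helper: image_url can be an https:// link or a
    # data:image/...;base64,... string, matching the image_urls list above.
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed vision-capable model
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": eval_prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    )
    return response.choices[0].message.content

The EvalClient change would then presumably amount to attaching these image content parts to the evaluation prompts it already builds.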

liwii commented Feb 7, 2025

@matchcase

This is a bit abstract, but it would also be interesting to work on this!
