
Improve accuracy of evaluate relevance blocks #543

Open
RobotSail opened this issue Feb 5, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@RobotSail
Member

In the evaluate_relevancy block, the LLM begins assigning scores immediately after seeing the response. This can lead to suboptimal evaluations, as the model typically needs to reason about what feedback to give before it can produce an accurate score.

We should adjust this block to instead do the following:

  [Start of Question]
  How does photosynthesis work?
  [End of Question]

  [Start of Response]
  Plants require sunlight and water to grow.
  [End of Response]

  [Start of Feedback]
  - Subject Matter Relevance:
      reasoning: The response is related to plant growth, but does not specifically address the process of photosynthesis.
      score: 0
  - Alignment with Query's Focus:
      reasoning: The response fails to detail the photosynthesis process, missing the specific focus of the query.
      score: 0  
  [End of Feedback]

  [Start of Score]
  0
  [End of Score]

With this minimal change, the model produces its feedback first and conditions the final score on that reasoning, which should yield a more accurate evaluation.
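For illustration, here is a minimal sketch of how the prompt could be assembled so that the feedback section precedes the final score. The template and function names below are hypothetical, and the scoring scale is assumed to match whatever the block already uses:

```python
# Hypothetical sketch: build an evaluation prompt that asks the model for
# feedback before the score, so its final judgment is conditioned on its
# own reasoning. Names here are illustrative, not the block's real API.

EVAL_TEMPLATE = """\
[Start of Question]
{question}
[End of Question]

[Start of Response]
{response}
[End of Response]

Provide your evaluation in the following order:

[Start of Feedback]
- Subject Matter Relevance:
    reasoning: <why the response is or is not on topic>
    score: <score>
- Alignment with Query's Focus:
    reasoning: <why the response does or does not address the query's focus>
    score: <score>
[End of Feedback]

[Start of Score]
<overall score>
[End of Score]
"""


def build_relevancy_prompt(question: str, response: str) -> str:
    """Fill the template so the feedback is generated before the final score."""
    return EVAL_TEMPLATE.format(question=question, response=response)


if __name__ == "__main__":
    print(build_relevancy_prompt(
        "How does photosynthesis work?",
        "Plants require sunlight and water to grow.",
    ))
```

The parser would then read the [Start of Score]/[End of Score] section last, after the feedback has already been generated.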

@RobotSail RobotSail added the bug Something isn't working label Feb 5, 2025
@bbrowning
Contributor

This seems like a reasonable suggestion, although we may need someone to quantify the impact of this change and measure how much it helps or hurts the evaluations.
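One way to quantify this, sketched below under the assumption that we have a small set of (question, response) pairs with human relevance labels and a callable judge: render the same pairs with the current score-first prompt and the proposed feedback-first prompt, then compare agreement with the human labels. The judge callable and dataset here are assumptions, not existing project code.

```python
from typing import Callable, Iterable, Tuple


def agreement_rate(
    judge: Callable[[str], int],                   # returns the model's score for one prompt
    prompts_and_labels: Iterable[Tuple[str, int]],  # (rendered prompt, human label) pairs
) -> float:
    """Fraction of examples where the judge's score matches the human label."""
    results = [judge(prompt) == label for prompt, label in prompts_and_labels]
    return sum(results) / len(results)


# Usage sketch: compare the two prompt formats over the same labeled set:
#   old_acc = agreement_rate(judge, score_first_prompts_with_labels)
#   new_acc = agreement_rate(judge, feedback_first_prompts_with_labels)
```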
