What are some key things to remember while designing a metric? #1464

Open
ArslanS1997 opened this issue Sep 7, 2024 · 3 comments

@ArslanS1997

I wanted to know whether you can share any resources on best practices for designing metrics in DSPy.

The LLM program I am optimizing generates Python code, so I decided the metric should be a score from 0 to 100: a binary 50-point component for whether the code runs, plus 25 points for relevance and 25 points for correct data handling.

Would this be a good metric?
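For reference, here is a minimal sketch of that rubric as a DSPy-style metric. It assumes the usual metric(example, prediction, trace=None) interface and a prediction.code output field (both assumptions on my part); the relevance and data-handling checks are placeholders to swap for your own logic.

def runs_without_error(code: str) -> bool:
    # Caution: exec runs untrusted generated code; use a sandbox in practice.
    try:
        exec(code, {})
        return True
    except Exception:
        return False

def relevance_score(example, prediction) -> float:
    # Placeholder for a relevance check (manual label or LM judge), in [0, 1].
    return 1.0

def data_handling_score(example, prediction) -> float:
    # Placeholder for a data-handling check (e.g., LM judge of pandas usage), in [0, 1].
    return 1.0

def python_code_metric(example, prediction, trace=None):
    score = 0
    if runs_without_error(prediction.code):            # binary 50-point component
        score += 50
    score += 25 * relevance_score(example, prediction)
    score += 25 * data_handling_score(example, prediction)
    return score                                       # total in [0, 100]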

@acse-yl2020

I am not sure whether you are using an LLM to evaluate relevance and data handling. If so, that is how I am designing mine right now, but I am not convinced it is the best approach: the LLM tends to give good scores to any answer that merely appears 'reasonable' or 'relevant'.

If you land on a better metric design, please share it in this thread.

@ArslanS1997
Author

Hi, since this is a code-execution problem, you can test whether the code runs. For relevance, I have been judging the answers manually for now. For data handling (basic pandas operations such as converting ints/strings into the correct format), the check can be done via an LM.

It sort of works, but I am not sure I am doing everything right. Are there any constraints on what metrics should look like? Do they have to be bounded by 1, or continuous, etc.?
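On the constraint question: as I understand the DSPy metrics guide, a metric is just a Python function, so its return value does not have to be bounded by 1 or continuous. The main caveat is that when an optimizer passes a non-None trace during bootstrapping, the metric is typically used as a pass/fail gate, so returning a bool there is common. A minimal sketch, assuming a 0-100 scorer like the one sketched earlier and an arbitrary 75-point threshold:

def metric(example, prediction, trace=None):
    score = python_code_metric(example, prediction)   # 0-100, as sketched above
    if trace is not None:
        # During bootstrapping, a strict pass/fail decision is usually expected.
        return score >= 75                             # threshold is an assumption
    return score / 100.0                               # scaled to [0, 1] for evaluation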

@acse-yl2020

I basically follow the guide at https://dspy-docs.vercel.app/docs/building-blocks/metrics, along with some guidance from DSPy Assertions (though that is for outputs). In my experience, the metric can be a weighted sum over aspects, which keeps it flexible, e.g.,
weights = {
'relevance': 0.3,
'mathematical_rigor': 0.4,
'completeness': 0.2,
'clarity': 0.05,
'comparison_to_gold': 0.05
}
final_score = sum(weights[aspect] * scores[aspect] for aspect in weights)
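As a usage sketch of that weighted sum with per-aspect LM judges, reusing the weights dict above (the signature string, field names, and clamping are my assumptions, not something from the guide):

import dspy

# Hypothetical judge; one call per aspect in the weights dict above.
judge = dspy.ChainOfThought("question, answer, aspect -> score_between_0_and_1")

def weighted_metric(example, prediction, trace=None):
    scores = {}
    for aspect in weights:
        out = judge(question=example.question, answer=prediction.answer, aspect=aspect)
        try:
            scores[aspect] = min(1.0, max(0.0, float(out.score_between_0_and_1)))
        except (TypeError, ValueError):
            scores[aspect] = 0.0   # treat unparseable judge output as zero
    return sum(weights[aspect] * scores[aspect] for aspect in weights)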

Still, this may not be the best way of designing the metric, at least for me. As I mentioned, the LLM tends to give decent scores as long as the answer looks 'relevant' or 'complete'. I am manually tuning the prompts that evaluate each aspect based on some existing research, but no luck so far.

Happy to hear wiser opinions.
