Optimized prompt for multi-class classification contains only a subset of classifiers #1509
Comments
Hi @aaronbriel, the optimized_program currently includes few-shot examples from only 8 of the classifiers because the BootstrapFewShotWithRandomSearch configuration is set to select: To get unique few-shot examples for all 41 classifiers, you can increase these parameters to 41. Note, however, that BootstrapFewShot's selection of few-shot examples doesn't guarantee uniqueness across all 41 demos (the optimizer just selects a set of 41 few-shots that pass the metric). Some potential solutions for this could be:
The second solution is likely more expensive, but it may ensure more diversity by providing multiple sets of few-shots for the unique classifiers, which can potentially raise performance. Let me know if this helps!
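To make the coverage idea concrete, here is a minimal plain-Python sketch (not a DSPy API; all names are mine) of stratified demo selection that guarantees every class contributes at least one few-shot demo:

```python
import random
from collections import defaultdict

def stratified_demos(trainset, label_key, per_class=1, seed=0):
    """Pick `per_class` demos from every class so no label is left out.

    `trainset` is a list of dicts; `label_key` names the gold-label field.
    This is a hand-rolled sketch, not part of DSPy.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in trainset:
        by_label[ex[label_key]].append(ex)
    demos = []
    for label in sorted(by_label):
        pool = by_label[label]
        demos.extend(rng.sample(pool, min(per_class, len(pool))))
    return demos

# Tiny fake dataset: 3 intents, 2 examples each.
train = [{"text": f"u{intent}{i}", "intent": intent}
         for intent in ("billing", "cancel", "refund") for i in range(2)]
demos = stratified_demos(train, "intent", per_class=1)
# Every intent is represented exactly once in `demos`.
```

A hand-picked set like this could then be assigned as the predictor's demos instead of (or in addition to) bootstrapped ones.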
@arnavsinghvi11 thanks for the quick response! I will try this and let you know the results.
@arnavsinghvi11 I keep running into the error below. I thought I had resolved it by adding: Do you know of any other tricks people have used to resolve this?
Using the recommended solution in (1) above, the resulting prompt was still missing 20 intents, so that is not a feasible solution for a production release. The "Student must be uncompiled" issue may not have occurred simply because certain data in a missing intent was never encountered. I'm going to have to hold off on leveraging this tool until I or someone else can find a solution to that issue.
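For context on the "Student must be uncompiled" assertion: DSPy raises it when an already-compiled module instance is handed back to an optimizer's compile step. A toy, plain-Python illustration of the failure and the usual workaround (compile a fresh or reset copy; all names here are mine, not DSPy's):

```python
import copy

class ToyProgram:
    """Toy stand-in for a DSPy module; compiling it twice raises the
    same kind of assertion as DSPy's "Student must be uncompiled"."""
    def __init__(self):
        self._compiled = False

def compile_program(student):
    # Optimizers refuse to re-compile an already-compiled student.
    if getattr(student, "_compiled", False):
        raise AssertionError("Student must be uncompiled.")
    student._compiled = True
    return student

base = compile_program(ToyProgram())   # first compile succeeds

caught = False
try:
    compile_program(base)              # recompiling the same object fails
except AssertionError:
    caught = True

# Workaround: compile a fresh, reset copy instead of the compiled object.
fresh = copy.deepcopy(base)
fresh._compiled = False
compile_program(fresh)                 # succeeds again
```

In actual DSPy code the equivalent move is to pass a freshly constructed (uncompiled) module into each compile call rather than reusing the compiled one.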
@aaronbriel this may be helpful: https://github.com/KarelDO/xmc.dspy
I followed the tutorials for optimizing a DSPy program for multi-class classification, and the "optimized" prompt contained only a small subset of the available classifiers, making it unsuitable for a production environment.
I'll provide the relevant chunks of notebook code, but I can't show the prompt itself because it contains production data. Hopefully this is sufficient to identify the issue.
ISSUE 1:
The main issue is that the final "optimized" prompt only contains single few-shot samples for 8 of the 41 classifiers (with one of the classifiers having 2 samples). I expected it to contain multiple few-shot samples for each of the 41 classifiers.
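One quick way to quantify this gap is to count demos per label in the compiled program's demo list. A minimal sketch (the field name `intent` and a flat list of demo dicts are assumptions here; the real saved JSON nests demos per predictor):

```python
from collections import Counter

def demo_label_coverage(demos, label_key="intent"):
    """Count how many few-shot demos each label received."""
    return Counter(d[label_key] for d in demos)

# Pretend the compiled program kept these demos, out of 4 known intents.
all_intents = {"refund", "billing", "cancel", "upgrade"}
demos = [{"intent": "refund"}, {"intent": "refund"}, {"intent": "cancel"}]

coverage = demo_label_coverage(demos)
missing = all_intents - set(coverage)   # intents with zero demos
```

Running this against the saved demos makes the "8 of 41" problem visible at a glance and shows exactly which intents got no examples.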
ISSUE 2:
The secondary issue was that the evaluation metric showed a rather low score of 64.34. I expected this to be much higher since I trained with a decent size ground truth dataset (that was manually curated for accuracy) of 50 samples per classifier.
I'm guessing this is related to my optimizer configuration but I'm not sure what to adjust. Please advise. Thank you!
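For reference on how a score like 64.34 arises, here is a minimal sketch of an exact-match metric in the `(example, prediction, trace)` shape DSPy metrics use. I'm using plain dicts rather than `dspy.Example` objects, and the field name `intent` is an assumption from this thread:

```python
def intent_match(example, prediction, trace=None):
    """Case-insensitive exact match on the predicted intent label."""
    return example["intent"].strip().lower() == prediction["intent"].strip().lower()

gold = [{"intent": "Refund"}, {"intent": "cancel"}, {"intent": "billing"}]
preds = [{"intent": "refund"}, {"intent": "cancel"}, {"intent": "upgrade"}]

# Aggregate score as a percentage, the way the evaluator reports it.
score = 100 * sum(intent_match(g, p) for g, p in zip(gold, preds)) / len(gold)
```

With 41 classes and demos covering only 8 of them, a metric like this would plausibly land in the 60s, since intents absent from the prompt are much more likely to be mispredicted.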
This resulted in successful "training", running in 8 sets. I then completed an evaluation:
I then checked the optimized prompt by doing:
ISSUE 1:
The resulting
optimized_intent_classifier.json
had single few-shot samples for only 8 intents, with one of the intents having 2 samples. There are 41 intents, so I expected multiple few-shot samples for each of the 41 intents.
ISSUE 2:
This showed a final score of 64.34, which was admittedly far lower than expected as I provided a ground truth dataset of 50 samples per intent.
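To see whether the low aggregate score is driven by the intents missing from the demos, a per-intent breakdown helps. A small stdlib sketch (label lists are made up for illustration):

```python
from collections import defaultdict

def per_intent_accuracy(gold, preds):
    """Break the aggregate score down by gold intent, as percentages."""
    hits, totals = defaultdict(int), defaultdict(int)
    for g, p in zip(gold, preds):
        totals[g] += 1
        hits[g] += (g == p)
    return {intent: 100 * hits[intent] / totals[intent] for intent in totals}

gold = ["refund", "refund", "cancel", "billing"]
preds = ["refund", "cancel", "cancel", "upgrade"]
breakdown = per_intent_accuracy(gold, preds)
```

If the intents scoring near zero are the same ones with no few-shot demos, that ties ISSUE 1 and ISSUE 2 together.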