Llm evaluation #117
base: main
Conversation
Integrating deepeval into senselab
… into llm_evaluation
Nice use of style and cleanliness; some quick points:
- Checks are still failing.
- We already have a ScriptLine data structure in utils/datastructures; is there a reason you are rewriting it rather than importing it?
- Write a file called metrics.py with an abstract base class called Metric. Define the abstract methods you want inherited by the different implementations of Metric (see https://docs.python.org/3/library/abc.html), then define the various implementations (ROUGE, etc.).
- Give some options for calculating overall_score in evaluate_conversation, perhaps a harmonic mean, etc.
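A minimal sketch of what the suggested metrics.py could look like. The class and function names (Metric, UnigramOverlap, overall_score) and the unigram-recall stand-in for a ROUGE-style metric are assumptions for illustration, not the PR's actual code:

```python
from abc import ABC, abstractmethod
from statistics import harmonic_mean


class Metric(ABC):
    """Abstract base class for evaluation metrics (per the review suggestion)."""

    @abstractmethod
    def compute(self, prediction: str, reference: str) -> float:
        """Return a score in [0, 1] comparing prediction to reference."""


class UnigramOverlap(Metric):
    """Hypothetical stand-in for a ROUGE-style metric: unigram recall."""

    def compute(self, prediction: str, reference: str) -> float:
        ref_tokens = reference.split()
        if not ref_tokens:
            return 0.0
        pred_tokens = set(prediction.split())
        overlap = sum(1 for token in ref_tokens if token in pred_tokens)
        return overlap / len(ref_tokens)


def overall_score(scores: list[float], method: str = "mean") -> float:
    """Aggregate per-metric scores; the harmonic mean penalizes low outliers."""
    if method == "harmonic":
        return harmonic_mean(scores)
    return sum(scores) / len(scores)
```

New metric implementations would then subclass Metric and override compute, and evaluate_conversation could expose the aggregation method as a parameter.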
At the time of this review, all tests were failing.
The Senselab project integrates deepeval for evaluating conversations, using an api.py script to interface with deep_eval.py, which includes a custom ROUGE metric for comprehensive evaluation. The ScriptLine class standardizes input data, and unit tests ensure accurate functionality, making Senselab a robust wrapper for deepeval and other tools.