An initial literature review is available here: https://kaiko-ai.atlassian.net/l/cp/Xg1JNjZJ.
Its primary objective was to pick the initial task, but I will keep adding datasets as I come across them.
I have preliminarily chosen PubMedQA as the first text task for eva. First, its yes/no/maybe question format maps directly onto eva’s classification task, which should keep the integration simple. PubMedQA provides about 1,000 expert-annotated questions, which gives us a reliable benchmark. It is fully open-source, so it can be accessed and used without licensing issues. The QA format matches our internal focus on question-answer tasks. PubMedQA is also not too easy: it is challenging enough to differentiate between simple models and advanced LLMs. Finally, it is widely used in the medical NLP community, so results on it are credible and comparable with other work. Together, these factors make PubMedQA a good initial choice for extending eva to text-based evaluations.
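To make the mapping from yes/no/maybe answers to a classification task concrete, here is a minimal sketch in plain Python. It does not use eva's API or the PubMedQA loader; the label order, the helper names `encode_answer` and `accuracy`, and the example answers are all assumptions made for illustration.

```python
# Assumed label order for illustration; not taken from eva or PubMedQA.
LABELS = ("yes", "no", "maybe")
LABEL_TO_INDEX = {label: index for index, label in enumerate(LABELS)}


def encode_answer(answer: str) -> int:
    """Map a raw answer string to a class index (hypothetical helper)."""
    key = answer.strip().lower()
    if key not in LABEL_TO_INDEX:
        raise ValueError(f"unexpected answer: {answer!r}")
    return LABEL_TO_INDEX[key]


def accuracy(predictions: list[int], targets: list[int]) -> float:
    """Fraction of predictions that match the targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Made-up answers, only to exercise the helpers above.
targets = [encode_answer(a) for a in ["yes", "No", " maybe ", "yes"]]
predictions = [encode_answer(a) for a in ["yes", "yes", "maybe", "yes"]]
score = accuracy(predictions, targets)
```

With three classes encoded as integers, the task looks like any other multi-class classification problem, so standard classification metrics can be applied to the model's parsed answers.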
This is an epic ticket where I will document the integration of the NLP modality into eva.
High-Level Tasks: