Integrate NLP Modality into EVA #739

Open
1 of 4 tasks
kurbanrita opened this issue Dec 24, 2024 · 1 comment

Comments

@kurbanrita
Collaborator

kurbanrita commented Dec 24, 2024

This is an epic ticket in which I will document the integration of the NLP modality into eva.

High-Level Tasks:

  • Review the literature for relevant datasets that can be used as eva tasks.
  • Add a custom text dataset to a new language folder.
  • Define relevant metrics for text classification, free-form text generation, etc.
  • Set up the end-to-end pipeline, including the relevant modifications to the trainer, configurations, etc.
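
For the metrics task, a minimal sketch of two candidate text-classification metrics (accuracy and macro-averaged F1) is given below. It is written in plain Python only to make the definitions concrete; the function names `accuracy` and `macro_f1` are illustrative assumptions and are not part of eva's API, and the eventual implementation may well rely on an existing metrics library instead.

```python
from typing import Sequence


def accuracy(targets: Sequence[str], predictions: Sequence[str]) -> float:
    """Fraction of predictions that exactly match their target label."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    if not targets:
        raise ValueError("at least one sample is required")
    return sum(t == p for t, p in zip(targets, predictions)) / len(targets)


def macro_f1(
    targets: Sequence[str], predictions: Sequence[str], labels: Sequence[str]
) -> float:
    """Unweighted mean of the per-class F1 scores over `labels`."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(targets, predictions))
        fp = sum(t != label and p == label for t, p in zip(targets, predictions))
        fn = sum(t == label and p != label for t, p in zip(targets, predictions))
        denominator = 2 * tp + fp + fn
        # A class that is never a target and never predicted scores 0.0 here.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data, not taken from any real dataset.
example_targets = ["yes", "no", "maybe", "yes"]
example_predictions = ["yes", "no", "yes", "yes"]
example_accuracy = accuracy(example_targets, example_predictions)
example_macro_f1 = macro_f1(
    example_targets, example_predictions, labels=["yes", "no", "maybe"]
)
```

Macro averaging weights every class equally, which matters when one answer class is much rarer than the others; accuracy alone can hide poor performance on such a class.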
@kurbanrita
Collaborator Author

kurbanrita commented Dec 24, 2024

An initial literature review is available here: https://kaiko-ai.atlassian.net/l/cp/Xg1JNjZJ.
Its primary objective was to pick the initial task, but I will keep adding datasets as I come across them.

I have preliminarily chosen PubMedQA as the first text task for eva, for several reasons. First, its yes/no/maybe questions map naturally onto eva’s classification task, which should make integration straightforward. PubMedQA provides over 1,000 high-quality, human-annotated questions, giving a reliable benchmark. It is fully open-source, so it can be accessed and used without licensing issues. The QA format matches our internal focus on question-answer tasks. The dataset is also not too easy: it poses a meaningful challenge that differentiates simple models from advanced LLMs. Finally, PubMedQA is widely used and trusted in the medical NLP community, so our evaluations would rest on a credible benchmark. Together, these factors make PubMedQA a good initial choice for extending eva to text-based evaluations.
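
To illustrate how the yes/no/maybe answers could map onto a classification target, here is a minimal sketch. The class order in `LABELS`, the helper name `encode_answer`, and the field names in the sample record are all assumptions made for illustration; they are not confirmed by eva or checked against the dataset files.

```python
# Assumed class order; the actual order used in eva may differ.
LABELS = ("no", "yes", "maybe")
LABEL_TO_INDEX = {label: index for index, label in enumerate(LABELS)}


def encode_answer(answer: str) -> int:
    """Convert a yes/no/maybe answer string into an integer class index."""
    key = answer.strip().lower()
    if key not in LABEL_TO_INDEX:
        raise ValueError(f"unexpected answer label: {answer!r}")
    return LABEL_TO_INDEX[key]


# Hypothetical record; the field names are illustrative assumptions.
sample = {
    "question": "Does treatment X reduce symptom Y?",
    "context": "Placeholder abstract text.",
    "final_decision": "Yes",
}
target = encode_answer(sample["final_decision"])
```

Normalizing case and whitespace before the lookup keeps the mapping robust to small formatting differences, while unknown labels fail loudly instead of being silently mapped to a class.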
