Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safety evals, probably need more comprehensive and generic testing! #1

Open
Bhardwaj-Rishabh opened this issue Mar 8, 2024 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments

@Bhardwaj-Rishabh
Copy link

Bhardwaj-Rishabh commented Mar 8, 2024

Great work!

Regarding safety evals using inthewild_jailbreak_llms, not sure which roleplay dataset is this but it seems to be dedicated to jailbreak ChatGPT. Additionally, curious if there are any reasons to use this (not seem so popular) dataset for safety evaluations?

Location of the file: https://github.com/raga-ai-hub/raga-llm-hub/blob/main/src/raga_llm_hub/utils/data_files/inthewild_jailbreak_llms.txt

While it may not be very harmful to overlook certain aspects in other assessments, safety evaluations need to be very comprehensive and cover a range of test cases, at least matching the current open-source benchmark standards.

@Bhardwaj-Rishabh Bhardwaj-Rishabh changed the title Jailbreak file Safety evals, probably need more generic and comprehensive testing here Mar 8, 2024
@Bhardwaj-Rishabh Bhardwaj-Rishabh changed the title Safety evals, probably need more generic and comprehensive testing here Safety evals, probably need more comprehensive and generic testing! Mar 8, 2024
@kiran-raga kiran-raga added the documentation Improvements or additions to documentation label Mar 11, 2024
@kiran-raga
Copy link

Thanks for your feedback and for highlighting the importance of comprehensive safety evaluations. The choice of the inthewild_jailbreak_llms dataset was based on some popular packages and papers, aiming to explore diverse scenarios that might not be covered by more popular datasets. However, we recognize the need for a broad range of test cases to meet and exceed current open-source benchmark standards.

Improving test quality and coverage is a priority for our next update. We're actively exploring additional datasets and methodologies to enhance our safety evaluations. If you have any suggestions or resources you believe could contribute to this goal, we'd love to hear from you. Your input is invaluable as we strive to make our tool safer and more reliable for everyone. Stay tuned for updates, and thanks again for your constructive feedback.

@Bhardwaj-Rishabh
Copy link
Author

Yes certainly, we (as a research lab) are working in this direction too. Please feel free to reach out to me at [email protected] I can suggest some (open-source) datasets that we and others have built.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

2 participants