Safety evals, probably need more comprehensive and generic testing! #1

Bhardwaj-Rishabh · 2024-03-08T15:21:51Z

Great work!

Regarding safety evals using inthewild_jailbreak_llms, not sure which roleplay dataset is this but it seems to be dedicated to jailbreak ChatGPT. Additionally, curious if there are any reasons to use this (not seem so popular) dataset for safety evaluations?

Location of the file: https://github.com/raga-ai-hub/raga-llm-hub/blob/main/src/raga_llm_hub/utils/data_files/inthewild_jailbreak_llms.txt

While it may not be very harmful to overlook certain aspects in other assessments, safety evaluations need to be very comprehensive and cover a range of test cases, at least matching the current open-source benchmark standards.

kiran-raga · 2024-03-11T06:28:47Z

Thanks for your feedback and for highlighting the importance of comprehensive safety evaluations. The choice of the inthewild_jailbreak_llms dataset was based on some popular packages and papers, aiming to explore diverse scenarios that might not be covered by more popular datasets. However, we recognize the need for a broad range of test cases to meet and exceed current open-source benchmark standards.

Improving test quality and coverage is a priority for our next update. We're actively exploring additional datasets and methodologies to enhance our safety evaluations. If you have any suggestions or resources you believe could contribute to this goal, we'd love to hear from you. Your input is invaluable as we strive to make our tool safer and more reliable for everyone. Stay tuned for updates, and thanks again for your constructive feedback.

Bhardwaj-Rishabh · 2024-03-11T09:24:37Z

Yes certainly, we (as a research lab) are working in this direction too. Please feel free to reach out to me at [email protected] I can suggest some (open-source) datasets that we and others have built.

Bhardwaj-Rishabh changed the title ~~Jailbreak file~~ Safety evals, probably need more generic and comprehensive testing here Mar 8, 2024

Bhardwaj-Rishabh changed the title ~~Safety evals, probably need more generic and comprehensive testing here~~ Safety evals, probably need more comprehensive and generic testing! Mar 8, 2024

kiran-raga added the documentation Improvements or additions to documentation label Mar 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Safety evals, probably need more comprehensive and generic testing! #1

Safety evals, probably need more comprehensive and generic testing! #1

Bhardwaj-Rishabh commented Mar 8, 2024 •

edited

Loading

kiran-raga commented Mar 11, 2024

Bhardwaj-Rishabh commented Mar 11, 2024

Safety evals, probably need more comprehensive and generic testing! #1

Safety evals, probably need more comprehensive and generic testing! #1

Comments

Bhardwaj-Rishabh commented Mar 8, 2024 • edited Loading

kiran-raga commented Mar 11, 2024

Bhardwaj-Rishabh commented Mar 11, 2024

Bhardwaj-Rishabh commented Mar 8, 2024 •

edited

Loading