You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Regarding safety evals using inthewild_jailbreak_llms, not sure which roleplay dataset is this but it seems to be dedicated to jailbreak ChatGPT. Additionally, curious if there are any reasons to use this (not seem so popular) dataset for safety evaluations?
While it may not be very harmful to overlook certain aspects in other assessments, safety evaluations need to be very comprehensive and cover a range of test cases, at least matching the current open-source benchmark standards.
The text was updated successfully, but these errors were encountered:
Bhardwaj-Rishabh
changed the title
Jailbreak file
Safety evals, probably need more generic and comprehensive testing here
Mar 8, 2024
Bhardwaj-Rishabh
changed the title
Safety evals, probably need more generic and comprehensive testing here
Safety evals, probably need more comprehensive and generic testing!
Mar 8, 2024
Thanks for your feedback and for highlighting the importance of comprehensive safety evaluations. The choice of the inthewild_jailbreak_llms dataset was based on some popular packages and papers, aiming to explore diverse scenarios that might not be covered by more popular datasets. However, we recognize the need for a broad range of test cases to meet and exceed current open-source benchmark standards.
Improving test quality and coverage is a priority for our next update. We're actively exploring additional datasets and methodologies to enhance our safety evaluations. If you have any suggestions or resources you believe could contribute to this goal, we'd love to hear from you. Your input is invaluable as we strive to make our tool safer and more reliable for everyone. Stay tuned for updates, and thanks again for your constructive feedback.
Yes certainly, we (as a research lab) are working in this direction too. Please feel free to reach out to me at [email protected] I can suggest some (open-source) datasets that we and others have built.
Great work!
Regarding safety evals using inthewild_jailbreak_llms, not sure which roleplay dataset is this but it seems to be dedicated to jailbreak ChatGPT. Additionally, curious if there are any reasons to use this (not seem so popular) dataset for safety evaluations?
Location of the file: https://github.com/raga-ai-hub/raga-llm-hub/blob/main/src/raga_llm_hub/utils/data_files/inthewild_jailbreak_llms.txt
While it may not be very harmful to overlook certain aspects in other assessments, safety evaluations need to be very comprehensive and cover a range of test cases, at least matching the current open-source benchmark standards.
The text was updated successfully, but these errors were encountered: