Hi @chujiezheng, I am a fan of your work! We would love to add new models. Could you give us more information on the model you want to add? For now, we only maintain a very lightweight leaderboard in the README.
Due to API and GPU limits, I have so far only run the evaluation for Starling-LM-7B-beta-ExPO, which obtains a score of 24.9 with a 95% CI of (-2.2, 1.8). I have attached the evaluation output files here. I would appreciate it if you could add Starling-LM-7B-beta-ExPO to the leaderboard. I would also greatly appreciate it if you could help evaluate the other models listed above and add them to the leaderboard.
BTW, since many research works have built their evaluation on Arena-Hard, do you have plans to build a leaderboard website like AlpacaEval's?
Thanks for your great work. Can I request the evaluation of new models so they can be added to the leaderboard?