Berkeley Function Calling Leaderboard Updates (v1.0)
Highlights
🏆 We are thrilled to announce the stable v1.0 release of the Berkeley Function Calling Leaderboard dataset and evaluation pipeline! A heartfelt thank you to all our contributors and users for your enthusiastic engagement and support throughout v1. We are just getting started! Buckle up for v2 🚀 🚀 🚀
What's Changed
- better handle float value comparison by @vandyxiaowei in #407
- Bump pymysql from 1.1.0 to 1.1.1 in /goex by @dependabot in #453
- Fixes For NexusHandler by @VenkatKS in #437
- [BFCL] PR#407 Evaluation Pipeline Robustness Patch by @HuanzhiMao in #462
- Add firefunction-v2 to the leaderboard by @pgarbacki in #470
- [BFCL] Add Claude 3.5 Sonnet Function Calling Inference by @Fanjia-Yan in #480
- [BFCL] Standardize Model Name Among handler_map and eval_runner_helper by @HuanzhiMao in #439
- Remove redundant tokens from GPT-handler by @hellovai in #490
- [GoEx] Undo Minor Bug Fix + README Minor Improvement by @royh02 in #468
- [BFCL] Add ability to evaluate Nemotron-4-340B-Instruct by @Fanjia-Yan in #489
- fix some data issues in parallel/parallel multiple answers by @vandyxiaowei in #423
- [BFCL] Add Support for GLM-4-9B function calling inference by @Fanjia-Yan in #474
- [BFCL] Sanity check is now optional by @ShishirPatil in #496
- [BFCL] Improved tree-sitter java, javascript installation by @CharlieJCJ in #505
- [BFCL] Fix Possible Answer for AST Parallel and Parallel_Multiple Category by @HuanzhiMao in #503
- [BFCL] Add Test Dataset to Repository by @HuanzhiMao in #504
- [BFCL] Support Category-Specific Generation for OSS Model, Remove eval_data_compilation Step by @HuanzhiMao in #512
- [BFCL] Fix Double-Casting Issue in model_handler for Java and JS category. by @HuanzhiMao in #516
- [BFCL] Fix Dataset Issue for executable_parallel_multiple Category by @HuanzhiMao in #522
- [BFCL] add ibm-granite-20b-functioncalling model by @MayankAgarwal in #525
- [BFCL] Overhaul apply_function_credential_config.py for Enhanced Usability by @HuanzhiMao in #508
- Fixed the warning message "Setting pad_token_id to eos_token_id:1… by @dineshkumarsarangapani in #110
- [BFCL] Specify package version in requirements.txt by @HuanzhiMao in #515
- [BFCL] Standardize TEST_CATEGORY Among eval_runner.py and openfunctions_evaluation.py by @HuanzhiMao in #506
- fix line return by @fantasist in #531
- [BFCL] Apply Fix to Newly Introduced Model Handler Missed in Previous PR Merge by @HuanzhiMao in #536
- [RAFT] Fix Datapoint Field in Formatter for Data Generation by @HuanzhiMao in #535
- [BFCL] Fix language_specific_pre_processing for Java and JavaScript Test Category by @HuanzhiMao in #538
- [BFCL] Patch Generation Script for Locally Hosted OSS model by @HuanzhiMao in #537
- [BFCL] Support Multi-Model Multi-Category Generation; Add Index to Dataset; Handle vLLM Benign Error by @HuanzhiMao in #540
- Add NousResearch/{Hermes-2-Pro-Llama-3-8B,Hermes-2-Theta-Llama-3-8B} models by @alonsosilvaallende in #542
- [BFCL] Fix Dataset Pre-Processing for Java and JavaScript Test Category, Part 2 by @HuanzhiMao in #545
- Add Salesforce xLAM handler and fix minor issues by @zuxin666 in #532
- Add NousResearch/Hermes-2-{Pro-Llama-3-70B,Theta-Llama-3-70B} by @alonsosilvaallende in #556
- Add Yi Handler by @fantasist in #543
- Add more descriptive error message in eval_runner.py by @alonsosilvaallende in #552
- [BFCL] Fix JS type converter to handle dictionaries with array values by @CharlieJCJ in #549
- [BFCL] Handling rate limits by @ShishirPatil in #559
- [BFCL] Fix Dataset and Possible Answer Issue by @HuanzhiMao in #557
- [BFCL] Dataset Question Fix for Executable Parallel Category by @HuanzhiMao in #568
- [BFCL] Add New Model gpt-4o-2024-08-06, gpt-4o-mini-2024-07-18 by @HuanzhiMao in #569
- [BFCL] Add New Model open-mistral-nemo-2407, open-mixtral-8x22b, open-mixtral-8x7b by @HuanzhiMao in #570
- [BFCL] Improve Warning Message when Aggregating Results by @HuanzhiMao in #517
- [BFCL] Add New Model functionary-small-v3.1, functionary-small-v3.2, functionary-medium-v3.1; Update Token Price by @HuanzhiMao in #573
- [BFCL] Set Model Temperature to 0.001 for All Models by @HuanzhiMao in #574
- [BFCL] Support Parallel Inference for Hosted Models by @HuanzhiMao in #571
- [BFCL Chore] Fix Functionary Medium 3.1 model name & add readme parallel inference by @CharlieJCJ in #577
New Contributors
- @dependabot made their first contribution in #453
- @VenkatKS made their first contribution in #437
- @pgarbacki made their first contribution in #470
- @hellovai made their first contribution in #490
- @MayankAgarwal made their first contribution in #525
- @dineshkumarsarangapani made their first contribution in #110
- @fantasist made their first contribution in #531
- @alonsosilvaallende made their first contribution in #542
Full Changelog: v0.3...v1.0