Berkeley Function Calling Leaderboard Updates (v1.0)

@ShishirPatil ShishirPatil released this 15 Aug 04:35
· 126 commits to main since this release
9df5c34

Highlights

🏆 We are thrilled to announce the stable v1.0 release of the Berkeley Function Calling Leaderboard dataset and evaluation pipeline! A heartfelt thank you to all our contributors and users for your enthusiastic engagement and support throughout v1. We are just getting started! Buckle up for v2 🚀 🚀 🚀

What's Changed

  • better handle float value comparison by @vandyxiaowei in #407
  • Bump pymysql from 1.1.0 to 1.1.1 in /goex by @dependabot in #453
  • Fixes For NexusHandler by @VenkatKS in #437
  • [BFCL] PR#407 Evaluation Pipeline Robustness Patch by @HuanzhiMao in #462
  • Add firefunction-v2 to the leaderboard by @pgarbacki in #470
  • [BFCL] Add Claude 3.5 Sonnet Function Calling Inference by @Fanjia-Yan in #480
  • [BFCL] Standardize Model Name Among handler_map and eval_runner_helper by @HuanzhiMao in #439
  • Remove redundant tokens from GPT-handler by @hellovai in #490
  • [GoEx] Undo Minor Bug Fix + README Minor Improvement by @royh02 in #468
  • [BFCL] Add ability to evaluate Nemotron-4-340B-Instruct by @Fanjia-Yan in #489
  • fix some data issues in parallel/parallel multiple answers by @vandyxiaowei in #423
  • [BFCL] Add Support for GLM-4-9B function calling inference by @Fanjia-Yan in #474
  • [BFCL] Sanity check is now optional by @ShishirPatil in #496
  • [BFCL] Improved tree-sitter java, javascript installation by @CharlieJCJ in #505
  • [BFCL] Fix Possible Answer for AST Parallel and Parallel_Multiple Category by @HuanzhiMao in #503
  • [BFCL] Add Test Dataset to Repository by @HuanzhiMao in #504
  • [BFCL] Support Category-Specific Generation for OSS Model, Remove eval_data_compilation Step by @HuanzhiMao in #512
  • [BFCL] Fix Double-Casting Issue in model_handler for Java and JS category. by @HuanzhiMao in #516
  • [BFCL] Fix Dataset Issue for executable_parallel_multiple Category by @HuanzhiMao in #522
  • [BFCL] add ibm-granite-20b-functioncalling model by @MayankAgarwal in #525
  • [BFCL] Overhaul apply_function_credential_config.py for Enhanced Usability by @HuanzhiMao in #508
  • Fixed the warning message "Setting pad_token_id to eos_token_id:1… by @dineshkumarsarangapani in #110
  • [BFCL] Specify package version in requirements.txt by @HuanzhiMao in #515
  • [BFCL] Standardize TEST_CATEGORY Among eval_runner.py and openfunctions_evaluation.py by @HuanzhiMao in #506
  • fix line return by @fantasist in #531
  • [BFCL] Apply Fix to Newly Introduced Model Handler Missed in Previous PR Merge by @HuanzhiMao in #536
  • [RAFT] Fix Datapoint Field in Formatter for Data Generation by @HuanzhiMao in #535
  • [BFCL] Fix language_specific_pre_processing for Java and JavaScript Test Category by @HuanzhiMao in #538
  • [BFCL] Patch Generation Script for Locally Hosted OSS model by @HuanzhiMao in #537
  • [BFCL] Support Multi-Model Multi-Category Generation; Add Index to Dataset; Handle vLLM Benign Error by @HuanzhiMao in #540
  • Add NousResearch/{Hermes-2-Pro-Llama-3-8B,Hermes-2-Theta-Llama-3-8B} models by @alonsosilvaallende in #542
  • [BFCL] Fix Dataset Pre-Processing for Java and JavaScript Test Category, Part 2 by @HuanzhiMao in #545
  • Add Salesforce xLAM handler and fix minor issues by @zuxin666 in #532
  • Add NousResearch/Hermes-2-{Pro-Llama-3-70B,Theta-Llama-3-70B} by @alonsosilvaallende in #556
  • Add Yi Handler by @fantasist in #543
  • Add more descriptive error message in eval_runner.py by @alonsosilvaallende in #552
  • [BFCL] Fix JS type converter to handle dictionaries with array values by @CharlieJCJ in #549
  • [BFCL] Handling rate limits by @ShishirPatil in #559
  • [BFCL] Fix Dataset and Possible Answer Issue by @HuanzhiMao in #557
  • [BFCL] Dataset Question Fix for Executable Parallel Category by @HuanzhiMao in #568
  • [BFCL] Add New Model gpt-4o-2024-08-06, gpt-4o-mini-2024-07-18 by @HuanzhiMao in #569
  • [BFCL] Add New Model open-mistral-nemo-2407, open-mixtral-8x22b, open-mixtral-8x7b by @HuanzhiMao in #570
  • [BFCL] Improve Warning Message when Aggregating Results by @HuanzhiMao in #517
  • [BFCL] Add New Model functionary-small-v3.1, functionary-small-v3.2, functionary-medium-v3.1; Update Token Price by @HuanzhiMao in #573
  • [BFCL] Set Model Temperature to 0.001 for All Models by @HuanzhiMao in #574
  • [BFCL] Support Parallel Inference for Hosted Models by @HuanzhiMao in #571
  • [BFCL Chore] Fix Functionary Medium 3.1 model name & add readme parallel inference by @CharlieJCJ in #577

Full Changelog: v0.3...v1.0