From 9df5c346ee0556c8a7cb09fd7206a39aadd904c2 Mon Sep 17 00:00:00 2001
From: Charlie Cheng-Jie Ji <55744150+CharlieJCJ@users.noreply.github.com>
Date: Wed, 14 Aug 2024 21:18:01 -0700
Subject: [PATCH] [BFCL Chore] Fix Functionary Medium 3.1 model name & add
 readme parallel inference (#577)

Changes:
- Fix Functionary Medium 3.1 model version name in `eval_runner_helper.py`
- Add parallel inference instructions to the README

---------

Co-authored-by: Huanzhi (Hans) Mao
---
 berkeley-function-call-leaderboard/README.md | 10 +++++++---
 .../eval_checker/eval_runner_helper.py       |  2 +-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/berkeley-function-call-leaderboard/README.md b/berkeley-function-call-leaderboard/README.md
index d84e07bdd..736c70cb3 100644
--- a/berkeley-function-call-leaderboard/README.md
+++ b/berkeley-function-call-leaderboard/README.md
@@ -66,11 +66,12 @@ If decided to run OSS model, the generation script uses vllm and therefore requi
 
 ### Generating LLM Responses
 
-Use the following command for LLM inference of the evaluation dataset with specific models
+Use the following command for LLM inference of the evaluation dataset with specific models.
 
 ```bash
-python openfunctions_evaluation.py --model MODEL_NAME --test-category TEST_CATEGORY
+python openfunctions_evaluation.py --model MODEL_NAME --test-category TEST_CATEGORY --num-threads 1
 ```
+You can optionally set the `--num-threads` flag to enable *parallel inference* and speed up generation for **hosted models**; it has no effect for OSS models.
 
 For available options for `MODEL_NAME` and `TEST_CATEGORY`, please refer to the [Models Available](#models-available) and [Available Test Category](#available-test-category) section below.
@@ -222,7 +223,7 @@ Some companies have proposed some optimization strategies in their models' handl
 * [August 8, 2024] [#574](https://github.com/ShishirPatil/gorilla/pull/574): Set temperature to 0.001 for all models for consistency and reproducibility.
 * [August 7, 2024] [#571](https://github.com/ShishirPatil/gorilla/pull/571): Support parallel inference for hosted models. User can specify the number of threads to use for parallel inference by setting the `--num-threads` flag. The default is 1, which means no parallel inference.
-* [August 6, 2024] [#569](https://github.com/ShishirPatil/gorilla/pull/569), [#570](https://github.com/ShishirPatil/gorilla/pull/570): Add the following new models to the leaderboard:
+* [August 6, 2024] [#569](https://github.com/ShishirPatil/gorilla/pull/569), [#570](https://github.com/ShishirPatil/gorilla/pull/570), [#573](https://github.com/ShishirPatil/gorilla/pull/573): Add the following new models to the leaderboard:
   * `open-mistral-nemo-2407`
   * `open-mistral-nemo-2407-FC-Any`
   * `open-mistral-nemo-2407-FC-Auto`
@@ -234,6 +235,9 @@ Some companies have proposed some optimization strategies in their models' handl
   * `gpt-4o-mini-2024-07-18-FC`
   * `gpt-4o-2024-08-06`
   * `gpt-4o-2024-08-06-FC`
+  * `meetkai/functionary-medium-v3.1-FC`
+  * `meetkai/functionary-small-v3.1-FC`
+  * `meetkai/functionary-small-v3.2-FC`
 * [August 5, 2024] [#568](https://github.com/ShishirPatil/gorilla/pull/568): Rephrase the question prompt for the `executable_parallel_function` category to remove potentially misleading information implying multi-turn function calls.
 * [August 4, 2024] [#557](https://github.com/ShishirPatil/gorilla/pull/557): Bug fix in the possible answers.
   * simple: 7 affected

diff --git a/berkeley-function-call-leaderboard/eval_checker/eval_runner_helper.py b/berkeley-function-call-leaderboard/eval_checker/eval_runner_helper.py
index 078173e76..54456b979 100644
--- a/berkeley-function-call-leaderboard/eval_checker/eval_runner_helper.py
+++ b/berkeley-function-call-leaderboard/eval_checker/eval_runner_helper.py
@@ -253,7 +253,7 @@
         "MIT",
     ],
     "meetkai/functionary-medium-v3.1-FC": [
-        "Functionary-Medium-v3.0 (FC)",
+        "Functionary-Medium-v3.1 (FC)",
         "https://huggingface.co/meetkai/functionary-medium-v3.1",
         "MeetKai",
         "MIT",
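
Not part of the patch itself — a minimal sketch of the thread-based parallelism the `--num-threads` flag enables for hosted models (default 1, i.e. sequential). The names `query_model` and `run_inference` are hypothetical stand-ins; the real logic lives in `openfunctions_evaluation.py`.

```python
from concurrent.futures import ThreadPoolExecutor

def query_model(prompt: str) -> str:
    # Hypothetical stand-in for a hosted-model API call (network-bound,
    # which is why threads help despite the GIL).
    return f"response to: {prompt}"

def run_inference(prompts, num_threads: int = 1):
    # num_threads=1 matches the sequential default described in the README;
    # higher values fan requests out across threads. pool.map preserves
    # the input order of results.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(query_model, prompts))

print(run_inference(["p1", "p2", "p3"], num_threads=2))
```

This only pays off for hosted models because each request spends most of its time waiting on the network; OSS models served locally via vllm batch requests themselves, which is why the flag does not apply to them.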