diff --git a/assets/img/blog_post_8_data_composition.png b/assets/img/blog_post_8_data_composition.png
index f98ea3efd..b8203edf0 100644
Binary files a/assets/img/blog_post_8_data_composition.png and b/assets/img/blog_post_8_data_composition.png differ
diff --git a/blogs/7_open_functions_v2.html b/blogs/7_open_functions_v2.html
index 7b2889ab5..a79e62f43 100644
--- a/blogs/7_open_functions_v2.html
+++ b/blogs/7_open_functions_v2.html
@@ -86,7 +86,6 @@

With the latest iteration of Gorilla OpenFunctions-v2, we are delighted to mark significant advancements in function calling for LLMs within the open-source community. As a drop-in replacement for its predecessor, Gorilla OpenFunctions-v2 retains its open-source ethos while introducing exciting enhancements. These include support for multiple programming languages (Python, Java, and JavaScript) as well as REST APIs - a first among both open-source and closed-source models - alongside the ability to handle multiple and parallel function calls and to determine function relevance. This update cements Gorilla OpenFunctions-v2's position at the forefront of function-calling capabilities among LLMs. Moreover, being a drop-in replacement allows for seamless integration of OpenFunctions into a diverse range of applications, from social media platforms like Instagram to delivery services like DoorDash, as well as utility tools including Google Calendar and Stripe.

-

See What's New!! 🚀

@@ -138,7 +137,7 @@

See What's New!! 🚀

web-demo
  • Check out the project: GitHub
  • - Model (7B) on HuggingFace 🤗: gorilla-llm/gorilla-openfunctions-v2
  • + Model (6.91B) on HuggingFace 🤗: gorilla-llm/gorilla-openfunctions-v2
  • @@ -346,8 +345,8 @@

    Performance on Berkeley Function-Calling Leaderboard

    OpenFunctions Data Composition & Training 🍕

    - Gorilla OpenFunctions-v2 is a 7B parameter model trained further upon on the
    - Deepseek-Coder-7B-Instruct-v1.5 model. To trian the model, we collect in total of 65,283
    + Gorilla OpenFunctions-v2 is a 6.91B parameter model trained on top of the
    + Deepseek-Coder-7B-Instruct-v1.5 (6.91B) model. To train the model, we collect a total of 65,283
    question-function-answer pairs from five different sources: Python packages (19,353), Java
    repositories (16,586), JavaScript repositories (4,245), public APIs (6,009), and Command Line
    Tools (19,090) from various cloud providers. The data composition is shown in the figure below.
    @@ -413,7 +412,7 @@
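As a quick sanity check on the figures above, the five per-source counts do sum to the reported 65,283 pairs; a short sketch (counts copied from the text) also derives each source's share:

```python
# Training-data composition reported in the post (question-function-answer pairs).
counts = {
    "Python packages": 19353,
    "Java repositories": 16586,
    "JavaScript repositories": 4245,
    "Public APIs": 6009,
    "Command Line Tools": 19090,
}

total = sum(counts.values())
print(total)  # 65283, matching the reported total

for source, n in counts.items():
    print(f"{source}: {n} ({n / total:.1%})")
```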

    OpenFunctions Data Composition & Training 🍕
    Conclusion

    - We are happy to release gorilla-openfunctions-v2, a 7B parameter model trained on
    + We are happy to release gorilla-openfunctions-v2, a 6.91B parameter model trained on
    top of the Deepseek-Coder-7B-Instruct-v1.5 LLM. It takes in the user's prompt along with
    multiple API calls and returns the functions with the right arguments. With OpenFunctions
    we extended native
diff --git a/blogs/8_berkeley_function_calling_leaderboard.html b/blogs/8_berkeley_function_calling_leaderboard.html
index 66581af2b..bc62f6242 100644
--- a/blogs/8_berkeley_function_calling_leaderboard.html
+++ b/blogs/8_berkeley_function_calling_leaderboard.html
@@ -111,7 +111,17 @@

    function calls with multiple programming languages and parallel and multiple function calls. We also provide a debugging feature: when the provided function is not suitable for your task, the model will output an “Error Message”.

    + Quick Links:

    @@ -180,7 +190,7 @@

    Evaluation Categories 📊

    Python Evaluation
    @@ -207,9 +217,9 @@
    Python Evaluation
    Non-Python Evaluation

    While the previous categories make up the majority of our evaluations, we include other
    - specific categories, namely REST API, SQL, Java, and JavaScript, to evaluate model performance on diverse scenarios and support of multiple programming languages, and are resilient
    + specific categories, namely Chatting Capability, Function Relevance Detection, REST API, SQL, Java, and JavaScript, to evaluate model performance in diverse scenarios, support for multiple programming languages, and resilience
    to irrelevant questions and function documentation.


    Chatting Capability: In Chatting Capability, we design scenarios where no functions are passed in and the users ask generic questions - this is similar to using the model as a general-purpose chatbot. We evaluate whether the model outputs chat messages and recognizes that it does not need to invoke any functions. Note the difference from “Relevance”, where the model is also expected to evaluate whether any of the provided functions are relevant. We include this category for internal model evaluation and exclude its statistics from the live leaderboard. We are currently working on a better evaluation of chat ability, ensuring the chat is relevant and coherent with users' requests, and we are open to suggestions and feedback from the community.
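Illustratively (the `content` and `function_calls` field names are a hypothetical response schema, not the benchmark's actual one), the pass condition for this category reduces to: the model returned text and invoked nothing.

```python
# Hypothetical sketch of the Chatting Capability check: with no functions
# provided, a correct response is a plain chat message, not a function call.

def is_chat_only(response: dict) -> bool:
    """Pass if the model returned text and did not invoke any function."""
    no_calls = not response.get("function_calls")
    has_text = bool(response.get("content", "").strip())
    return no_calls and has_text

# A chat-style reply passes; a spurious function call fails.
print(is_chat_only({"content": "Happy to help! What would you like to know?"}))   # True
print(is_chat_only({"content": "", "function_calls": [{"name": "get_weather"}]}))  # False
```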

    Function Relevance Detection: In function relevance detection, we design scenarios where none of the provided functions is relevant or supposed to be invoked. We expect the model's output to contain no function call. This scenario provides insight into whether a model @@ -239,12 +249,11 @@
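Sketched with the same caveat (the `function_calls` field is a hypothetical response schema, not the harness's real one), the pass condition here is simply that the model declines to call any of the irrelevant functions:

```python
# Hypothetical sketch of the Function Relevance Detection check: functions ARE
# provided, but none matches the request, so a correct model makes no call.

def passes_relevance_check(response: dict) -> bool:
    """Pass only if the model declined to call any of the (irrelevant) functions."""
    return not response.get("function_calls")

# e.g. only stock-price functions are provided, but the user asks for a recipe:
print(passes_relevance_check({"content": "None of the available tools apply here."}))  # True
print(passes_relevance_check({"function_calls": [{"name": "get_stock_price"}]}))       # False
```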

    Non-Python Evaluation

    Java + JavaScript: Despite function-calling formats being the same across most
    - programming languages, each programming language has language specific types. For example, C
    - has pointer type, Java has HashMap type. The goal of this test category is to understand how
    - well the function calling model can be extended to not just JSON and Python type but all the
    + programming languages, each programming language has language-specific types. For example, Java has the HashMap type. The goal of this test category is to understand how
    + well the function-calling model can be extended beyond just Python types to all
    language-specific typings. We included 100 examples for Java AST evaluation and 70 examples for JavaScript AST evaluation.
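Purely as a hypothetical illustration (not the benchmark's actual harness), the same logical argument is written differently per language, so a cross-language evaluator has to recognize each language's literal forms rather than only Python/JSON ones:

```python
# Illustrative pairs: one logical call, two language-specific argument forms.
calls = {
    "python": "get_scores(team_wins={'united': 3})",
    "java":   'getScores(new HashMap<String, Integer>(Map.of("united", 3)))',
}

def language_of(call_src: str) -> str:
    """Crude dispatch on surface syntax (sketch only, not the real evaluator)."""
    return "java" if "new HashMap" in call_src else "python"

print(language_of(calls["java"]))    # java
print(language_of(calls["python"]))  # python
```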

    - The three categories outlined above provide insight into the performance of different models across popular API call scenarios, offering valuable perspectives on the potential of function-calling models.
    + The categories outlined above provide insight into the performance of different models across popular API call scenarios, offering valuable perspectives on the potential of function-calling models.

    @@ -406,6 +415,12 @@
    Abstract Syntax Tree (AST) Evaluation 🌳
  • Possible Anything ["Manchester United", "Man United", "Man U", "MUFC"]
  • Handling Optional Parameters in Function Calls:
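The two AST-matching rules above - accepting any listed variant for a string argument, and allowing an optional parameter to be omitted - can be sketched as follows (the function name `get_team_stats` and the `season` parameter are invented for illustration; this is not the benchmark's actual checker):

```python
import ast

# Hypothetical expected-call spec: each argument lists its accepted values and
# whether it is optional (an optional argument may simply be left out).
EXPECTED = {
    "name": "get_team_stats",
    "args": {
        "team": {"accepted": ["Manchester United", "Man United", "Man U", "MUFC"],
                 "optional": False},
        "season": {"accepted": ["2023"], "optional": True},
    },
}

def check_call(call_src: str) -> bool:
    """Parse the model's call and validate it against the expected spec."""
    node = ast.parse(call_src, mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        return False
    if node.func.id != EXPECTED["name"]:
        return False
    given = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    for name, spec in EXPECTED["args"].items():
        if name not in given:
            if not spec["optional"]:
                return False          # required parameter missing
        elif given[name] not in spec["accepted"]:
            return False              # value not among accepted variants
    return True

print(check_call("get_team_stats(team='Man U')"))                # True (optional omitted)
print(check_call("get_team_stats(team='MUFC', season='2023')"))  # True
print(check_call("get_team_stats(team='Man City')"))             # False
```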
  • @@ -497,7 +512,7 @@

    Model Manual

    Gorilla OpenFunctions-v2
    - 7B
    + 6.91B
    Gorilla LLM Apache 2.0