[Bug Fix] data composition, optional parameter handling (#255)

Blog 8
- Fix data composition numbers
- Included optional parameter handling descriptions
- parameter 6.91B fixed
- add quick links

Co-authored-by: Huanzhi Mao <[email protected]>
ShishirPatil · Mar 12, 2024 · 4feb55e
1 parent 69e10f3
Showing 3 changed files with 28 additions and 14 deletions.
diff --git a/assets/img/blog_post_8_data_composition.png b/assets/img/blog_post_8_data_composition.png
diff --git a/blogs/7_open_functions_v2.html b/blogs/7_open_functions_v2.html
@@ -86,7 +86,6 @@ <h4 class="text-center" style="margin: 0px;">
                     <p>
                         With the latest iteration of Gorilla OpenFunctions-v2, we are delighted to mark significant advancements in function calling for LLMs within the open-source community. As a direct substitute for its predecessor, Gorilla OpenFunctions-v2 retains its open-source ethos while introducing exciting enhancements. These include support for multiple programming languages such as Python, Java, JavaScript, and REST API - the first among both open-source and closed-source models, alongside the ability to handle multiple and parallel function calls, and the ability to determine function relevance. This update cements Gorilla OpenFunctions-v2's position at the forefront of function calling capabilities among LLMs. Moreover, the drop-in replacement allows for seamless integration of OpenFunctions into a diverse range of applications, from social media platforms like Instagram to delivery services like DoorDash, as well as utility tools including Google Calendar and Stripe.
                     </p>
-
                 </div>
 
                 <h4 id="whats_new"> See What's New!! &#128640 </h4>
@@ -138,7 +137,7 @@ <h4 id="whats_new"> See What's New!! &#128640 </h4>
                                 web-demo</a></li>
                         <li style="margin-bottom: 5px;">Check out the project: <a
                                 href="https://github.com/ShishirPatil/gorilla/tree/main/openfunctions">GitHub</a></li>
-                        <li style="margin-bottom: 5px;">Model (7B) on HuggingFace: <a
+                        <li style="margin-bottom: 5px;">Model (6.91B) on HuggingFace 🤗: <a
                                 href="https://huggingface.co/gorilla-llm/gorilla-openfunctions-v2">gorilla-llm/gorilla-openfunctions-v2</a>
                         </li>
                     </ul>
@@ -346,8 +345,8 @@ <h4 id="benchmarking">Performance on Berkeley Function-Calling Leaderboard &#128
                 <h4 id="data_composition"> OpenFunctions Data Composition & Training &#127829</h4>
                 <div class="body">
                     <p>
-                        Gorilla OpenFunctions-v2 is a 7B parameter model trained further upon on the
-                        Deepseek-Coder-7B-Instruct-v1.5 model. To trian the model, we collect in total of <b>65,283
+                        Gorilla OpenFunctions-v2 is a 6.91B parameter model trained further upon on the
+                        Deepseek-Coder-7B-Instruct-v1.5 6.91B model. To trian the model, we collect in total of <b>65,283
                         question-function-answer pairs</b> from three different sources: Python packages (19,353), Java
                         repositories (16,586), Javascript Repositories (4,245), public-API (6,009), and Command Line
                         Tools (19,090) from various cloud providers. The data composition is shown in the figure below.
@@ -413,7 +412,7 @@ <h4 id="data_composition"> OpenFunctions Data Composition & Training &#127829</h
                 <h4 id="conclusion">Conclusion</h4>
 
                 <div class="body">
-                    <p>We are happy to release <code style="background-color: #eee; padding: 2px 4px; border-radius: 4px;">gorilla-openfunctions-v2</code>, a 7B parameter model trained on
+                    <p>We are happy to release <code style="background-color: #eee; padding: 2px 4px; border-radius: 4px;">gorilla-openfunctions-v2</code>, a 6.91B parameter model trained on
                         top of the Deepseek-Coder-7B-Instruct-v1.5 LLM.
                         It takes-in the users prompt along with multiple API calls and returns the functions with the
                         right arguments. With OpenFunctions we extended native

diff --git a/blogs/8_berkeley_function_calling_leaderboard.html b/blogs/8_berkeley_function_calling_leaderboard.html
@@ -111,7 +111,17 @@ <h4 class="text-center" style="margin: 0px;">
                         function calls with multiple programming languages and parallel and multiple function calls. We
                         also provide a specific debugging feature that when the provided function is not suitable for
                         your task, the model will output an “Error Message”.
-
+
+                    </p>
+                    <p>
+                        Quick Links:
+                        <ul>
+                            <li>Live Leaderboard: <a href="https://gorilla.cs.berkeley.edu/leaderboard.html">Website</a></li>
+                            <li>BFCL Evaluation Dataset: <a href="https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard"> HuggingFace Dataset 🤗</a></li>
+                            <li>Gradio Demo: <a href="https://huggingface.co/spaces/gorilla-llm/berkeley-function-calling-leaderboard"> HuggingFace Space 🤗 </a></li>
+                            <li>Reproducibility: <a href="https://github.com/ShishirPatil/gorilla/tree/main/berkeley-function-call-leaderboard">Github Code</a></li>
+                            <li>OpenFunctions-v2 (6.91B) on HuggingFace 🤗: <a href="https://huggingface.co/gorilla-llm/gorilla-openfunctions-v2">gorilla-llm/gorilla-openfunctions-v2</a></li>
+                        </ul>
                     </p>
                 </div>
 
@@ -180,7 +190,7 @@ <h4 id="categories">Evaluation Categories 📊</h4>
                         <ul>
                             <li style="margin-bottom: 5px;"><b>Python</b>: Simple Function, Multiple Function, Parallel Function, Parallel Multiple
                                 Function</li>
-                            <li style="margin-bottom: 5px;"><b>Non-Python</b>: Function Relevance Detection, REST API, SQL, Java, Javascript</li>
+                            <li style="margin-bottom: 5px;"><b>Non-Python</b>: Chatting Capability, Function Relevance Detection, REST API, SQL, Java, Javascript</li>
                         </ul>
                         </div>
                         <h5 id="benchmarking">Python Evaluation</h5>
@@ -207,9 +217,9 @@ <h5 id="benchmarking">Python Evaluation</h5>
                         <h5 id="benchmarking">Non-Python Evaluation</h5>
                         <div class="body">
                         <p>While the previous categories consist of the majority of our evaluations, we include other
-                            specific categories, namely REST API, SQL, Java, and JavaScript, to evaluate model performance on diverse scenarios and support of multiple programming languages, and are resilient
+                            specific categories, namely Chatting Capability, Function Relevance Detection, REST API, SQL, Java, and JavaScript, to evaluate model performance on diverse scenarios and support of multiple programming languages, and are resilient
                             to irrelevant questions and function documentations.</p>
-
+                        <p><strong>Chatting Capability:</strong> In Chatting Capability, we design scenarios where no functions are passed in, and the users ask generic questions - this is similar to using the model as a general-purpose chatbot. We evaluate if the model is able to output chat messages and recognize that it does not need to invoke any functions. Note the difference with “Relevance” where the model is expected to also evaluate if any of the function input are relevant or not. We include this category for internal model evaluation and exclude the statistics from live leaderboard. We currently are working on better evaluation on chatability and ensure the chat is relevant and coherent with users' request and open to suggestions and feedback from the community. </p>
                         <p><strong>Function Relevance Detection:</strong> In function relevance detection, we design scenarios
                             where none of the provided functions are relevant and supposed to be invoked. We expect the
                             model's output to be no function call. This scenario provides insight to whether a model
@@ -239,12 +249,11 @@ <h5 id="benchmarking">Non-Python Evaluation</h5>
 
 
                             <p><strong>Java + Javascript:</strong> Despite function calling formats being the same across most
-                            programming languages, each programming language has language specific types. For example, C
-                            has  <code style="background-color: #eee; padding: 2px 4px; border-radius: 4px;">pointer</code> type, Java has <code style="background-color: #eee; padding: 2px 4px; border-radius: 4px;">HashMap</code> type. The goal of this test category is to understand how
-                            well the function calling model can be extended to not just JSON and Python type but all the
+                            programming languages, each programming language has language specific types. For example, Java has <code style="background-color: #eee; padding: 2px 4px; border-radius: 4px;">HashMap</code> type. The goal of this test category is to understand how
+                            well the function calling model can be extended to not just Python type but all the
                             language specific typings. We included 100 examples for Java AST evaluation and 70 examples for Javascript AST evaluation. 
                         </p>
-                        <p>The three categories outlined above provide insight into the performance of different models across popular API call scenarios, offering valuable perspectives on the potential of function-calling models.
+                        <p>The categories outlined above provide insight into the performance of different models across popular API call scenarios, offering valuable perspectives on the potential of function-calling models.
 
                         </p>
                         </div>
@@ -406,6 +415,12 @@ <h5>Abstract Syntax Tree (AST) Evaluation 🌳</h5>
                                     <li style="margin-bottom: 5px;">Possible Anything <code style="background-color: #eee; padding: 2px 4px; border-radius: 4px;">["Manchester United", "Man United", "Man U", "MUFC"]</code> </li>
                                 </ul>
                             </li>
+                            <li style="margin-bottom: 5px;">Handling Optional Parameters in Function Calls:
+                                <ul>
+                                    <li style="margin-bottom: 5px;">When optional parameters are not provided in the function calls (meaning they should use their default values), we represent their absence with <code style="background-color: #eee; padding: 2px 4px; border-radius: 4px;">""</code>. This notation tells the AST (Abstract Syntax Tree) checker that the function call intentionally omits these parameters.</li>
+                                    <li style="margin-bottom: 5px;">Additionally, we specify the default values as stated in the function's description. This approach allows us to correctly recognize both cases: when an optional parameter is omitted and when it is included with its default value. This ensures that responses are marked as correct in either scenario.</li>
+                                </ul>
+                            </li>
                         </ul>
 
 
@@ -497,7 +512,7 @@ <h2>Model Manual</h2>
                                         <td>
                                             <a href=''>Gorilla OpenFunctions-v2</a>
                                         </td>
-                                        <td>7B</td>
+                                        <td>6.91B</td>
                                         <td>Gorilla LLM</td>
                                         <td>Apache 2.0</td>
                                         <td><a href="https://gorilla.cs.berkeley.edu/">&#10003;</a></td>
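
Note on the optional-parameter handling added to blogs/8_berkeley_function_calling_leaderboard.html in this commit: the new text says an omitted optional parameter is represented by "" in the list of acceptable values, next to the default value itself, so that both the omitted form and the explicit-default form are marked correct. The following is a minimal sketch of that rule, not the leaderboard's actual checker. The function names, the dictionary layout of `possible_answer`, and the `season` parameter with default 2023 are assumptions made for illustration; only the team-name list comes from the blog text.

```python
# Hypothetical sketch of the "" convention for optional parameters.
# Not the real BFCL AST checker; names and data layout are assumed.

def param_matches(model_args, name, acceptable):
    """Check one parameter against its list of acceptable values.

    An empty string in `acceptable` means the parameter may be omitted.
    """
    if name not in model_args:
        # Omitted parameter: fine only if "" is listed as acceptable.
        return "" in acceptable
    return model_args[name] in acceptable


def call_matches(model_args, possible_answer):
    """Check every parameter of one function call."""
    # Reject arguments that the reference answer does not mention.
    if any(name not in possible_answer for name in model_args):
        return False
    return all(
        param_matches(model_args, name, acceptable)
        for name, acceptable in possible_answer.items()
    )


possible_answer = {
    # Required parameter: any of these spellings is accepted.
    "team": ["Manchester United", "Man United", "Man U", "MUFC"],
    # Optional parameter (assumed): "" allows omission, 2023 is the default.
    "season": ["", 2023],
}

print(call_matches({"team": "Man U"}, possible_answer))
print(call_matches({"team": "MUFC", "season": 2023}, possible_answer))
print(call_matches({"team": "MUFC", "season": 2019}, possible_answer))
```

Under these assumptions the first two calls are accepted (parameter omitted, parameter given with its default) and the third is rejected because 2019 is neither the default nor an omission. One side effect of this simple version is that a literal empty-string argument would also be accepted for an optional parameter.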