
Commit

[Bug Fix] data composition, optional parameter handling (#255)
Blog 8
- Fix data composition numbers
- Included optional parameter handling descriptions
- parameter 6.91B fixed
- add quick links

---------
Co-authored-by: Huanzhi Mao <[email protected]>
CharlieJCJ authored Mar 12, 2024
1 parent 69e10f3 commit 4feb55e
Showing 3 changed files with 28 additions and 14 deletions.
Binary file modified assets/img/blog_post_8_data_composition.png
9 changes: 4 additions & 5 deletions blogs/7_open_functions_v2.html
@@ -86,7 +86,6 @@ <h4 class="text-center" style="margin: 0px;">
<p>
With the latest iteration of Gorilla OpenFunctions-v2, we are delighted to mark significant advancements in function calling for LLMs within the open-source community. As a direct substitute for its predecessor, Gorilla OpenFunctions-v2 retains its open-source ethos while introducing exciting enhancements. These include support for multiple programming languages such as Python, Java, JavaScript, and REST API - the first among both open-source and closed-source models, alongside the ability to handle multiple and parallel function calls, and the ability to determine function relevance. This update cements Gorilla OpenFunctions-v2's position at the forefront of function calling capabilities among LLMs. Moreover, the drop-in replacement allows for seamless integration of OpenFunctions into a diverse range of applications, from social media platforms like Instagram to delivery services like DoorDash, as well as utility tools including Google Calendar and Stripe.
</p>

</div>

<h4 id="whats_new"> See What's New!! &#128640 </h4>
@@ -138,7 +137,7 @@ <h4 id="whats_new"> See What's New!! &#128640 </h4>
web-demo</a></li>
<li style="margin-bottom: 5px;">Check out the project: <a
href="https://github.com/ShishirPatil/gorilla/tree/main/openfunctions">GitHub</a></li>
<li style="margin-bottom: 5px;">Model (7B) on HuggingFace: <a
<li style="margin-bottom: 5px;">Model (6.91B) on HuggingFace 🤗: <a
href="https://huggingface.co/gorilla-llm/gorilla-openfunctions-v2">gorilla-llm/gorilla-openfunctions-v2</a>
</li>
</ul>
@@ -346,8 +345,8 @@ <h4 id="benchmarking">Performance on Berkeley Function-Calling Leaderboard &#128
<h4 id="data_composition"> OpenFunctions Data Composition & Training &#127829</h4>
<div class="body">
<p>
Gorilla OpenFunctions-v2 is a 7B parameter model trained further upon on the
Deepseek-Coder-7B-Instruct-v1.5 model. To trian the model, we collect in total of <b>65,283
Gorilla OpenFunctions-v2 is a 6.91B parameter model trained further upon on the
Deepseek-Coder-7B-Instruct-v1.5 6.91B model. To trian the model, we collect in total of <b>65,283
question-function-answer pairs</b> from three different sources: Python packages (19,353), Java
repositories (16,586), Javascript Repositories (4,245), public-API (6,009), and Command Line
Tools (19,090) from various cloud providers. The data composition is shown in the figure below.
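As a quick sanity check on the composition numbers in the paragraph above, the per-category counts can be summed and compared with the stated total of 65,283 pairs. This is a minimal sketch: the dictionary labels are copied from the text, and the variable names are illustrative only. Note that the paragraph lists five categories.

```python
# Per-category counts as quoted in the data composition paragraph.
composition = {
    "Python packages": 19_353,
    "Java repositories": 16_586,
    "Javascript repositories": 4_245,
    "Public API": 6_009,
    "Command line tools": 19_090,
}
stated_total = 65_283  # total quoted in the same paragraph

computed_total = sum(composition.values())
# Fraction of the dataset contributed by each category.
shares = {name: count / computed_total for name, count in composition.items()}

for name, count in composition.items():
    print(f"{name:<24} {count:>6}  {shares[name]:.1%}")
print(f"categories: {len(composition)}")
print(f"computed total: {computed_total}, matches stated: {computed_total == stated_total}")
```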
@@ -413,7 +412,7 @@ <h4 id="data_composition"> OpenFunctions Data Composition & Training &#127829</h
<h4 id="conclusion">Conclusion</h4>

<div class="body">
<p>We are happy to release <code style="background-color: #eee; padding: 2px 4px; border-radius: 4px;">gorilla-openfunctions-v2</code>, a 7B parameter model trained on
<p>We are happy to release <code style="background-color: #eee; padding: 2px 4px; border-radius: 4px;">gorilla-openfunctions-v2</code>, a 6.91B parameter model trained on
top of the Deepseek-Coder-7B-Instruct-v1.5 LLM.
It takes-in the users prompt along with multiple API calls and returns the functions with the
right arguments. With OpenFunctions we extended native
33 changes: 24 additions & 9 deletions blogs/8_berkeley_function_calling_leaderboard.html
@@ -111,7 +111,17 @@ <h4 class="text-center" style="margin: 0px;">
function calls with multiple programming languages and parallel and multiple function calls. We
also provide a specific debugging feature that when the provided function is not suitable for
your task, the model will output an “Error Message”.


</p>
<p>
Quick Links:
<ul>
<li>Live Leaderboard: <a href="https://gorilla.cs.berkeley.edu/leaderboard.html">Website</a></li>
<li>BFCL Evaluation Dataset: <a href="https://huggingface.co/datasets/gorilla-llm/Berkeley-Function-Calling-Leaderboard"> HuggingFace Dataset 🤗</a></li>
<li>Gradio Demo: <a href="https://huggingface.co/spaces/gorilla-llm/berkeley-function-calling-leaderboard"> HuggingFace Space 🤗 </a></li>
<li>Reproducibility: <a href="https://github.com/ShishirPatil/gorilla/tree/main/berkeley-function-call-leaderboard">Github Code</a></li>
<li>OpenFunctions-v2 (6.91B) on HuggingFace 🤗: <a href="https://huggingface.co/gorilla-llm/gorilla-openfunctions-v2">gorilla-llm/gorilla-openfunctions-v2</a></li>
</ul>
</p>
</div>

@@ -180,7 +190,7 @@ <h4 id="categories">Evaluation Categories 📊</h4>
<ul>
<li style="margin-bottom: 5px;"><b>Python</b>: Simple Function, Multiple Function, Parallel Function, Parallel Multiple
Function</li>
<li style="margin-bottom: 5px;"><b>Non-Python</b>: Function Relevance Detection, REST API, SQL, Java, Javascript</li>
<li style="margin-bottom: 5px;"><b>Non-Python</b>: Chatting Capability, Function Relevance Detection, REST API, SQL, Java, Javascript</li>
</ul>
</div>
<h5 id="benchmarking">Python Evaluation</h5>
@@ -207,9 +217,9 @@ <h5 id="benchmarking">Python Evaluation</h5>
<h5 id="benchmarking">Non-Python Evaluation</h5>
<div class="body">
<p>While the previous categories consist of the majority of our evaluations, we include other
specific categories, namely REST API, SQL, Java, and JavaScript, to evaluate model performance on diverse scenarios and support of multiple programming languages, and are resilient
specific categories, namely Chatting Capability, Function Relevance Detection, REST API, SQL, Java, and JavaScript, to evaluate model performance on diverse scenarios and support of multiple programming languages, and are resilient
to irrelevant questions and function documentations.</p>

<p><strong>Chatting Capability:</strong> In Chatting Capability, we design scenarios where no functions are passed in, and the users ask generic questions - this is similar to using the model as a general-purpose chatbot. We evaluate if the model is able to output chat messages and recognize that it does not need to invoke any functions. Note the difference with “Relevance” where the model is expected to also evaluate if any of the function input are relevant or not. We include this category for internal model evaluation and exclude the statistics from live leaderboard. We currently are working on better evaluation on chatability and ensure the chat is relevant and coherent with users' request and open to suggestions and feedback from the community. </p>
<p><strong>Function Relevance Detection:</strong> In function relevance detection, we design scenarios
where none of the provided functions are relevant and supposed to be invoked. We expect the
model's output to be no function call. This scenario provides insight to whether a model
@@ -239,12 +249,11 @@ <h5 id="benchmarking">Non-Python Evaluation</h5>


<p><strong>Java + Javascript:</strong> Despite function calling formats being the same across most
programming languages, each programming language has language specific types. For example, C
has <code style="background-color: #eee; padding: 2px 4px; border-radius: 4px;">pointer</code> type, Java has <code style="background-color: #eee; padding: 2px 4px; border-radius: 4px;">HashMap</code> type. The goal of this test category is to understand how
well the function calling model can be extended to not just JSON and Python type but all the
programming languages, each programming language has language specific types. For example, Java has <code style="background-color: #eee; padding: 2px 4px; border-radius: 4px;">HashMap</code> type. The goal of this test category is to understand how
well the function calling model can be extended to not just Python type but all the
language specific typings. We included 100 examples for Java AST evaluation and 70 examples for Javascript AST evaluation.
</p>
<p>The three categories outlined above provide insight into the performance of different models across popular API call scenarios, offering valuable perspectives on the potential of function-calling models.
<p>The categories outlined above provide insight into the performance of different models across popular API call scenarios, offering valuable perspectives on the potential of function-calling models.

</p>
</div>
@@ -406,6 +415,12 @@ <h5>Abstract Syntax Tree (AST) Evaluation 🌳</h5>
<li style="margin-bottom: 5px;">Possible Anything <code style="background-color: #eee; padding: 2px 4px; border-radius: 4px;">["Manchester United", "Man United", "Man U", "MUFC"]</code> </li>
</ul>
</li>
<li style="margin-bottom: 5px;">Handling Optional Parameters in Function Calls:
<ul>
<li style="margin-bottom: 5px;">When optional parameters are not provided in the function calls (meaning they should use their default values), we represent their absence with <code style="background-color: #eee; padding: 2px 4px; border-radius: 4px;">""</code>. This notation tells the AST (Abstract Syntax Tree) checker that the function call intentionally omits these parameters.</li>
<li style="margin-bottom: 5px;">Additionally, we specify the default values as stated in the function's description. This approach allows us to correctly recognize both cases: when an optional parameter is omitted and when it is included with its default value. This ensures that responses are marked as correct in either scenario.</li>
</ul>
</li>
</ul>
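The optional-parameter rule added in this hunk can be sketched roughly as follows. This is not the leaderboard's actual checker: the function name `check_call`, the `possible_answers` layout, and the example parameters are all hypothetical, chosen only to illustrate how a `""` entry could mark a parameter as safely omittable while the listed default value is still accepted when passed explicitly.

```python
OMITTED = ""  # marker meaning "this parameter may be left out of the call"


def check_call(model_args: dict, possible_answers: dict) -> bool:
    """Return True if the model's arguments match the accepted values (hypothetical sketch)."""
    # Reject any argument the reference answer does not list.
    for name in model_args:
        if name not in possible_answers:
            return False

    for name, accepted in possible_answers.items():
        if name not in model_args:
            # Omission is allowed only when the marker is among the accepted values.
            if OMITTED not in accepted:
                return False
        else:
            # A provided value must match a real accepted value, not the marker itself.
            real_values = [value for value in accepted if value != OMITTED]
            if model_args[name] not in real_values:
                return False
    return True


# Hypothetical reference answer: `unit` is optional with default "celsius".
possible_answers = {
    "city": ["Berkeley"],
    "unit": ["celsius", OMITTED],
}

print(check_call({"city": "Berkeley"}, possible_answers))                        # unit omitted
print(check_call({"city": "Berkeley", "unit": "celsius"}, possible_answers))     # default given
print(check_call({"city": "Berkeley", "unit": "fahrenheit"}, possible_answers))  # wrong value
```

Under these assumptions, the first two calls are accepted and the third is rejected, which mirrors the two cases the description says should both be marked correct.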


@@ -497,7 +512,7 @@ <h2>Model Manual</h2>
<td>
<a href=''>Gorilla OpenFunctions-v2</a>
</td>
<td>7B</td>
<td>6.91B</td>
<td>Gorilla LLM</td>
<td>Apache 2.0</td>
<td><a href="https://gorilla.cs.berkeley.edu/">&#10003;</a></td>