diff --git a/README_CN.md b/README_CN.md
index 9d61424..009f324 100644
--- a/README_CN.md
+++ b/README_CN.md
@@ -28,12 +28,30 @@ Rubra enhances the tool-calling capabilities of the most popular open-weight LLMs
 
 Try the models above for free on our [Huggingface Spaces](https://huggingface.co/spaces/sanjay920/rubra-v0.1-dev), no login required!
 
-## Run Rubra Models Locally
+## Deploy and Run Rubra Models Locally
 
-We extended the following inferencing tools to run Rubra models locally with an OpenAI-style API:
+Check out our [documentation](https://docs.rubra.ai/category/serving--inferencing) to learn how to run Rubra models locally.
+We extended the following inferencing tools to run Rubra models locally with OpenAI-compatible tool calling:
+
+- [llama.cpp](https://github.com/rubra-ai/tools.cpp)
+- [vLLM](https://github.com/rubra-ai/vllm)
+
+**Note**: Llama3 models, including the 8B and 70B gguf variants, suffer from increased perplexity and degraded function-calling performance after quantization. We recommend serving them with vLLM, or at fp16 or higher precision (bf16, fp32).
+
+## Benchmarks
+
+See the full benchmark results for Rubra and other models at: https://docs.rubra.ai/benchmark
+
+| Model | Function Calling | MMLU (5-shot) | GPQA (0-shot) | GSM-8K (8-shot, CoT) | MATH (4-shot, CoT) | MT-bench |
+|-----------------------------------------------------------|------------------|---------------|---------------|----------------------|--------------------|----------|
+| [**Rubra Llama-3 70B Instruct**](https://huggingface.co/rubra-ai/Meta-Llama-3-70B-Instruct) | 97.85% | 75.90 | 33.93 | 82.26 | 34.24 | 8.36 |
+| [**Rubra Llama-3 8B Instruct**](https://huggingface.co/rubra-ai/Meta-Llama-3-8B-Instruct) | 89.28% | 64.39 | 31.70 | 68.99 | 23.76 | 8.03 |
+| [**Rubra Qwen2 7B Instruct**](https://huggingface.co/rubra-ai/Qwen2-7B-Instruct) | 85.71% | 68.88 | 30.36 | 75.82 | 28.72 | 8.08 |
+| [**Rubra Mistral 7B Instruct v0.3**](https://huggingface.co/rubra-ai/Mistral-7B-Instruct-v0.3) | 73.57% | 59.12 | 29.91 | 43.29 | 11.14 | 7.69 |
+| [**Rubra Phi-3 Mini 128k Instruct**](https://huggingface.co/rubra-ai/Phi-3-mini-128k-instruct) | 70.00% | 67.87 | 29.69 | 79.45 | 30.80 | 8.21 |
+| [**Rubra Mistral 7B Instruct v0.2**](https://huggingface.co/rubra-ai/Mistral-7B-Instruct-v0.2) | 69.28% | 58.90 | 29.91 | 34.12 | 8.36 | 7.36 |
+| [**Rubra Gemma-1.1 2B Instruct**](https://huggingface.co/rubra-ai/gemma-1.1-2b-it) | 45.00% | 38.85 | 24.55 | 6.14 | 2.38 | 5.75 |
 
-- [llama.cpp](https://github.com/ggerganov/llama.cpp)
-- [vllm](https://github.com/vllm-project/vllm)
 
 ## Contributing
diff --git a/docs/docs/inference/llamacpp.mdx b/docs/docs/inference/llamacpp.mdx
index 9f222f5..36c0c7b 100644
--- a/docs/docs/inference/llamacpp.mdx
+++ b/docs/docs/inference/llamacpp.mdx
@@ -137,11 +137,7 @@ The output should look like this:
 ```
 ChatCompletion(id='chatcmpl-EmHd8kai4DVwBUOyim054GmfcyUbjiLf', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='e885974b', function=Function(arguments='{"location":"Boston"}', name='get_current_weather'), type='function')]))], created=1719528056, model='rubra-model', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=29, prompt_tokens=241, total_tokens=270))
 ```
 
-That's it! For more function calling examples, you can check out the [test_llamacpp.ipynb](https://github.com/rubra-ai/tools.cpp/blob/010f4d282e86babe216af6e037ab10bf078415e7/test_llamacpp.ipynb) notebook.
-
-:::info
-Make sure you turn `stream` off when making API calls to the server, as the streaming feature is not supported yet. We will support streaming soon.
-:::
+That's it! For more function calling examples, you can check out the [test_llamacpp.ipynb](https://github.com/rubra-ai/tools.cpp/blob/010f4d282e86babe216af6e037ab10bf078415e7/test_llamacpp.ipynb) or [test_llamacpp_streaming.ipynb](https://github.com/rubra-ai/tools.cpp/blob/master/test_llamacpp_streaming.ipynb) notebooks.
 
 ## Choosing a Chat Template for Different Models
diff --git a/docs/docs/inference/vllm.md b/docs/docs/inference/vllm.md
index 5563673..883bc7b 100644
--- a/docs/docs/inference/vllm.md
+++ b/docs/docs/inference/vllm.md
@@ -105,7 +105,3 @@ The output should look like this:
 ```
 ChatCompletion(id='chatcmpl-EmHd8kai4DVwBUOyim054GmfcyUbjiLf', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='e885974b', function=Function(arguments='{"location":"Boston"}', name='get_current_weather'), type='function')]))], created=1719528056, model='rubra-model', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=29, prompt_tokens=241, total_tokens=270))
 ```
-
-:::info
-Make sure you turn `stream` off when making API calls to the server, as the streaming feature is not supported yet. We will support streaming soon.
-:::
diff --git a/tools.cpp b/tools.cpp
index 89971d6..de003e5 160000
--- a/tools.cpp
+++ b/tools.cpp
@@ -1 +1 @@
-Subproject commit 89971d6b96ae5e9f3c16ebe839d6d85d8189846f
+Subproject commit de003e509611896e63a8ad70e09dcdf2de5774e0
diff --git a/vllm b/vllm
index e80374c..b2039be 160000
--- a/vllm
+++ b/vllm
@@ -1 +1 @@
-Subproject commit e80374c247347614772cd92d3e52230d2b91c12b
+Subproject commit b2039bedf3fa6ba8a27109769934ce6142b0f9c0
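Both docs changes above drop the old "turn `stream` off" notice and link a streaming notebook, so tool calls can now be streamed from a locally served Rubra model. For reference, here is a minimal Python sketch of such a streaming tool call; it is not taken from the PR itself. The `base_url` and `api_key` values are placeholder assumptions for a local OpenAI-compatible server, while the model name `rubra-model` and the `get_current_weather` tool mirror the example output quoted in the diffs.

```python
# Minimal sketch of a streaming tool call against a locally served Rubra model.
# Assumptions: base_url and api_key are placeholders; "rubra-model" and the
# get_current_weather tool mirror the example output in the docs above.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local llama.cpp/vLLM endpoint
    api_key="sk-local",                   # local servers typically ignore the key
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. Boston"},
            },
            "required": ["location"],
        },
    },
}]

# stream=True is exactly the case the removed :::info notice warned against;
# with the updated tools.cpp/vllm forks it should now be supported.
stream = client.chat.completions.create(
    model="rubra-model",
    messages=[{"role": "user", "content": "What is the weather like in Boston?"}],
    tools=tools,
    stream=True,
)

# Tool-call arguments arrive as incremental JSON fragments in the deltas.
name, args = None, ""
for chunk in stream:
    delta = chunk.choices[0].delta
    for tc in delta.tool_calls or []:
        if tc.function and tc.function.name:
            name = tc.function.name
        if tc.function and tc.function.arguments:
            args += tc.function.arguments

print(name, args)  # expected: get_current_weather {"location":"Boston"}
```

The non-streaming variant is the same call with `stream=True` removed; it returns a `ChatCompletion` object like the ones quoted in the diffs above.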