Server: add MiniCPM chat template #6276

Draft: ngxson wants to merge 2 commits into master

ngxson (Collaborator) commented Mar 24, 2024

Closes #6236

The GGUF version is taken from: https://huggingface.co/s3nh/MiniCPM-2B-dpo-fp32-GGUF

Input request:

{
    "messages": [
        {"role": "user", "content": "山东省最高的山是哪座山, 它比黄山高还是矮?差距多少?"}
    ],
    "stream": true
}

Formatted:

{"function":"format_chat","level":"VERB","line":143,"msg":"formatted_chat","text":"<用户>山东省最高的山是哪座山, 它比黄山高还是矮?差距多少?<AI>","tid":"139706014631488","timestamp":1711280373}
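The log line above shows the single user turn wrapped as `<用户>` + content + `<AI>`. A minimal Python sketch of that formatting follows. The function name `format_minicpm_chat` is hypothetical, and the handling of non-user roles (appending their content verbatim) is an assumption that this thread does not confirm; only the single-user-turn case is shown in the log.

```python
def format_minicpm_chat(messages):
    """Format a list of {"role", "content"} dicts as a MiniCPM-style prompt.

    Only the user-turn wrapping is confirmed by the log above; appending
    other roles verbatim is an assumption made for illustration.
    """
    parts = []
    for message in messages:
        content = message["content"].strip()
        if message["role"] == "user":
            # Confirmed by the log: user text sits between <用户> and <AI>.
            parts.append("<用户>" + content + "<AI>")
        else:
            # Assumption: assistant (and any other) turns are appended as-is.
            parts.append(content)
    return "".join(parts)


prompt = format_minicpm_chat([
    {"role": "user", "content": "山东省最高的山是哪座山, 它比黄山高还是矮?差距多少?"},
])
print(prompt)
```

For the request above, this reproduces the `text` field of the log line.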

However, the model always outputs garbage: 是 reveals创伤tabulartabulartabular

ngxson mentioned this pull request Mar 24, 2024

EZForever (Contributor) commented Apr 11, 2024

Greetings,

I could not reproduce the garbage output. I used minicpm-2b-dpo-bf16.Q8_0.gguf from https://huggingface.co/vbuhoijymzoi/MiniCPM-2B-dpo-bf16-GGUF, built your fork, and tried the OpenAI-compatible API. The result looks promising:

$ curl -H 'Content-Type: application/json' -d '{"messages":[{"role":"user","content":"山东省最高的山是哪座山, 它比黄山高还是矮?差距多少?"}]}' http://localhost:8888/v1/chat/completions
{"choices":[{"finish_reason":"stop","index":0,"message":{"content":" 山东省最高的山是泰山,海拔1545米。黄山位于中国安徽省,海拔1864米。因此,泰山比黄山矮,差距为319米。","role":"assistant"}}],"created":1712829611,"id":"chatcmpl-UgJlho1JvOO7xOsynTcOsyN9K1K2lZkb","model":"unknown","object":"chat.completion","usage":{"completion_tokens":40,"prompt_tokens":25,"total_tokens":65}}
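To check a response like the one above programmatically, a short Python sketch can pull out the reply and verify that the token counts add up. The field names come from the response shown; the helper name `summarize_chat_completion` and the trimmed sample payload are illustrative assumptions, not part of the server.

```python
import json


def summarize_chat_completion(raw: str) -> dict:
    """Extract the reply text and basic sanity checks from a chat completion."""
    body = json.loads(raw)
    choice = body["choices"][0]
    usage = body["usage"]
    return {
        "content": choice["message"]["content"].strip(),
        "finish_reason": choice["finish_reason"],
        # prompt_tokens + completion_tokens should equal total_tokens.
        "usage_consistent": (
            usage["prompt_tokens"] + usage["completion_tokens"]
            == usage["total_tokens"]
        ),
    }


# Trimmed copy of the response above (only the fields the helper reads).
sample = json.dumps({
    "choices": [{
        "finish_reason": "stop",
        "index": 0,
        "message": {
            "content": " 山东省最高的山是泰山,海拔1545米。黄山位于中国安徽省,海拔1864米。因此,泰山比黄山矮,差距为319米。",
            "role": "assistant",
        },
    }],
    "usage": {"completion_tokens": 40, "prompt_tokens": 25, "total_tokens": 65},
}, ensure_ascii=False)

summary = summarize_chat_completion(sample)
print(summary["finish_reason"], summary["usage_consistent"])
```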

However, on Windows, if the request is not in UTF-8 (e.g. curl -d @prompt.txt ... where prompt.txt contains the request above in GBK encoding), the server crashes:

C:\>curl -H "Content-Type: application/json" -d @prompt.txt http://localhost:8888/v1/chat/completions
curl: (56) Recv failure: Connection was reset

This may or may not be another instance of #6396, or it may just be that httplib does not support encodings other than UTF-8 (I haven't dug into it yet). (EDIT: It is, sort of; see #6396 (comment)) Either way, it's not the template's problem; maybe check your test environment and request encoding?
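One way to rule out the encoding problem on the client side is to check the request bytes before sending them. The Python sketch below assumes the payload is either UTF-8 or GBK, as in the scenario described; the helper names are hypothetical, and it does not address the server-side crash itself.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def gbk_to_utf8(data: bytes) -> bytes:
    """Re-encode GBK bytes as UTF-8 (assumes the input really is GBK)."""
    return data.decode("gbk").encode("utf-8")


# Simulate a prompt.txt saved in GBK, as in the Windows case described above.
text = '{"messages":[{"role":"user","content":"山东省最高的山是哪座山"}]}'
gbk_bytes = text.encode("gbk")

if not is_valid_utf8(gbk_bytes):
    # Convert before sending so the server only ever receives UTF-8.
    gbk_bytes_fixed = gbk_to_utf8(gbk_bytes)
else:
    gbk_bytes_fixed = gbk_bytes

print(is_valid_utf8(gbk_bytes), is_valid_utf8(gbk_bytes_fixed))
```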

mofosyne added the "Review Complexity : Medium" and "enhancement" labels May 10, 2024

Contributor (automated benchmark comment):

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 549 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8504.31ms p(95)=20119.29ms fails=, finish reason: stop=486 truncated=63
  • Prompt processing (pp): avg=98.6tk/s p(95)=448.93tk/s
  • Token generation (tg): avg=34.29tk/s p(95)=47.33tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=xsn/minicpm-template commit=11dbcf02ae78d0219c3972e70c3fa6480eac10a1

[Charts omitted: four time-series plots titled "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 549 iterations", x-axis 1715411288 --> 1715411920, one each for llamacpp:prompt_tokens_seconds, llamacpp:predicted_tokens_seconds, llamacpp:kv_cache_usage_ratio, and llamacpp:requests_processing.]

mofosyne (Collaborator) commented

obsolete?

mofosyne added the "obsolete?" label May 11, 2024
ngxson (Collaborator, Author) commented May 11, 2024

@mofosyne sorry I almost forgot this, will get it updated soon

Labels: enhancement, obsolete?, Review Complexity : Medium
Linked issue (may be closed when this pull request is merged): MiniCPM Chat Template
3 participants