From 484de606c34cd1a1f47b06897d1dd92ae7fd4fe6 Mon Sep 17 00:00:00 2001
From: alabulei1
Date: Mon, 19 May 2025 18:19:23 +0800
Subject: [PATCH 1/6] Create qwen2-5.md

---
 docs/user-guide/multimodal/qwen2-5.md | 147 ++++++++++++++++++++++++++
 1 file changed, 147 insertions(+)
 create mode 100644 docs/user-guide/multimodal/qwen2-5.md

diff --git a/docs/user-guide/multimodal/qwen2-5.md b/docs/user-guide/multimodal/qwen2-5.md
new file mode 100644
index 0000000..47a1591
--- /dev/null
+++ b/docs/user-guide/multimodal/qwen2-5.md
@@ -0,0 +1,147 @@
+---
+sidebar_position: 2
+---
+
+# Quick start with the Qwen 2.5 VL model
+
+[Qwen 2.5 VL](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) is the latest vision-language model from the Qwen series, designed to handle a wide range of complex multimodal tasks. It excels at understanding visual content such as text, charts, and layouts, and can act as an intelligent agent capable of interacting with tools and devices.
+
+### Step 1: Install WasmEdge
+
+First off, you'll need WasmEdge, a high-performance, lightweight, and cross-platform LLM runtime.
+
+```
+curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s
+```
+
+### Step 2: Download the LLM model
+
+Next, you'll need to obtain two model files: the **Qwen2.5-VL-7B-Instruct model** and the **mmproj model**.
+
+```
+curl -LO https://huggingface.co/second-state/Qwen2.5-VL-7B-Instruct-GGUF/resolve/main/Qwen2.5-VL-7B-Instruct-Q5_K_M.gguf
+curl -LO https://huggingface.co/second-state/Qwen2.5-VL-7B-Instruct-GGUF/resolve/main/Qwen2.5-VL-7B-Instruct-vision.gguf
+```
+
+### Step 3: Download a portable API server app
+
+Next, you need an application that can build an OpenAI-compatible API server for the Qwen 2.5 models.
+The [LlamaEdge api server app](https://github.com/LlamaEdge/LlamaEdge/tree/main/llama-api-server) is a lightweight and cross-platform Wasm app that works on any device
+you might have. Just download the compiled binary app.
+
+```
+curl -LO https://github.com/second-state/LlamaEdge/releases/latest/download/llama-api-server.wasm
+```
+
+> The version of `llama-api-server.wasm` should be v0.18.5 or above.
+
+### Step 4: Chat with the chatbot UI
+
+The `llama-api-server.wasm` is a web server with an OpenAI-compatible API. To chat in a browser, you also need the HTML files for the chatbot UI. This step is optional; you can instead use `curl` to send API requests directly.
+Download and unzip the HTML UI files as follows.
+
+```
+curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
+tar xzf chatbot-ui.tar.gz
+rm chatbot-ui.tar.gz
+```
+
+Then, start the web server.
+
+```
+wasmedge --dir .:. \
+  --nn-preload default:GGML:AUTO:Qwen2.5-VL-7B-Instruct-Q5_K_M.gguf \
+  llama-api-server.wasm \
+  --model-name Qwen2.5-VL-7B-Instruct \
+  --prompt-template qwen2-vision \
+  --llava-mmproj Qwen2.5-VL-7B-Instruct-vision.gguf \
+  --ctx-size 4096
+```
+
+> The above command line works on a MacBook with 16 GB of memory.
+
+Upon successful execution, you should see output similar to the following:
+
+```
+[2025-05-18 11:23:09.970] [info] llama_api_server in llama-api-server/src/main.rs:202: LOG LEVEL: info
+[2025-05-18 11:23:09.973] [info] llama_api_server in llama-api-server/src/main.rs:205: SERVER VERSION: 0.18.5
+[2025-05-18 11:23:09.976] [info] llama_api_server in llama-api-server/src/main.rs:544: model_name: Qwen2.5-VL-7B-Instruct
+
+...
+
+[2025-05-18 11:23:10.531] [info] llama_api_server in llama-api-server/src/main.rs:917: plugin_ggml_version: b5361 (commit cf0a43bb)
+[2025-05-18 11:23:10.533] [info] llama_api_server in llama-api-server/src/main.rs:952: Listening on 0.0.0.0:8080
+```
+
+Then, go to `http://localhost:8080` on your computer to access the chatbot UI on a web page! You can upload an image and chat with the model based on the image.
+
+### Step 5: Send an API request
+
+You can send an API request to call the model, which is more universal. The following command demonstrates how to send a CURL request to llama-api-server. The request includes a base64-encoded string of an image in the `image_url` field. For demonstration purposes, only a portion of the base64 string is shown here. In practice, you should use the complete base64 string. The full base64 string used in the following request can be found in [image_b64.txt](../assets/image_b64.txt).
+
+> [!TIP]
+> [base64.guru](https://base64.guru/converter/encode/image/jpg) provides a tool for encoding JPG to Base64.
+> The Qwen 2.5 VL model supports system prompts, so you can add one to guide the model's behavior.
+
+```bash
+curl --location 'http://localhost:8080/v1/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "messages": [
+        {
+            "role": "system",
+            "content": "You are a helpful assistant that accurately describes the content of images provided by the user."
+        },
+        {
+            "content": [
+                {
+                    "type": "text",
+                    "text": "Describe the picture"
+                },
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": "/9j/4AAQSkZJRgABAQAASABIAAD ... knr+Vb+lWR8oTTNwfujOc/hSuhuSsf//Z"
+                    }
+                }
+            ],
+            "role": "user"
+        }
+    ],
+    "model": "Qwen2.5-VL-7B-Instruct"
+}'
+```
+
+If the request is processed successfully, you will receive a response similar to the following:
+
+```bash
+{
+    "id": "chatcmpl-4367085d-6451-4896-bbd8-a5090604394d",
+    "object": "chat.completion",
+    "created": 1747369554,
+    "model": "Qwen2.5-VL-7B-Instruct",
+    "choices": [
+        {
+            "index": 0,
+            "message": {
+                "content": "mixed berries in a paper bowl",
+                "role": "assistant"
+            },
+            "finish_reason": "stop",
+            "logprobs": null
+        }
+    ],
+    "usage": {
+        "prompt_tokens": 27,
+        "completion_tokens": 8,
+        "total_tokens": 35
+    }
+}
+```
+
+Congratulations! You have now started a multimodal app on your own device.
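The same request can be assembled programmatically. Below is a minimal Python sketch (an illustration, not part of the patched file itself) that builds the chat-completion body the `curl` example above sends; it assumes you have the raw bytes of a JPG image and that the server from Step 4 is listening on `http://localhost:8080`:

```python
import base64

# Assumed server address from Step 4.
API_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_payload(image_bytes, model="Qwen2.5-VL-7B-Instruct"):
    """Build the same chat-completion body as the curl example above."""
    # The tutorial's image_url field carries the raw base64 string of the image.
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant that accurately "
                "describes the content of images provided by the user.",
            },
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe the picture"},
                    {"type": "image_url", "image_url": {"url": image_b64}},
                ],
            },
        ],
    }

# POST the payload with any HTTP client, e.g. with the standard library:
#   req = urllib.request.Request(API_URL, data=json.dumps(payload).encode("utf-8"),
#                                headers={"Content-Type": "application/json"})
payload = build_chat_payload(b"\xff\xd8\xff\xe0")  # use the bytes of a real JPG in practice
print(payload["model"])  # prints Qwen2.5-VL-7B-Instruct
```

Because the server speaks the OpenAI chat-completion schema, the same payload works with any OpenAI-compatible client library by pointing its base URL at the server.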
From 5bb85f47163e55ef1052b34848ac5079b7f89452 Mon Sep 19 00:00:00 2001
From: alabulei1
Date: Mon, 19 May 2025 18:24:31 +0800
Subject: [PATCH 2/6] Update get-started-with-llamaedge.md

---
 docs/user-guide/llm/get-started-with-llamaedge.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/user-guide/llm/get-started-with-llamaedge.md b/docs/user-guide/llm/get-started-with-llamaedge.md
index 43dbc72..4d99e33 100644
--- a/docs/user-guide/llm/get-started-with-llamaedge.md
+++ b/docs/user-guide/llm/get-started-with-llamaedge.md
@@ -28,9 +28,9 @@ curl -LO https://huggingface.co/second-state/Llama-3.2-1B-Instruct-GGUF/resolve/
 
 This command downloads the Llama-3.2-1B-Instruct model from Huggingface, an AI model hosting platform.
 
-### Step 3: Download a portable chatbot app
+### Step 3: Download a portable API server app
 
-Next, you need an application that can load the model and provide a UI to interact with the model.
+Next, you need an application that can build an OpenAI-compatible API server for the model.
 The [LlamaEdge api server app](https://github.com/LlamaEdge/LlamaEdge/tree/main/llama-api-server) is a lightweight and cross-platform Wasm app that works on any device
 you might have. Just download the compiled binary app.
From 82cfc5cdb756644617ca39eabdaec0dc67ba5c5f Mon Sep 17 00:00:00 2001
From: alabulei1
Date: Mon, 19 May 2025 19:05:32 +0800
Subject: [PATCH 3/6] Create gemma-3.md

---
 docs/user-guide/multimodal/gemma-3.md | 146 ++++++++++++++++++++++++++
 1 file changed, 146 insertions(+)
 create mode 100644 docs/user-guide/multimodal/gemma-3.md

diff --git a/docs/user-guide/multimodal/gemma-3.md b/docs/user-guide/multimodal/gemma-3.md
new file mode 100644
index 0000000..34a16a1
--- /dev/null
+++ b/docs/user-guide/multimodal/gemma-3.md
@@ -0,0 +1,146 @@
+---
+sidebar_position: 3
+---
+
+# Quick start with the Gemma-3 model
+
+[Gemma 3](https://huggingface.co/google/gemma-3-4b-it) introduces powerful vision-language capabilities across its 4B, 12B, and 27B models through a custom SigLIP vision encoder, enabling rich interpretation of visual input. It processes fixed-size 896x896 images using a “Pan&Scan” algorithm for adaptive cropping and resizing, balancing detail preservation with computational cost.
+
+### Step 1: Install WasmEdge
+
+First off, you'll need WasmEdge, a high-performance, lightweight, and cross-platform LLM runtime.
+
+```
+curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s
+```
+
+### Step 2: Download the LLM model
+
+Next, you'll need to obtain two model files: the **Gemma-3 model** and the **mmproj model**.
+
+> The Gemma-3-1B model is a small-scale language-only model and does not support vision-language capabilities.
+
+```
+curl -LO https://huggingface.co/second-state/gemma-3-4b-it-GGUF/resolve/main/gemma-3-4b-it-Q5_K_M.gguf
+curl -LO https://huggingface.co/second-state/gemma-3-4b-it-GGUF/resolve/main/gemma-3-4b-it-mmproj-f16.gguf
+```
+
+### Step 3: Download a portable API server app
+
+Next, you need an application that can build an OpenAI-compatible API server for the Gemma-3 models.
+The [LlamaEdge api server app](https://github.com/LlamaEdge/LlamaEdge/tree/main/llama-api-server) is a lightweight and cross-platform Wasm app that works on any device
+you might have. Just download the compiled binary app.
+
+```
+curl -LO https://github.com/second-state/LlamaEdge/releases/latest/download/llama-api-server.wasm
+```
+
+> The version of `llama-api-server.wasm` should be v0.18.5 or above.
+
+### Step 4: Chat with the chatbot UI
+
+The `llama-api-server.wasm` is a web server with an OpenAI-compatible API. To chat in a browser, you also need the HTML files for the chatbot UI. This step is optional; you can instead use `curl` to send API requests directly.
+Download and unzip the HTML UI files as follows.
+
+```
+curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
+tar xzf chatbot-ui.tar.gz
+rm chatbot-ui.tar.gz
+```
+
+Then, start the web server.
+
+```
+wasmedge --dir .:. --nn-preload default:GGML:AUTO:gemma-3-4b-it-Q5_K_M.gguf \
+  llama-api-server.wasm \
+  --prompt-template gemma-instruct \
+  --llava-mmproj gemma-3-4b-it-mmproj-f16.gguf \
+  --ctx-size 4096 \
+  --model-name gemma-3-4b
+```
+
+> The above command line works on a MacBook with 16 GB of memory.
+
+Upon successful execution, you should see output similar to the following:
+
+```
+[2025-05-18 11:23:09.970] [info] llama_api_server in llama-api-server/src/main.rs:202: LOG LEVEL: info
+[2025-05-18 11:23:09.973] [info] llama_api_server in llama-api-server/src/main.rs:205: SERVER VERSION: 0.18.5
+[2025-05-18 11:23:09.976] [info] llama_api_server in llama-api-server/src/main.rs:544: model_name: gemma-3-4b
+
+...
+
+[2025-05-18 11:23:10.531] [info] llama_api_server in llama-api-server/src/main.rs:917: plugin_ggml_version: b5361 (commit cf0a43bb)
+[2025-05-18 11:23:10.533] [info] llama_api_server in llama-api-server/src/main.rs:952: Listening on 0.0.0.0:8080
+```
+
+Then, go to `http://localhost:8080` on your computer to access the chatbot UI on a web page! You can upload an image and chat with the model based on the image.
+
+### Step 5: Send an API request
+
+You can send an API request to call the model, which is more universal. The following command demonstrates how to send a CURL request to llama-api-server. The request includes a base64-encoded string of an image in the `image_url` field. For demonstration purposes, only a portion of the base64 string is shown here. In practice, you should use the complete base64 string. The full base64 string used in the following request can be found in [image_b64.txt](../assets/image_b64.txt).
+
+> [!TIP]
+> [base64.guru](https://base64.guru/converter/encode/image/jpg) provides a tool for encoding JPG to Base64.
+
+```bash
+curl --location 'http://localhost:8080/v1/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "messages": [
+        {
+            "role": "system",
+            "content": "You are a helpful assistant that accurately describes the content of images provided by the user."
+        },
+        {
+            "content": [
+                {
+                    "type": "text",
+                    "text": "Describe the picture"
+                },
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgFBgcGBQg......X/VaTer/ALzOU/Lg1XMiLMuR3EWMb77/AHsD/DNTIhXPnmvLmwj"
+                    }
+                }
+            ],
+            "role": "user"
+        }
+    ],
+    "model": "gemma-3-4b"
+}'
+```
+
+If the request is processed successfully, you will receive a response similar to the following:
+
+```bash
+{
+    "id": "chatcmpl-e5f777db-c913-45ab-b37f-e2c499c8fa0b",
+    "object": "chat.completion",
+    "created": 1747652210,
+    "model": "gemma-3-4b",
+    "choices": [
+        {
+            "index": 0,
+            "message": {
+                "content": "mixed berries in a paper bowl",
+                "role": "assistant"
+            },
+            "finish_reason": "stop",
+            "logprobs": null
+        }
+    ],
+    "usage": {
+        "prompt_tokens": 27,
+        "completion_tokens": 8,
+        "total_tokens": 35
+    }
+}
+```
+
+Congratulations! You have now started a multimodal app on your own device.

From e8f73afad95f6447639d5a669e8aaf3bae810fff Mon Sep 17 00:00:00 2001
From: alabulei1
Date: Mon, 19 May 2025 19:21:41 +0800
Subject: [PATCH 4/6] Update gemma-3.md

---
 docs/user-guide/multimodal/gemma-3.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/user-guide/multimodal/gemma-3.md b/docs/user-guide/multimodal/gemma-3.md
index 34a16a1..57694ee 100644
--- a/docs/user-guide/multimodal/gemma-3.md
+++ b/docs/user-guide/multimodal/gemma-3.md
@@ -80,7 +80,7 @@ Then, go to `http://localhost:8080` on your computer to access the chatbot UI on
 
 ### Step 5: Send an API request
 
-You can send an API request to call the model, which is more universal. The following command demonstrates how to send a CURL request to llama-api-server. The request includes a base64-encoded string of an image in the `image_url` field. For demonstration purposes, only a portion of the base64 string is shown here. In practice, you should use the complete base64 string. The full base64 string used in the following request can be found in [image_b64.txt](../assets/image_b64.txt).
+You can send an API request to call the model, which is more universal. The following command demonstrates how to send a CURL request to llama-api-server. The request includes a base64-encoded string of an image in the `image_url` field. For demonstration purposes, only a portion of the base64 string is shown here. In practice, you should use the complete base64 string.
 
 > [!TIP]
 > [base64.guru](https://base64.guru/converter/encode/image/jpg) provides a tool for encoding JPG to Base64.

From ad57e0fed2165ea02880b63a90f8defcb2109881 Mon Sep 17 00:00:00 2001
From: alabulei1
Date: Mon, 19 May 2025 19:22:03 +0800
Subject: [PATCH 5/6] Update qwen2-5.md

---
 docs/user-guide/multimodal/qwen2-5.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/user-guide/multimodal/qwen2-5.md b/docs/user-guide/multimodal/qwen2-5.md
index 47a1591..786a571 100644
--- a/docs/user-guide/multimodal/qwen2-5.md
+++ b/docs/user-guide/multimodal/qwen2-5.md
@@ -80,7 +80,7 @@ Then, go to `http://localhost:8080` on your computer to access the chatbot UI on
 
 ### Step 5: Send an API request
 
-You can send an API request to call the model, which is more universal. The following command demonstrates how to send a CURL request to llama-api-server. The request includes a base64-encoded string of an image in the `image_url` field. For demonstration purposes, only a portion of the base64 string is shown here. In practice, you should use the complete base64 string. The full base64 string used in the following request can be found in [image_b64.txt](../assets/image_b64.txt).
+You can send an API request to call the model, which is more universal. The following command demonstrates how to send a CURL request to llama-api-server. The request includes a base64-encoded string of an image in the `image_url` field. For demonstration purposes, only a portion of the base64 string is shown here. In practice, you should use the complete base64 string.
 
 > [!TIP]
 > [base64.guru](https://base64.guru/converter/encode/image/jpg) provides a tool for encoding JPG to Base64.

From 0680f8262e23b7e8db37dd9937a94813b99b81b2 Mon Sep 17 00:00:00 2001
From: alabulei1
Date: Mon, 19 May 2025 22:25:17 +0800
Subject: [PATCH 6/6] Update gemma-3.md

---
 docs/user-guide/multimodal/gemma-3.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/user-guide/multimodal/gemma-3.md b/docs/user-guide/multimodal/gemma-3.md
index 57694ee..c5e2d4d 100644
--- a/docs/user-guide/multimodal/gemma-3.md
+++ b/docs/user-guide/multimodal/gemma-3.md
@@ -55,7 +55,7 @@ Then, start the web server.
 
 ```
 wasmedge --dir .:. --nn-preload default:GGML:AUTO:gemma-3-4b-it-Q5_K_M.gguf \
   llama-api-server.wasm \
-  --prompt-template gemma-instruct \
+  --prompt-template gemma-3 \
   --llava-mmproj gemma-3-4b-it-mmproj-f16.gguf \
   --ctx-size 4096 \
   --model-name gemma-3-4b
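Both tutorials in this series end with the same OpenAI-style response schema. For readers scripting against the server, here is a short Python sketch showing how to pull the reply text and token usage out of a response; the JSON below is copied from the Gemma-3 example response in the patch above:

```python
import json

# Response copied from the Gemma-3 tutorial above (OpenAI chat-completion schema).
raw = '''
{
  "id": "chatcmpl-e5f777db-c913-45ab-b37f-e2c499c8fa0b",
  "object": "chat.completion",
  "created": 1747652210,
  "model": "gemma-3-4b",
  "choices": [
    {
      "index": 0,
      "message": {"content": "mixed berries in a paper bowl", "role": "assistant"},
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {"prompt_tokens": 27, "completion_tokens": 8, "total_tokens": 35}
}
'''
resp = json.loads(raw)

# The model's answer lives under choices[0].message.content.
answer = resp["choices"][0]["message"]["content"]
# Token accounting is reported in the usage object.
usage = resp["usage"]

print(answer)                 # -> mixed berries in a paper bowl
print(usage["total_tokens"])  # -> 35
```

The same two lookups work unchanged for the Qwen 2.5 VL response, since both models are served behind the identical chat-completion endpoint.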