Skip to content

Add Multimodal docs #33

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 22, 2025
Merged

Add Multimodal docs #33

merged 6 commits into from
May 22, 2025

Conversation

alabulei1
Copy link
Contributor

No description provided.

Copy link
Contributor

juntao commented May 19, 2025

Hello, I am a PR summary agent on flows.network. Here are my reviews of code commits in this PR.


Overall Summary

Potential Issues and Errors

  1. Consistency in Documentation Titles: There are discrepancies in the titles of steps, specifically around the installation of the chatbot app/API server app which may confuse users.
  2. Redundant References: Both qwen2-5.md and gemma-3.md originally referenced a specific file ([image_b64.txt](../assets/image_b64.txt)), which has been removed in subsequent patches but should be confirmed if this is intentional for clarity.
  3. Model-Specific Parameters: The --prompt-template parameter was updated from gemma-instruct to gemma-3. Ensure consistency across all model documents and that this change does not affect other configurations.

Most Important Findings

  1. Comprehensive Multimodal Documentation: The PR introduces detailed setup instructions for both Qwen 2.5 VL and Gemma-3 multimodal models, enhancing user accessibility.
  2. Clear API Usage Instructions: Both documentation files provide clear steps on how to send API requests using curl with base64 encoded images, including example JSON structures.
  3. Technical Details Included: Specifications for model versions, memory considerations, and technical notes are included to ensure successful deployment of models.

Recommendations

  • Ensure that all references to specific files (like image_b64.txt) are reviewed across the documentation to maintain consistency.
  • Verify that changes in parameter names (gemma-instruct to gemma-3) are applied consistently throughout the documentation and codebase.
  • Consider adding a section on troubleshooting common issues for both models to improve user experience.

Details

Commit 484de606c34cd1a1f47b06897d1dd92ae7fd4fe6

Key Changes and Summary

  1. New Documentation File Added:

    • Created a new Markdown file qwen2-5.md in the docs/user-guide/multimodal/ directory.
  2. Content Overview:

    • Provides step-by-step instructions to set up and use the Qwen 2.5 VL multimodal model.
  3. Steps Detailed:

    • Install WasmEdge: Instructions provided for installing this LLM runtime via a curl script.
    • Download Model Files: Guidance on downloading necessary models from Hugging Face.
    • Download API Server App: Directions to get the llama-api-server.wasm app.
    • Chat with Chatbot UI: Details on setting up and using the chatbot interface through web server commands.
    • Send API Request: Instructions on how to send an API request using curl.
  4. Technical Notes:

    • Specifies required model versions and memory considerations for successful deployment.
    • Includes a sample JSON structure for an API request, demonstrating system prompt usage and image processing.
  5. User Guidance Tips:

    • Offers additional resources such as tips on base64 encoding images.

This patch is essential for users looking to implement the Qwen 2.5 VL model in multimodal applications using WasmEdge technology.

Commit 5bb85f47163e55ef1052b34848ac5079b7f89452

-Key Changes:

  1. Step 3 Title Change: Updated from "Download a portable chatbot app" to "Download a portable API server app."
  2. Step 3 Description Update: Modified to specify that the application builds an OpenAI compatible API server instead of providing a UI for interaction.

Commit 82cfc5cdb756644617ca39eabdaec0dc67ba5c5f

Key Changes:

  1. Created gemma-3.md: Added a new Markdown file in the docs/user-guide/multimodal/ directory.
  2. Gemma-3 Model Documentation: Provided detailed instructions on setting up and using the Gemma-3 multimodal model, including installation of WasmEdge, downloading models, and using the LlamaEdge API server.
  3. Step-by-step Guide: Included steps for installing dependencies, downloading necessary files, and running the application with both UI and API methods.
  4. API Request Example: Demonstrated how to send a CURL request with an image in base64 format and provided expected response structure.

Most Important Finding:

  • The PR introduces comprehensive documentation for users to set up and interact with the Gemma-3 multimodal model, making it accessible for both beginners and experienced developers.

Commit e8f73afad95f6447639d5a669e8aaf3bae810fff

Key Changes:

  • Updated gemma-3.md: Removed the reference to a specific file [image_b64.txt](../assets/image_b64.txt) in the documentation for sending an API request, making it more generic.

Commit ad57e0fed2165ea02880b63a90f8defcb2109881

-### Key Changes Summary

  • Updated qwen2-5.md: Removed a redundant reference to [image_b64.txt](../assets/image_b64.txt), making the instruction clearer and more concise.

Commit 0680f8262e23b7e8db37dd9937a94813b99b81b2

Key Changes:

  • Updated --prompt-template parameter: Changed from gemma-instruct to gemma-3 in the command for starting the web server.

@alabulei1 alabulei1 merged commit db42794 into main May 22, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants