When I use the web-llm simple-chat example (path: /web-llm/examples/simple-chat) and read the source file (@mlc-ai/web-llm/lib/index.js), I notice a lot of interaction with wasm files, which makes the source code somewhat difficult to follow. Could you explain what these wasm files logically contain? Additionally, there seems to be room for optimization in how the model files are specified (for example: "model_lib_url": modelLibURLPrefix + modelVersion + "/Llama-3-8B-Instruct-q4f32_1-ctx1k_cs1k-webgpu.wasm"). Should I optimize this by modifying the TVM compilation process?
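For reference, this is roughly how the simple-chat app config assembles that URL. The prefix and version strings below are placeholders I filled in for illustration; only the composition pattern comes from the snippet above:

```ts
// Illustrative sketch of the simple-chat app config.
// The prefix/version values are placeholders; only the model_lib_url
// composition pattern is taken from the snippet quoted above.
const modelLibURLPrefix =
  "https://raw.githubusercontent.com/mlc-ai/binary-mlc-llm-libs/main/web-llm-models/";
const modelVersion = "v0_2_48";

const appConfig = {
  model_list: [
    {
      model_id: "Llama-3-8B-Instruct-q4f32_1-MLC",
      // The compiled model library: WGSL kernels plus the runtime support,
      // packaged together as a single .wasm file.
      model_lib_url:
        modelLibURLPrefix +
        modelVersion +
        "/Llama-3-8B-Instruct-q4f32_1-ctx1k_cs1k-webgpu.wasm",
    },
  ],
};
```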
Thanks for the question! The wasm is composed of several parts, including the model's kernels (in WGSL) and runtime support (C++ code compiled into WASM).
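If you do want to change what goes into that file (for example, the context window or prefill chunk size reflected in the ctx1k_cs1k suffix), the usual route is to recompile the model library with MLC-LLM rather than modifying the JS side. Below is only a rough sketch; the exact paths, model names, and flags depend on your MLC-LLM version, so please check the docs for your setup:

```sh
# Sketch only; paths, model names, and exact flags depend on your MLC-LLM version.

# 1. Generate the chat config (quantization, conversation template, etc.)
mlc_llm gen_config ./dist/models/Llama-3-8B-Instruct \
  --quantization q4f32_1 --conv-template llama-3 \
  -o ./dist/Llama-3-8B-Instruct-q4f32_1-MLC

# 2. Compile the model library for WebGPU; this is the step where TVM generates
#    the WGSL kernels and bundles them with the runtime into a single .wasm.
mlc_llm compile ./dist/Llama-3-8B-Instruct-q4f32_1-MLC/mlc-chat-config.json \
  --device webgpu \
  --overrides "context_window_size=1024;prefill_chunk_size=1024" \
  -o ./dist/libs/Llama-3-8B-Instruct-q4f32_1-ctx1k_cs1k-webgpu.wasm
```

You can then point the model_lib_url in your app config at the resulting .wasm instead of the prebuilt one.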