This repository contains an example of running Phi-3-mini-4k-instruct in your browser using ONNX Runtime Web with WebGPU.
You can try out the live demo here.
We keep this example simple and use the onnxruntime-web API directly; ONNX Runtime Web also powers higher-level frameworks such as transformers.js.
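For context, creating a session with the onnxruntime-web API directly takes only a few lines. The sketch below is illustrative, assuming a placeholder model.onnx URL rather than the files this example actually downloads:

```js
// Minimal sketch: an ONNX Runtime Web inference session on WebGPU.
// 'model.onnx' is a placeholder URL, not the file shipped with this example.
import * as ort from 'onnxruntime-web/webgpu';

const session = await ort.InferenceSession.create('model.onnx', {
  executionProviders: ['webgpu'],
});
console.log('inputs:', session.inputNames, 'outputs:', session.outputNames);
```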
Ensure that you have Node.js installed on your machine.
Install the required dependencies:
`npm install`
Build the project:
`npm run build`
The output can be found in the dist directory.
`npm run dev`
This will build the project and start a dev server. Point your browser to http://localhost:8080/.
The model used in this example is hosted on Hugging Face. It is an optimized ONNX version specific to the Web and slightly different from the ONNX models for CUDA or CPU:
- The model output 'logits' is kept as float32 (even for float16 models) because JavaScript does not support float16.
- Our WebGPU implementation uses the custom MultiHeadAttention operator instead of GroupQueryAttention.
- Phi-3 is larger than 2GB, so we need to use external data files. To keep them cacheable in the browser, both model.onnx and model.onnx.data are kept under 2GB (see the loading sketch after this list).
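Putting the notes above together, a sketch of loading the split model files and reading the float32 logits could look like the following. The function names are hypothetical, and the feeds (input_ids, attention mask, KV-cache tensors) are left to the caller:

```js
import * as ort from 'onnxruntime-web/webgpu';

// Load model.onnx together with its external data file; keeping both files
// under 2GB lets the browser cache each of them.
async function loadPhi3(modelUrl, dataUrl) {
  return ort.InferenceSession.create(modelUrl, {
    executionProviders: ['webgpu'],
    externalData: [{ data: dataUrl, path: 'model.onnx.data' }],
  });
}

// 'feeds' must supply input_ids, the attention mask, and KV-cache tensors.
// The 'logits' output is float32 even for a float16 model, so it reads back
// directly as a Float32Array in JavaScript.
async function getLogits(session, feeds) {
  const results = await session.run(feeds);
  return results.logits.data; // Float32Array
}
```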
If you would like to optimize your fine-tuned PyTorch Phi-3-mini model, you can use Olive, which supports float data type conversion and the ONNX GenAI model builder toolkit. An example of how to optimize the Phi-3-mini model for ONNX Runtime Web with Olive can be found here.
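For illustration only, the GenAI model builder is typically invoked as a Python module, roughly as below. The exact flags, and whether your installed version accepts web as an execution-provider target, are assumptions to verify against the linked Olive example:

```sh
# Assumed invocation of the ONNX Runtime GenAI model builder; the -p/-e
# flags and the 'web' target may differ across versions.
python -m onnxruntime_genai.models.builder \
    -m microsoft/Phi-3-mini-4k-instruct \
    -o ./phi3-mini-web \
    -p int4 \
    -e web
```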