[docs] Update README
Neet-Nestor committed May 31, 2024
1 parent 5cb2cbc commit 66c28db
Showing 2 changed files with 130 additions and 98 deletions.
220 changes: 126 additions & 94 deletions README.md
</div>

## Overview
WebLLM is a high-performance in-browser LLM inference engine that brings language model inference directly onto web browsers with hardware acceleration.
Everything runs inside the browser with no server support and is accelerated with WebGPU.

WebLLM is **fully compatible with [OpenAI API](https://platform.openai.com/docs/api-reference/chat).**
That is, you can use the same OpenAI API on **any open-source models** locally, with functionalities including JSON-mode, function-calling, streaming, etc.

This brings a lot of fun opportunities to build AI assistants for everyone, enabling privacy while enjoying GPU acceleration.

This project is a companion project of [MLC LLM](https://github.com/mlc-ai/mlc-llm).

</div>

## Jumpstart with Examples
[![Open Demo on JSFiddle](https://img.shields.io/badge/Chat_Demo-JSFiddle-blue?logo=jsfiddle&logoColor=white)](https://jsfiddle.net/neetnestor/4nmgvsa2/)
[![Open Demo on Codepen](https://img.shields.io/badge/Chat_Demo-Codepen-gainsboro?logo=codepen)](https://codepen.io/neetnestor/pen/vYwgZaG)

More examples are available in the [examples](examples) folder.

## Get Started

WebLLM offers a minimalist and modular interface to access the chatbot in the browser.
The package is designed in a modular way to hook into any of the UI components.

### Installation

#### Package Manager

```sh
# npm
npm install @mlc-ai/web-llm
# yarn
yarn add @mlc-ai/web-llm
# or pnpm
pnpm install @mlc-ai/web-llm
```

#### CDN Delivery

Thanks to [jsdelivr.com](https://www.jsdelivr.com/package/npm/@mlc-ai/web-llm), WebLLM can be imported directly through a URL and works out of the box on cloud development platforms like [jsfiddle.net](https://jsfiddle.net/) and [Codepen.io](https://codepen.io/):

```javascript
import * as webllm from "https://esm.run/@mlc-ai/web-llm";
```

### Create MLCEngine

Most operations in WebLLM are invoked through the `MLCEngine` interface. To get started, create an `MLCEngine` instance.

```typescript
import { MLCEngine, MLCEngineInterface } from "@mlc-ai/web-llm";

const engine: MLCEngineInterface = new MLCEngine();
```

Then, select a model and load it into the `engine`. For the full list of built-in models supported by WebLLM `MLCEngine`, check [Model Support](#model-support) below.

```typescript
engine.setInitProgressCallback((progress) => {
  // Update model loading progress
  console.log(progress);
});

const selectedModel = "Llama-3-8B-Instruct-q4f32_1-MLC";
await engine.reload(selectedModel, chatConfig, appConfig);
```

Alternatively, you can create the engine and load the model at once using `CreateMLCEngine()`.

```typescript
import { CreateMLCEngine, MLCEngineInterface } from "@mlc-ai/web-llm";

const engine: MLCEngineInterface = await CreateMLCEngine(
  selectedModel,
  /*engineConfig=*/ { initProgressCallback: initProgressCallback },
);
```


### Chat Completion

After successfully initializing the engine, you can invoke chat completions using the OpenAI-style chat API through the `engine.chat.completions` interface. For the full list of parameters and their descriptions, check the [section below](#full-openai-compatibility) and the [OpenAI API reference](https://platform.openai.com/docs/api-reference/chat/create).

(Note: The `model` parameter is not supported and will be ignored here. Instead, call `CreateMLCEngine(model)` or `engine.reload(model)` as shown in [Create MLCEngine](#create-mlcengine) above.)


```typescript
const messages = [
  { role: "system", content: "You are a helpful AI assistant." },
  { role: "user", content: "Hello!" },
];

const reply = await engine.chat.completions.create({
  messages,
  temperature: 1,
});
console.log(reply.choices[0].message);
console.log(await engine.runtimeStatsText());
```

### Streaming

WebLLM also supports streaming chat completion generation. To use it, simply pass `stream: true` to the `engine.chat.completions.create` call.

```typescript
const messages = [
  { role: "system", content: "You are a helpful AI assistant." },
  { role: "user", content: "Hello!" },
];

const chunks = await engine.chat.completions.create({
  messages,
  temperature: 1,
  stream: true, // <-- Enable streaming
});

let reply = "";
for await (const chunk of chunks) {
  reply += chunk.choices[0].delta.content || "";
  console.log(reply);
}

const fullReply = await engine.getMessage();
console.log(fullReply);
console.log(await engine.runtimeStatsText());
```

## Advanced Usage

### Using Web Worker

WebLLM comes with API support for Web Workers so you can hook the generation process into a separate worker thread and keep the computation from disrupting the UI.

We will first create a worker script with an MLCEngine and hook it up to a handler that handles requests.
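
A minimal sketch of what `worker.ts` can look like (the handler class name and constructor shape follow the examples around this release and may differ in other versions; see [examples/get-started-web-worker](examples/get-started-web-worker/) for the canonical script):

```typescript
// worker.ts
import { MLCEngine, WebWorkerMLCEngineHandler } from "@mlc-ai/web-llm";

// Create an engine inside the worker thread and hook it up to a handler
// that serves the requests forwarded from the main thread.
const engine = new MLCEngine();
const handler = new WebWorkerMLCEngineHandler(engine);
self.onmessage = (msg: MessageEvent) => {
  handler.onmessage(msg);
};
```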

Then, in the main logic, we create a `WebWorkerMLCEngine` that implements the same `MLCEngineInterface`. The rest of the logic remains the same.

```typescript
// main.ts
import { MLCEngineInterface, CreateWebWorkerMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  // Use a WebWorkerMLCEngine instead of MLCEngine here
  const engine: MLCEngineInterface = await CreateWebWorkerMLCEngine(
    /*worker=*/ new Worker(new URL("./worker.ts", import.meta.url), {
      type: "module",
    }),
    /*modelId=*/ selectedModel,
    /*engineConfig=*/ { initProgressCallback: initProgressCallback },
  );
  // ... the rest of the logic (chat completions, etc.) stays the same ...
}

main();
```

You can find a complete example of how to run WebLLM in a web worker in [examples/get-started-web-worker](examples/get-started-web-worker/).

### Using Service Worker

WebLLM comes with API support for ServiceWorker so you can hook the generation process
into a service worker to avoid reloading the model in every page visit and optimize
your application's offline experience.

We first create a service worker script with an MLCEngine and hook it up to a handler
that handles requests when the service worker is ready.
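
Similarly, a minimal sketch of the service worker script (class names and the handler constructor are assumptions based on this release's examples; see [examples/service-worker](examples/service-worker/) for the exact code):

```typescript
// sw.ts
import {
  MLCEngine,
  MLCEngineInterface,
  ServiceWorkerMLCEngineHandler,
} from "@mlc-ai/web-llm";

const engine: MLCEngineInterface = new MLCEngine();
let handler: ServiceWorkerMLCEngineHandler;

// Wait until the service worker is activated before serving engine requests.
self.addEventListener("activate", () => {
  handler = new ServiceWorkerMLCEngineHandler(engine);
  console.log("Service Worker is ready");
});
```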


Then, in the main logic, we register the service worker and then create the engine.

```typescript
// main.ts
import { MLCEngineInterface, CreateServiceWorkerMLCEngine } from "@mlc-ai/web-llm";

if ("serviceWorker" in navigator) {
  navigator.serviceWorker.register(
    /*workerScriptURL=*/ new URL("sw.ts", import.meta.url),
    { type: "module" },
  );
}

const engine: MLCEngineInterface =
  await CreateServiceWorkerMLCEngine(
    /*modelId=*/ selectedModel,
    /*engineConfig=*/ { initProgressCallback: initProgressCallback },
  );
```

You can find a complete example of how to run WebLLM in a service worker in [examples/service-worker](examples/service-worker/).

### Build a ChatApp

You can find a complete chat app example in [examples/simple-chat](examples/simple-chat/).

### Chrome Extension

You can also find examples of building Chrome extension with WebLLM in [examples/chrome-extension](examples/chrome-extension/) and [examples/chrome-extension-webgpu-service-worker](examples/chrome-extension-webgpu-service-worker/). The latter one leverages service worker, so the extension is persistent in the background.

## Full OpenAI Compatibility

WebLLM is designed to be fully compatible with [OpenAI API](https://platform.openai.com/docs/api-reference/chat). Thus, besides building a simple chatbot, you can also have the following functionalities with WebLLM:

- [streaming](examples/streaming): return output as chunks in real-time in the form of an AsyncGenerator
- [json-mode](examples/json-mode): efficiently ensure output is in JSON format, see [OpenAI Reference](https://platform.openai.com/docs/guides/text-generation/chat-completions-api) for more.
- [function-calling](examples/function-calling): function calling with fields `tools` and `tool_choice`.
- [seed-to-reproduce](examples/seed-to-reproduce): use seeding to ensure reproducible output with the `seed` field (a combined sketch follows below).
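
For instance, here is a minimal sketch combining JSON-mode and a fixed seed (the request fields mirror the OpenAI API; the prompt and values are illustrative, and the linked examples above are the tested references):

```typescript
// Assumes `engine` was created as shown in the Get Started section.
const reply = await engine.chat.completions.create({
  messages: [
    { role: "system", content: "You answer with a single JSON object." },
    { role: "user", content: "List three colors with their hex codes." },
  ],
  // JSON-mode: constrain the output to valid JSON.
  response_format: { type: "json_object" },
  // Seeding: repeated calls with the same seed produce the same output.
  seed: 42,
});
console.log(reply.choices[0].message.content);
```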

## Model Support

We export all our prebuilt models in [`prebuiltAppConfig`](https://github.com/mlc-ai/web-llm/blob/main/src/config.ts#L291), including:

- Llama 2 and Llama 3
- Phi 1.5 and Phi 2
- Gemma
- Qwen 1.5
- Zephyr
- RedPajama
- Mistral
- OpenHermes
- NeuralHermes
- TinyLlama
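
As a quick way to see what is available at runtime, you can inspect the prebuilt config directly (assuming the `model_list`/`model_id` fields also used by the custom `appConfig` example below):

```typescript
import { prebuiltAppConfig } from "@mlc-ai/web-llm";

// Print every built-in model id that can be passed to
// CreateMLCEngine(...) or engine.reload(...).
const modelIds = prebuiltAppConfig.model_list.map((m) => m.model_id);
console.log(modelIds);
```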

Alternatively, you can compile your own model and weights as described below.

WebLLM works as a companion project of [MLC LLM](https://github.com/mlc-ai/mlc-llm).
It reuses the model artifacts and the build flow of MLC LLM; please check out
[MLC LLM document](https://llm.mlc.ai/docs/deploy/javascript.html)
on how to add new model weights and libraries to WebLLM.

Here, we go over the high-level idea. There are two elements of the WebLLM package that enable new models and weight variants.

- `model`: Contains a URL to model artifacts, such as weights and meta-data.
- `model_lib`: A URL to the WebAssembly library (i.e., a wasm file) that contains the executables to accelerate the model computations.

Both are customizable in WebLLM.

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  const appConfig = {
    "model_list": [
      // ... your model record goes here: a URL to the model artifacts, a
      // model id such as "MyLlama-3b-v1-q4f32_0", and a URL to the model
      // library wasm such as "/url/to/myllama3b.wasm" (entries elided here;
      // `prebuiltAppConfig` shows the full record shape) ...
    ],
  };
  // Override the default generation options (values are illustrative).
  const chatOpts = {
    "repetition_penalty": 1.01,
  };

  // Load the model with the chat option override and app config above.
  // Under the hood, the engine will fetch the model artifacts from the URL
  // in the model record and cache them in the browser cache.
  // It will also load the model library from "/url/to/myllama3b.wasm",
  // assuming that it is compatible with the model weights.
  const engine = await CreateMLCEngine(
    "MyLlama-3b-v1-q4f32_0",
    /*engineConfig=*/ { chatOpts: chatOpts, appConfig: appConfig },
  );
}
```

In many cases, we only want to supply the model weight variant, but
not necessarily a new model (e.g. `NeuralHermes-Mistral` can reuse `Mistral`'s
model library). For examples of how a model library can be shared by different model variants,
see `prebuiltAppConfig`.
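
A hypothetical sketch of such sharing, with made-up URLs and the two elements named above (check `prebuiltAppConfig` in [src/config.ts](https://github.com/mlc-ai/web-llm/blob/main/src/config.ts) for the exact field names in your release):

```typescript
// Two weight variants that reuse one compiled model library (wasm).
const sharedModelLib = "/url/to/Mistral-7B-q4f16_1.wasm";

const appConfig = {
  model_list: [
    {
      model: "/url/to/Mistral-7B-Instruct-weights",
      model_id: "Mistral-7B-Instruct-q4f16_1",
      model_lib: sharedModelLib,
    },
    {
      model: "/url/to/NeuralHermes-2.5-Mistral-7B-weights",
      model_id: "NeuralHermes-2.5-Mistral-7B-q4f16_1",
      model_lib: sharedModelLib,
    },
  ],
};
```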

## Build WebLLM Package From Source

## Links

- [Demo App: WebLLM Chat](https://chat.webllm.ai/)
- If you want to run LLMs on a native runtime, check out [MLC-LLM](https://github.com/mlc-ai/mlc-llm)
- You might also be interested in [Web Stable Diffusion](https://github.com/mlc-ai/web-stable-diffusion/).

## Acknowledgement

This project is initiated by members from CMU Catalyst, UW SAMPL, SJTU, OctoML, and the MLC community. We would love to continue developing and supporting the open-source ML community.

This project is only possible thanks to the shoulders of the open-source ecosystems that we stand on. We want to thank the Apache TVM community and developers of the TVM Unity effort. The open-source ML community members made these models publicly available. PyTorch and Hugging Face communities make these models accessible. We would like to thank the teams behind Vicuna, SentencePiece, LLaMA, and Alpaca. We also would like to thank the WebAssembly, Emscripten, and WebGPU communities. Finally, thanks to Dawn and WebGPU developers.
8 changes: 4 additions & 4 deletions examples/README.md
Note that all examples below run in-browser and use WebGPU as a backend.

- [get-started](get-started): minimal get-started example with chat completion.

[![Open in JSFiddle](https://img.shields.io/badge/open-JSFiddle-blue?logo=jsfiddle&logoColor=white)](https://jsfiddle.net/neetnestor/yac9gbwf/)
[![Open in Codepen](https://img.shields.io/badge/open-codepen-gainsboro?logo=codepen)](https://codepen.io/neetnestor/pen/NWVdgey)

- [simple-chat-js](simple-chat-js): a minimal and complete chatbot app in vanilla JavaScript.

[![Open in JSFiddle](https://img.shields.io/badge/open-JSFiddle-blue?logo=jsfiddle&logoColor=white)](https://jsfiddle.net/neetnestor/4nmgvsa2/)
[![Open in Codepen](https://img.shields.io/badge/open-codepen-gainsboro?logo=codepen)](https://codepen.io/neetnestor/pen/vYwgZaG)

- [simple-chat-ts](simple-chat-ts): a minimal and complete chatbot app in TypeScript.
- [get-started-web-worker](get-started-web-worker): same as get-started, but using a web worker.
