Store the image moderation and text moderation logs #3478

Open
wants to merge 43 commits into base: operation-202407
Changes from all commits · 43 commits
92a6d1f
Chatbot Arena Category Classification Script (#3433)
CodingWithTim Jul 9, 2024
f67b6e4
Update sglang_worker.py to fix #3372 error launching SGLang worker (#…
vikrantrathore Jul 29, 2024
653f7c1
add a performant worker dash-infer which is specifically optimized fo…
yejunjin Jul 29, 2024
d32e370
Update README.md
infwinston Jul 30, 2024
1c95cc8
Update monitor.py
infwinston Jul 30, 2024
d2016dd
Update README.md
infwinston Jul 31, 2024
925fb82
Update README.md
infwinston Jul 31, 2024
d310369
added leaderboard for arena hard auto (#3437)
connorchenn Jul 31, 2024
76571d2
Arena hard auto leaderboard UI (#3457)
CodingWithTim Jul 31, 2024
a5c29e1
Update monitor.py (#3460)
infwinston Aug 1, 2024
5c0443e
Survey only (#3466)
lisadunlap Aug 6, 2024
cb4da0d
Store text and image moderation logs
BabyChouSr Aug 15, 2024
605add3
Update moderation
BabyChouSr Aug 16, 2024
4492299
Run formatter
BabyChouSr Aug 16, 2024
2723660
Show vote button
BabyChouSr Aug 16, 2024
51f9a0d
Fix pylint
BabyChouSr Aug 16, 2024
38a1360
Fix pylint
BabyChouSr Aug 16, 2024
e10d11b
Save bad images
BabyChouSr Aug 16, 2024
5159d3b
Address comments
BabyChouSr Aug 17, 2024
8708fd7
Add max-model-len argument to vllm worker (#3451)
aliasaria Aug 18, 2024
29fc8a0
Revert "Add max-model-len argument to vllm worker" (#3488)
vikrantrathore Aug 21, 2024
c7f9230
New leaderboard (#3465)
lisadunlap Aug 22, 2024
d8f411a
Update dataset_release.md (#3492)
merrymercy Aug 24, 2024
4c25b00
Update link
infwinston Aug 26, 2024
36f7807
Update monitor_md.py (#3494)
infwinston Aug 26, 2024
0b09cee
Version names for Command R/R+ (#3491)
sanderland Aug 26, 2024
282534b
Update preset images (#3493)
lisadunlap Aug 26, 2024
05b9305
Add Style Control to Chatbot Arena Leaderboard 🔥 (#3495)
CodingWithTim Aug 27, 2024
dba425f
Save moderation info per turn
BabyChouSr Aug 27, 2024
d289be9
Change states
BabyChouSr Aug 27, 2024
7911ecd
Clean up
BabyChouSr Aug 27, 2024
1527aac
Get rid of previous moderation response
BabyChouSr Aug 27, 2024
36c67da
Rename
BabyChouSr Aug 27, 2024
4e62d77
Added NewYorker images back in (#3499)
lisadunlap Aug 27, 2024
3e21ddc
Fix Style control Bootstrapping (#3500)
CodingWithTim Aug 28, 2024
93037a4
Add load test (#3496)
BabyChouSr Aug 30, 2024
8714da2
Fix load test (#3508)
BabyChouSr Aug 31, 2024
b11f710
Merge branch 'main' into moderation-log
BabyChouSr Aug 31, 2024
571f39e
Merge branch 'main' into moderation-log
BabyChouSr Aug 31, 2024
3555d01
Merge remote-tracking branch 'fastchat/operation-202407' into moderat…
BabyChouSr Aug 31, 2024
fe45c6f
Format
BabyChouSr Aug 31, 2024
a2200e4
Merge with unified vision arena
BabyChouSr Aug 31, 2024
c90b8fc
Fix edge case
BabyChouSr Aug 31, 2024
12 changes: 6 additions & 6 deletions README.md
@@ -1,9 +1,9 @@
# FastChat
| [**Demo**](https://chat.lmsys.org/) | [**Discord**](https://discord.gg/HSWAKCrnFx) | [**X**](https://x.com/lmsysorg) |
| [**Demo**](https://lmarena.ai/) | [**Discord**](https://discord.gg/HSWAKCrnFx) | [**X**](https://x.com/lmsysorg) |

FastChat is an open platform for training, serving, and evaluating large language model based chatbots.
- FastChat powers Chatbot Arena (https://chat.lmsys.org/), serving over 10 million chat requests for 70+ LLMs.
- Chatbot Arena has collected over 500K human votes from side-by-side LLM battles to compile an online [LLM Elo leaderboard](https://leaderboard.lmsys.org).
- FastChat powers Chatbot Arena ([lmarena.ai](https://lmarena.ai)), serving over 10 million chat requests for 70+ LLMs.
- Chatbot Arena has collected over 1.5M human votes from side-by-side LLM battles to compile an online [LLM Elo leaderboard](https://lmarena.ai/?leaderboard).

FastChat's core features include:
- The training and evaluation code for state-of-the-art models (e.g., Vicuna, MT-Bench).
@@ -26,7 +26,7 @@ FastChat's core features include:

</details>

<a href="https://chat.lmsys.org"><img src="assets/demo_narrow.gif" width="70%"></a>
<a href="https://lmarena.ai"><img src="assets/demo_narrow.gif" width="70%"></a>

## Contents
- [Install](#install)
@@ -97,7 +97,7 @@ You can use the commands below to chat with them. They will automatically downlo

## Inference with Command Line Interface

<a href="https://chat.lmsys.org"><img src="assets/screenshot_cli.png" width="70%"></a>
<a href="https://lmarena.ai"><img src="assets/screenshot_cli.png" width="70%"></a>

(Experimental Feature: You can specify `--style rich` to enable rich text output and better text streaming quality for some non-ASCII content. This may not work properly on certain terminals.)

@@ -202,7 +202,7 @@ export FASTCHAT_USE_MODELSCOPE=True

## Serving with Web GUI

<a href="https://chat.lmsys.org"><img src="assets/screenshot_gui.png" width="70%"></a>
<a href="https://lmarena.ai"><img src="assets/screenshot_gui.png" width="70%"></a>

To serve using the web UI, you need three main components: web servers that interface with users, model workers that host one or more models, and a controller to coordinate the webserver and model workers. You can learn more about the architecture [here](docs/server_arch.md).

2 changes: 1 addition & 1 deletion docs/arena.md
@@ -1,5 +1,5 @@
# Chatbot Arena
Chatbot Arena is an LLM benchmark platform featuring anonymous, randomized battles, available at https://chat.lmsys.org.
Chatbot Arena is an LLM benchmark platform featuring anonymous, randomized battles, available at https://lmarena.ai.
We invite the entire community to join this benchmarking effort by contributing your votes and models.

## How to add a new model
23 changes: 23 additions & 0 deletions docs/dashinfer_integration.md
@@ -0,0 +1,23 @@
# dash-infer Integration
[DashInfer](https://github.com/modelscope/dash-infer) is a high-performance inference engine optimized specifically for CPU environments. It accelerates a variety of models including Llama, Qwen, and ChatGLM, and shows significant speedups on both Intel x64 and ARMv9 processors, making it a performant FastChat worker for resource-constrained deployments or scenarios where CPU inference is preferred over GPU acceleration.

## Instructions
1. Install dash-infer.
```
pip install dashinfer
```

2. When you launch a model worker, replace the normal worker (`fastchat.serve.model_worker`) with the dash-infer worker (`fastchat.serve.dashinfer_worker`). All other components (the controller, the Gradio web server, and the OpenAI API server) are launched with the same commands as usual.
```
python3 -m fastchat.serve.dashinfer_worker --model-path qwen/Qwen-7B-Chat --revision=master /path/to/dashinfer-model-generation-config.json
```
Here is an example:
```
python3 -m fastchat.serve.dashinfer_worker --model-path qwen/Qwen-7B-Chat --revision=master dash-infer/examples/python/model_config/config_qwen_v10_7b.json
```

If you use an already downloaded model, replace `--model-path` with a local path and choose a conversation template via the `--conv-template` option:
```
python3 -m fastchat.serve.dashinfer_worker --model-path ~/.cache/modelscope/hub/qwen/Qwen-7B-Chat --conv-template qwen-7b-chat /path/to/dashinfer-model-generation-config.json
```
All available conversation templates are listed in [fastchat/conversation.py](../fastchat/conversation.py).
1 change: 1 addition & 0 deletions docs/dataset_release.md
@@ -2,5 +2,6 @@
We release the following datasets based on our projects and websites.

- [LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset](https://huggingface.co/datasets/lmsys/lmsys-chat-1m)
- [LMSYS-Human-Preference-55k](https://huggingface.co/datasets/lmsys/lmsys-arena-human-preference-55k)
- [Chatbot Arena Conversation Dataset](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations)
- [MT-bench Human Annotation Dataset](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments)
9 changes: 8 additions & 1 deletion fastchat/constants.py
@@ -7,6 +7,13 @@

REPO_PATH = os.path.dirname(os.path.dirname(__file__))

# Survey Link URL (to be removed)
SURVEY_LINK = """<div style='text-align: center; margin: 20px 0;'>
<div style='display: inline-block; border: 2px solid #DE3163; padding: 10px; border-radius: 5px;'>
<span style='color: #DE3163; font-weight: bold;'>We would love your feedback! Fill out <a href='https://docs.google.com/forms/d/e/1FAIpQLSfKSxwFOW6qD05phh4fwYjk8q0YV1VQe_bmK0_qOVTbC66_MA/viewform?usp=sf_link' style='color: #DE3163; text-decoration: underline;'>this short survey</a> to tell us what you like about the arena, what you don't like, and what you want to see in the future.</span>
</div>
</div>"""

##### For the gradio web server
SERVER_ERROR_MSG = (
"**NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.**"
@@ -21,7 +28,7 @@
CONVERSATION_LIMIT_MSG = "YOU HAVE REACHED THE CONVERSATION LENGTH LIMIT. PLEASE CLEAR HISTORY AND START A NEW CONVERSATION."
INACTIVE_MSG = "THIS SESSION HAS BEEN INACTIVE FOR TOO LONG. PLEASE REFRESH THIS PAGE."
SLOW_MODEL_MSG = "⚠️ Both models will show the responses all at once. Please stay patient as it may take over 30 seconds."
RATE_LIMIT_MSG = "**RATE LIMIT OF THIS MODEL IS REACHED. PLEASE COME BACK LATER OR USE <span style='color: red; font-weight: bold;'>[BATTLE MODE](https://chat.lmsys.org)</span> (the 1st tab).**"
RATE_LIMIT_MSG = "**RATE LIMIT OF THIS MODEL IS REACHED. PLEASE COME BACK LATER OR USE <span style='color: red; font-weight: bold;'>[BATTLE MODE](https://lmarena.ai)</span> (the 1st tab).**"
# Maximum input length
INPUT_CHAR_LEN_LIMIT = int(os.getenv("FASTCHAT_INPUT_CHAR_LEN_LIMIT", 12000))
BLIND_MODE_INPUT_CHAR_LEN_LIMIT = int(
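The limit constants in this hunk follow one pattern: an integer read from an environment variable with a hard-coded fallback. A minimal sketch of that pattern — the helper name `int_limit_from_env` is hypothetical, only the `FASTCHAT_INPUT_CHAR_LEN_LIMIT` variable and its 12000 default come from the diff:

```python
import os


def int_limit_from_env(name, default):
    # Hypothetical helper: read an integer limit from the environment,
    # falling back to the hard-coded default when the variable is unset.
    return int(os.getenv(name, default))


# Mirrors the INPUT_CHAR_LEN_LIMIT line in the hunk above.
INPUT_CHAR_LEN_LIMIT = int_limit_from_env("FASTCHAT_INPUT_CHAR_LEN_LIMIT", 12000)
```

Because `os.getenv` returns the default unchanged when the variable is missing, `int()` accepts both the string from the environment and the integer fallback.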
6 changes: 5 additions & 1 deletion fastchat/conversation.py
@@ -582,7 +582,11 @@ def save_new_images(self, has_csam_images=False, use_remote_storage=False):
from fastchat.utils import load_image, upload_image_file_to_gcs
from PIL import Image

_, last_user_message = self.messages[-2]
last_user_message = None
for role, message in reversed(self.messages):
if role == "user":
last_user_message = message
break

if type(last_user_message) == tuple:
text, images = last_user_message[0], last_user_message[1]
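The hunk above replaces a fixed-index lookup (`self.messages[-2]`) with a backwards scan for the most recent user turn. A self-contained sketch of that pattern, assuming messages are simple `(role, message)` tuples as in the diff:

```python
def last_user_message(messages):
    # Scan the history backwards for the most recent user turn instead of
    # assuming it sits at a fixed index: messages[-2] breaks when the
    # history does not end with a strict user/assistant pair (the edge
    # case the "Fix edge case" commit addresses).
    for role, message in reversed(messages):
        if role == "user":
            return message
    return None
```

The reverse scan costs at most one pass over the history but is robust to trailing assistant-only or system entries.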
6 changes: 3 additions & 3 deletions fastchat/llm_judge/README.md
@@ -57,6 +57,8 @@ To make sure FastChat loads the correct prompt template, see the supported model

You can also specify `--num-gpus-per-model` for model parallelism (needed for large 65B models) and `--num-gpus-total` to parallelize answer generation with multiple GPUs.

> Note: if answer generation is slow, see the [Other Backends](#other-backends) section for inference engines that can speed it up by up to 20x.

#### Step 2. Generate GPT-4 judgments
There are several options to use GPT-4 as a judge, such as pairwise winrate and single-answer grading.
In MT-bench, we recommend single-answer grading as the default mode.
@@ -134,9 +136,7 @@ We can also use vLLM for answer generation, which can be faster for the models s

1. Launch a vLLM worker
```
python3 -m fastchat.serve.controller
python3 -m fastchat.serve.vllm_worker --model-path [MODEL-PATH]
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
vllm serve [MODEL-PATH] --dtype auto
```
- Arguments:
- `[MODEL-PATH]` is the path to the weights, which can be a local folder or a Hugging Face repo ID.
2 changes: 1 addition & 1 deletion fastchat/model/model_adapter.py
@@ -2423,7 +2423,7 @@ def get_default_conv_template(self, model_path: str) -> Conversation:


class DBRXAdapter(BaseModelAdapter):
"""The model adapter for Cohere"""
"""The model adapter for Databricks"""

def match(self, model_path: str):
return model_path in ["dbrx-instruct"]
Expand Down
12 changes: 6 additions & 6 deletions fastchat/model/model_registry.py
@@ -195,17 +195,17 @@ def get_model_info(name: str) -> ModelInfo:
)

register_model_info(
["command-r-plus"],
"Command-R-Plus",
["command-r-plus", "command-r-plus-04-2024"],
"Command R+",
"https://txt.cohere.com/command-r-plus-microsoft-azure/",
"Command-R Plus by Cohere",
"Command R+ by Cohere",
)

register_model_info(
["command-r"],
"Command-R",
["command-r", "command-r-03-2024", "command-r-08-2024"],
"Command R",
"https://txt.cohere.com/command-r/",
"Command-R by Cohere",
"Command R by Cohere",
)

register_model_info(