
Commit

Build at Fri Nov 22 15:48:01 UTC 2024
mlc-gh-actions-bot committed Nov 22, 2024
1 parent 288bba2 commit 3a6d95c
Showing 4 changed files with 77 additions and 32 deletions.
2 changes: 1 addition & 1 deletion docs/assets/css/main.css.map

Large diffs are not rendered by default.

51 changes: 37 additions & 14 deletions docs/docs/_sources/start/quick_start.rst.txt
@@ -6,41 +6,64 @@ Quick Start
Example
-------

The easiest way to try out XGrammar is to use the ``transformers`` library in Python.
After :ref:`installing XGrammar <installation>`, run the following example to see how XGrammar
enables structured generation -- a JSON response in this case.


Preparation
^^^^^^^^^^^
Instantiate a model, a tokenizer, and inputs.

.. code:: python

   import xgrammar as xgr
   import torch
   from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig

   device = "cuda"  # Or "cpu", etc.

   # 0. Instantiate with any HF model you want
   model_name = "Qwen/Qwen2.5-0.5B-Instruct"
   # model_name = "microsoft/Phi-3.5-mini-instruct"
   # model_name = "meta-llama/Llama-3.2-1B-Instruct"
   model = AutoModelForCausalLM.from_pretrained(
       model_name, torch_dtype=torch.float32, device_map=device
   )
   tokenizer = AutoTokenizer.from_pretrained(model_name)
   config = AutoConfig.from_pretrained(model_name)

   # 2. Prepare inputs
   messages = [
       {"role": "system", "content": "You are a helpful assistant."},
       {"role": "user", "content": "Introduce yourself in JSON briefly."},
   ]
   texts = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
   model_inputs = tokenizer(texts, return_tensors="pt").to(model.device)

Compile Grammar
^^^^^^^^^^^^^^^

Construct a ``GrammarCompiler`` and compile the grammar.

The grammar can be a built-in JSON grammar, a JSON schema string, or an EBNF string. EBNF provides
more flexibility for customization. See the
`GBNF documentation <https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md>`_ for
the specification.

.. code:: python

   tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer, vocab_size=config.vocab_size)
   grammar_compiler = xgr.GrammarCompiler(tokenizer_info)
   compiled_grammar = grammar_compiler.compile_builtin_json_grammar()
   # Other ways: provide a JSON schema string
   # compiled_grammar = grammar_compiler.compile_json_schema(json_schema_string)
   # Or provide an EBNF string
   # compiled_grammar = grammar_compiler.compile_grammar(ebnf_string)

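For example, an EBNF grammar compiles the same way. The grammar string below is illustrative
only (it does not appear in the snippet above) and uses the GBNF-style syntax linked earlier:

.. code:: python

   # Illustrative grammar: accepts exactly "yes" or "no"; "root" is the start rule.
   ebnf_string = 'root ::= "yes" | "no"'
   compiled_grammar = grammar_compiler.compile_grammar(ebnf_string)
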
Generate with grammar
^^^^^^^^^^^^^^^^^^^^^

Use a logits processor to generate with the grammar.

.. code:: python

   xgr_logits_processor = xgr.contrib.hf.LogitsProcessor(compiled_grammar)
   generated_ids = model.generate(
       **model_inputs, max_new_tokens=512, logits_processor=[xgr_logits_processor]
   )
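The hunk is truncated above; a typical way to finish the example (a sketch, not the file's
actual continuation) is to strip the prompt tokens, decode the response, and check it:

.. code:: python

   # Keep only the tokens generated after the prompt, then decode them.
   generated_ids = generated_ids[0][len(model_inputs.input_ids[0]) :]
   response = tokenizer.decode(generated_ids, skip_special_tokens=True)
   print(response)

   # Since generation was constrained to the JSON grammar, the response
   # should parse as valid JSON (an illustrative check, not in the hunk).
   import json
   json.loads(response)
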
2 changes: 1 addition & 1 deletion docs/docs/searchindex.js

Some generated files are not rendered by default.
