
Commit

Build at Fri Nov 22 15:48:01 UTC 2024
mlc-gh-actions-bot committed Nov 22, 2024
1 parent 288bba2 commit 3a6d95c
Showing 4 changed files with 77 additions and 32 deletions.
2 changes: 1 addition & 1 deletion docs/assets/css/main.css.map

Large diffs are not rendered by default.

51 changes: 37 additions & 14 deletions docs/docs/_sources/start/quick_start.rst.txt
@@ -6,41 +6,64 @@ Quick Start
Example
-------

The easiest way to try out XGrammar is to use the ``transformers`` library in Python.
After :ref:`installing XGrammar <installation>`, run the following example to see how XGrammar
enables structured generation -- a JSON response in this case.


Preparation
^^^^^^^^^^^
Instantiate a model, a tokenizer, and inputs.

.. code:: python

   import xgrammar as xgr
   import torch
   from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig

   device = "cuda"  # Or "cpu", etc.

   # 0. Instantiate with any HF model you want
   model_name = "Qwen/Qwen2.5-0.5B-Instruct"
   # model_name = "microsoft/Phi-3.5-mini-instruct"
   # model_name = "meta-llama/Llama-3.2-1B-Instruct"
   model = AutoModelForCausalLM.from_pretrained(
       model_name, torch_dtype=torch.float32, device_map=device
   )
   tokenizer = AutoTokenizer.from_pretrained(model_name)
   config = AutoConfig.from_pretrained(model_name)

   # 2. Prepare inputs
   messages = [
       {"role": "system", "content": "You are a helpful assistant."},
       {"role": "user", "content": "Introduce yourself in JSON briefly."},
   ]
   texts = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
   model_inputs = tokenizer(texts, return_tensors="pt").to(model.device)

Compile Grammar
^^^^^^^^^^^^^^^

Construct a ``GrammarCompiler`` and compile the grammar.

The grammar can be a built-in JSON grammar, a JSON schema string, or an EBNF string. EBNF provides
more flexibility for customization. See the
`GBNF documentation <https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md>`_ for
the specification.

.. code:: python

   tokenizer_info = xgr.TokenizerInfo.from_huggingface(tokenizer, vocab_size=config.vocab_size)
   grammar_compiler = xgr.GrammarCompiler(tokenizer_info)
   compiled_grammar = grammar_compiler.compile_builtin_json_grammar()
   # Other ways: provide a JSON schema string
   # compiled_grammar = grammar_compiler.compile_json_schema(json_schema_string)
   # Or provide an EBNF string
   # compiled_grammar = grammar_compiler.compile_grammar(ebnf_string)

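For example, an EBNF grammar compiles the same way. The grammar string below is illustrative
only (it does not appear in the snippet above) and uses the GBNF-style syntax linked earlier:

.. code:: python

   # Illustrative grammar: accepts exactly "yes" or "no"; "root" is the start rule.
   ebnf_string = 'root ::= "yes" | "no"'
   compiled_grammar = grammar_compiler.compile_grammar(ebnf_string)
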
Generate with grammar
^^^^^^^^^^^^^^^^^^^^^

Use a logits processor to generate with the grammar.

.. code:: python

   xgr_logits_processor = xgr.contrib.hf.LogitsProcessor(compiled_grammar)
   generated_ids = model.generate(
       **model_inputs, max_new_tokens=512, logits_processor=[xgr_logits_processor]
   )
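The hunk is truncated above; a typical way to finish the example (a sketch, not the file's
actual continuation) is to strip the prompt tokens, decode the response, and check it:

.. code:: python

   # Keep only the tokens generated after the prompt, then decode them.
   generated_ids = generated_ids[0][len(model_inputs.input_ids[0]) :]
   response = tokenizer.decode(generated_ids, skip_special_tokens=True)
   print(response)

   # Since generation was constrained to the JSON grammar, the response
   # should parse as valid JSON (an illustrative check, not in the hunk).
   import json
   json.loads(response)
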
2 changes: 1 addition & 1 deletion docs/docs/searchindex.js

Some generated files are not rendered by default.
