Add support to export XNNPACK-based static_llama
Summary:
Add support to export XNNPACK-based static_llama
- static_llama is the QNN backend's hybrid prefill+decode model, which takes the KV cache as an explicit inference input (see the sketch below)
  - https://www.internalfb.com/code/fbsource/fbcode/executorch/examples/qualcomm/oss_scripts/llama2/model/static_llama.py
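For context, a minimal sketch of what "KV cache as the inference input" means. This is not the actual static_llama code; the class and tensor names are hypothetical. The point is that the caches are passed in and returned as plain tensors, so the exported graph has fixed shapes and no mutable internal state:

```python
import torch

class StaticAttentionSketch(torch.nn.Module):
    """Hypothetical illustration: KV caches are explicit graph inputs and
    outputs rather than buffers mutated inside the model."""

    def __init__(self, dim: int):
        super().__init__()
        self.wq = torch.nn.Linear(dim, dim, bias=False)
        self.wk = torch.nn.Linear(dim, dim, bias=False)
        self.wv = torch.nn.Linear(dim, dim, bias=False)

    def forward(self, x, k_cache, v_cache):
        # x: [B, S, D]; k_cache / v_cache: [B, C, D], owned by the caller.
        q, k_new, v_new = self.wq(x), self.wk(x), self.wv(x)
        k = torch.cat([k_cache, k_new], dim=1)  # attend over past + current
        v = torch.cat([v_cache, v_new], dim=1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        # The new cache slices are graph outputs; the runtime feeds them back
        # in on the next call instead of mutating buffers inside the graph.
        return attn @ v, k_new, v_new
```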

Differential Revision: D67867190
Di Xu (SWE) authored and facebook-github-bot committed Jan 6, 2025
1 parent 68c0208 commit a91eb31
1 changed file: examples/models/llama/export_llama_lib.py (2 additions, 1 deletion)
@@ -79,7 +79,7 @@
verbosity_setting = None


EXECUTORCH_DEFINED_MODELS = ["stories110m", "llama2", "llama3", "llama3_1", "llama3_2"]
EXECUTORCH_DEFINED_MODELS = ["stories110m", "llama2", "llama3", "llama3_1", "llama3_2", "static_llama"]
TORCHTUNE_DEFINED_MODELS = ["llama3_2_vision"]


@@ -649,6 +649,7 @@ def _validate_args(args):
)


+# TODO: export static_llama via XNNPACK
def _export_llama(args) -> LLMEdgeManager: # noqa: C901
_validate_args(args)
pt2e_quant_params, quantizers, quant_dtype = get_quantizer_and_quant_params(args)
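With "static_llama" registered in EXECUTORCH_DEFINED_MODELS, the existing export path should accept it like the other llama variants. A hedged usage sketch follows; `build_args_parser` is assumed to be this module's existing argument-parser helper, and the flag names `--model` and `--xnnpack` are assumptions that may differ from the actual CLI:

```python
# Hedged usage sketch. Assumptions: build_args_parser exists in this module,
# and --model / --xnnpack are the relevant flags; verify against the file.
from executorch.examples.models.llama.export_llama_lib import (
    _export_llama,
    build_args_parser,
)

args = build_args_parser().parse_args(["--model", "static_llama", "--xnnpack"])
manager = _export_llama(args)  # returns an LLMEdgeManager, per the diff above
```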
