update inference_with_transformers
Add usage instructions for the `--flash_attn` parameter
add speculative sampling
Updated inference_with_transformers_zh (markdown)
Merge branch 'master' of https://github.com/ymcui/Chinese-LLaMA-Alpaca-2.wiki
prioritize full mode usage, fix style
retain load_in_8bit
Change load_in_8bit to load_in_kbit and update some of the explanatory notes
init