Support Llama / Hugging Face's Universal Format (GGUF) #659
BrainSlugs83 started this conversation in New features / APIs · Replies: 3 comments
-
@BrainSlugs83 I saw this post and was inspired to come up with a solution, at least in the interim, because I was hoping for a single-file format as well. I'm working on this project you may find useful. I'll just let the video roll. If it's something you're interested in and wish to test, let me know. vfolder_phi3-onnx.mp4
-
Hi @BrainSlugs83, this API uses the ONNX (Open Neural Network Exchange) model format. Moving this issue into a Discussion as a feature request.
-
Are there any plans to support the Hugging Face / Llama.cpp universal format (GGUF)?
This format is very popular and uses just a single file to describe a whole model (even for mixture of experts models), it's optimized for fast loading for inference (whether on the CPU or GPU, or elsewhere), and supports quantization. There's also built-in tooling on hugging face to automatically convert other repositories to this format.
The format is designed to be unambiguous by containing all the information needed to load a model. It is also designed to be extensible, so that new information can be added to models without breaking compatibility.
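To illustrate the self-describing design: every GGUF file begins with a small fixed header (magic bytes, format version, tensor count, and a count of metadata key/value pairs) before any tensor data. Here's a minimal Python sketch of parsing that header, based on my reading of the public GGUF spec; `read_gguf_header` is just a name I made up for illustration, and the demo builds a fake in-memory header rather than opening a real model file:

```python
import struct
from io import BytesIO

def read_gguf_header(stream):
    """Parse the fixed-size GGUF header from a binary stream."""
    magic = stream.read(4)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    # version is a little-endian uint32; tensor_count and
    # metadata_kv_count are little-endian uint64s
    version, tensor_count, kv_count = struct.unpack("<IQQ", stream.read(20))
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }

# Demo: a hand-built header (version 3, 291 tensors, 24 metadata pairs)
fake_file = BytesIO(b"GGUF" + struct.pack("<IQQ", 3, 291, 24))
print(read_gguf_header(fake_file))
```

The metadata key/value pairs that follow the header are what make the format unambiguous: architecture, tokenizer, quantization type, etc. all travel inside the one file instead of scattered JSON sidecars.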
And there is a huge repo of models based on Llama, Mistral, etc. that are already in this format; including fine tunes of Microsoft Phi.
It would be hugely convenient for developers if the DirectML model loader could just load these directly...
[Side question: what is the currently supported format? I can't find any repos on Hugging Face similar enough to the supported Phi-3 repo to "just work"; they always complain about missing JSON files, etc. I'm not even fully sure what the current format is, let alone how to convert an existing model to it.]