
GGUF support #412

Merged
philpax merged 36 commits into develop on Nov 12, 2023

Conversation

@philpax (Collaborator) commented Aug 20, 2023

Implements support for loading and saving GGUF files.

TODO:

  • Implement a basic GGUF format loader that loads the file in as a struct. (See the header-reading sketch after this list.)
  • Load a Llama model with a remote Hugging Face tokenizer.
    • Load a Llama model with an embedded tokenizer (GGML).
    • Load a Llama model with an embedded tokenizer (HF).
    • Ensure that models with an HF tokenizer can operate without a GGML tokenizer.
  • Implement a basic GGUF format saver that saves a Gguf struct to a file.
    • Reimplement quantize. (For extra points, make it multithreaded.)
    • Think about how hyperparameters should generally be treated. Should a model re-write them to a Metadata map?
  • Make sure all llm metadata values are used for llama.
  • Fix all the models.
    • Fix BLOOM.
    • Fix Falcon.
    • Fix GPT-2.
    • Fix GPT-J.
    • Fix GPT-NeoX.
    • Fix MPT.
  • Remove all of the expect() calls.
  • Remove the architecture option and load entirely based on the architecture specified in the GGUF.
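
For context on the loader item, here is a minimal sketch of the header parsing such a loader needs. It is based on the published GGUF specification, not on this PR's code; the GgufHeader struct, the helper names, and the v2-layout assumption (u64 counts) are all illustrative:

```rust
use std::fs::File;
use std::io::{self, Read};

/// Minimal view of a GGUF header (v2 layout, where both counts are u64).
#[derive(Debug)]
struct GgufHeader {
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
}

fn read_u32(r: &mut impl Read) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

fn read_u64(r: &mut impl Read) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

fn read_header(r: &mut impl Read) -> io::Result<GgufHeader> {
    // GGUF files begin with the ASCII magic "GGUF", read as a little-endian u32.
    if read_u32(r)? != 0x4655_4747 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a GGUF file"));
    }
    // Struct fields are evaluated in order, so the reads happen in file order:
    // version, then tensor count, then metadata key/value count.
    Ok(GgufHeader {
        version: read_u32(r)?,
        tensor_count: read_u64(r)?,
        metadata_kv_count: read_u64(r)?,
    })
}

fn main() -> io::Result<()> {
    let mut file = File::open("model.gguf")?;
    println!("{:?}", read_header(&mut file)?);
    Ok(())
}
```

The metadata key/value pairs and tensor infos follow this header; note that everything above returns io::Result instead of panicking, which previews the answer to the resilience question below.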

Open questions:

  • Should we still support the old formats?
    • No. Instead, we'll take the old code and build a converter with it. Preferably one that can ingest HF files to provide the necessary information for a fully-compliant GGUF.
  • How resilient should we be to malformed GGUF models?
    • Answer: The usual Rust standard. Don't panic if you can avoid it. (A sketch of this error-handling style follows below.)
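
To make the "don't panic" answer concrete, here is a hypothetical sketch of decoding a metadata value-type tag by returning an error instead of calling expect on malformed input. The type names and the subset of tags are illustrative (tag values taken from the GGUF specification), not this PR's actual enum:

```rust
use std::fmt;

/// A few of GGUF's metadata value types (subset, for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ValueType {
    Uint32,
    Float32,
    Bool,
    String,
    Array,
}

/// The error a tolerant reader surfaces instead of panicking.
#[derive(Debug)]
struct UnsupportedValueType(u32);

impl fmt::Display for UnsupportedValueType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unsupported GGUF metadata value type tag {}", self.0)
    }
}

impl std::error::Error for UnsupportedValueType {}

/// Instead of something like `ValueType::from_u32(raw).expect("bad value type")`,
/// hand the malformed tag back to the caller so a corrupt file becomes an Err.
fn value_type_from_u32(raw: u32) -> Result<ValueType, UnsupportedValueType> {
    match raw {
        4 => Ok(ValueType::Uint32),
        6 => Ok(ValueType::Float32),
        7 => Ok(ValueType::Bool),
        8 => Ok(ValueType::String),
        9 => Ok(ValueType::Array),
        other => Err(UnsupportedValueType(other)),
    }
}

fn main() {
    // A corrupt file might contain an out-of-range tag; this now fails cleanly.
    match value_type_from_u32(99) {
        Ok(ty) => println!("value type: {ty:?}"),
        Err(e) => eprintln!("malformed GGUF: {e}"),
    }
}
```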

Closes #365.

@philpax added this to the 0.2 milestone Aug 21, 2023
@svenstaro

I think having a migration tool for converting the previous formats to GGUF, and then removing support for those formats, might be the most maintainable solution. It might be too early to call this definitively, but I think it's prudent to assume that the ecosystem will converge on GGUF as the preferred format soon.

@philpax mentioned this pull request Aug 27, 2023
@KerfuffleV2 (Contributor)

I've been messing around cleaning up the Python scripts in llama.cpp (like the converters and the Python side of GGUF), so if you need to pick someone's brain about GGUF stuff, I might be able to help. I'm not an expert by any means.

@philpax (Collaborator, Author) commented Aug 30, 2023

Aye, I noticed you contributed the conversion script upstream; I'll definitely reach out if I have any questions about the specifics there.

@philpax mentioned this pull request Nov 1, 2023
@philpax changed the base branch from main to develop Nov 12, 2023
@philpax marked this pull request as ready for review Nov 12, 2023
@philpax merged commit 535eda1 into develop Nov 12, 2023