[Good First Issue]: Create a GGUF reader #1665
Comments
Can this be broken down into smaller, exact tasks? This would allow us to pick off tasks one by one and help contributors slowly build something instead of all at once. |
It can be, for sure, but the way I see it assumes that these tasks should be executed sequentially. For example:
... |
.take |
Thank you for looking into this issue! Please let us know if you have any questions or require any help. |
Hello @AlexKoff88, I would like to work on this. Can this be assigned to me? |
Hi @AlexKoff88, @ilya-lavrenov, I am able to parse a GGUF model using the gguf-tools you provided above; however, they are written in C (I had to do some linking on my end). In the POC you provided, do we only need to parse the tokenizer, not the whole model? Thank you. This is the update on my end.
main.cpp:
extern "C" {
#include "gguf-tools/gguflib.h"
}
#include "openvino/op/concat.hpp"
#include "openvino/op/constant.hpp"
#include "openvino/op/convert.hpp"
#include "openvino/op/gather_elements.hpp"
#include "openvino/op/unsqueeze.hpp"
#include <cstdio>   // perror
#include <cstdlib>  // exit
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>
struct {
    int verbose;   // --verbose option
    int diffable;  // --diffable option
} Opt = {0};

// Reads GGUF metadata into an integer config map; weight loading is still a TODO.
std::pair<std::map<std::string, int>, std::map<std::string, std::vector<double>>>
load_gguf_model(const char *model_path) {
    std::map<std::string, int> config, meta_data;
    std::map<std::string, std::vector<double>> weights;  // not populated yet
    gguf_ctx *ctx = gguf_open(model_path);
    if (ctx == NULL) {
        perror(model_path);
        exit(1);
    }
    gguf_key key;
    while (gguf_get_key(ctx, &key)) {
        std::string name(key.name, key.namelen);
        // Float-typed keys (e.g. rms epsilon, rope freq base) live in the
        // float32 union member; everything else is read as uint32 here.
        if (key.type == GGUF_VALUE_TYPE_FLOAT32)
            meta_data[name] = (int)key.val->float32;  // truncated: config is int-valued
        else
            meta_data[name] = key.val->uint32;
        gguf_next(ctx, key.type, key.val, Opt.verbose);
    }
config.emplace("layer_num",meta_data["llama.block_count"]);
config.emplace("head_num",meta_data["llama.attention.head_count"]);
config.emplace("head_size",meta_data["llama.embedding_length"]/meta_data["llama.attention.head_count"]);
config.emplace("head_num_kv",(meta_data.count("llama.attention.head_count_kv")?meta_data["llama.attention.head_count_kv"]:meta_data["llama.attention.head_count"]));
config.emplace("max_position_embeddings",((meta_data.count("llama.context_length")?meta_data["llama.context_length"]:2048)));
config.emplace("rotary_dims",meta_data["llama.rope.dimension_count"]);
config.emplace("rms_norm_eps",meta_data["llama.attention.layer_norm_rms_epsilon"]);
config.emplace("rope_freq_base",((meta_data.count("llama.rope.freq_base")?meta_data["llama.rope.freq_base"]:10000.0)));
    for (const auto &x : config) {
        std::cout << x.first << " : " << x.second << std::endl;
    }
    return {config, weights};
}
int main(int argc, char *argv[]) {
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <model.gguf>" << std::endl;
        return 1;
    }
    auto model = load_gguf_model(argv[1]);
    (void)model;  // weights are not consumed yet
    return 0;
}

Makefile:
CC=gcc
CXX=g++
CFLAGS=-march=native -ffast-math -g -ggdb -Wall -W -pedantic -O3
INCLUDES=-I./gguf-tools
OBJECTS=gguf-tools/gguflib.o gguf-tools/sds.o gguf-tools/fp16.o
main: $(OBJECTS) main.cpp
	$(CXX) $(CFLAGS) $(INCLUDES) main.cpp $(OBJECTS) -o main

%.o: %.c
	$(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@

clean:
	rm -f main $(OBJECTS)

I am also implementing the model conversion logic as outlined in the POC; I will update as I finish it. |
@Captain-MUDIT, would love to collaborate with you. Let's wait for Alex's response and then we can decide how to proceed. |
Sure @11happy |
@AlexKoff88 |
Thanks for being interested in this issue. It looks like this ticket is already assigned to a contributor. Please communicate with the assigned contributor to confirm the status of the issue. |
Hi @11happy and @Captain-MUDIT, thank you for your interest. @11happy, regarding your question about how to load the GGUF file in the right way: you can go with how MLX does it, i.e. add "gguf-tool" to the submodules and borrow the code from MLX that parses GGUF with "gguf-tool". Details are here: https://github.com/ml-explore/mlx/blob/main/mlx/io/gguf.cpp. @Captain-MUDIT, you can take the tokenizer conversion part. The task is to transform the GGUF tokenizer data into an OpenVINO tokenizer. OpenVINO has a dedicated project for converting tokenizers from HF Transformers to OpenVINO format: https://github.com/openvinotoolkit/openvino_tokenizers. The idea is to take the tokenizer config, vocab, and metadata and use a part of that project. |
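To make the division of work concrete, here is a minimal sketch of the weight-loading side, assuming the gguflib API from antirez/gguf-tools (gguf_get_tensor, gguf_tensor_to_float, gguf_close) and reusing gguf_next as in the snippet above; the exact names and signatures should be verified against gguflib.h before relying on this:

extern "C" {
#include "gguf-tools/gguflib.h"
}
#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Sketch only: skips the key/value section, then dequantizes every tensor
// to FP32. gguf_tensor_to_float() is assumed to return a malloc'ed buffer
// (NULL for unsupported quantization types), as in gguf-tools.
std::map<std::string, std::vector<float>> load_gguf_weights(const char *path) {
    std::map<std::string, std::vector<float>> weights;
    gguf_ctx *ctx = gguf_open(path);
    if (ctx == NULL) {
        perror(path);
        return weights;
    }

    // gguf_get_key() returns 0 once all metadata is consumed and the
    // tensor section begins; gguf_next() advances past each value.
    gguf_key key;
    while (gguf_get_key(ctx, &key))
        gguf_next(ctx, key.type, key.val, 0);

    gguf_tensor tensor;
    while (gguf_get_tensor(ctx, &tensor)) {
        float *data = gguf_tensor_to_float(&tensor);  // FP16/Q8_0/Q4_x -> FP32
        if (data != NULL) {
            weights[std::string(tensor.name, tensor.namelen)] =
                std::vector<float>(data, data + tensor.num_weights);
            free(data);
        }
    }
    gguf_close(ctx);
    return weights;
}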
@AlexKoff88 can I also work on this issue? |
@janviisonii23, you can, as there are a few subtasks in it, but you will have to wait a bit until the core part is implemented. |
There is a TokenizerPipeline class for building tokenizer/detokenizer models. The easiest way is to parse the tokenizer data from the GGUF metadata. The other way is to build the tokenizer by creating the model graph directly, like in this RWKV example. You might also have to create several base pipelines for different tokenizer types: |
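For orientation, a tiny sketch of how the GGUF tokenizer metadata could select among such base pipelines. The "tokenizer.ggml.model" key and its "llama"/"gpt2"/"bert" values come from the GGUF spec; TokenizerKind and pick_pipeline are hypothetical helpers for illustration, not openvino_tokenizers API:

#include <stdexcept>
#include <string>

// Hypothetical classification of GGUF tokenizers into base pipelines:
//   "llama" -> SentencePiece-style BPE (tokenizer.ggml.tokens + scores)
//   "gpt2"  -> byte-level BPE (tokenizer.ggml.tokens + merges)
//   "bert"  -> WordPiece
enum class TokenizerKind { SentencePieceBPE, ByteLevelBPE, WordPiece };

TokenizerKind pick_pipeline(const std::string &ggml_model) {
    if (ggml_model == "llama") return TokenizerKind::SentencePieceBPE;
    if (ggml_model == "gpt2")  return TokenizerKind::ByteLevelBPE;
    if (ggml_model == "bert")  return TokenizerKind::WordPiece;
    throw std::runtime_error("unsupported tokenizer.ggml.model: " + ggml_model);
}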
The idea is to have functionality that allows reading the GGUF format and creating an OpenVINO GenAI compatible representation that can be used to instantiate an LLMPipeline() from it.
This task includes:
The initial scope can include support of llama-based LLMs (e.g., Llama-3.2 and SmolLM) and FP16, Q8_0, Q4_0, Q4_1 models.
All the code should be written in C++.
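For context, a sketch of the intended end-to-end flow, not the implementation: a hypothetical converter (the GGUF reader this issue asks for) writes an OpenVINO GenAI compatible directory from the GGUF file, after which the standard LLMPipeline API loads it. LLMPipeline, generate(), and max_new_tokens are real OpenVINO GenAI entry points; the directory name is illustrative:

#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>
#include <string>

int main() {
    // Assumed to have been produced from model.gguf by the GGUF reader:
    // OpenVINO IR plus converted tokenizer/detokenizer models.
    std::string models_path = "converted_model_dir";

    ov::genai::LLMPipeline pipe(models_path, "CPU");
    std::cout << pipe.generate("What is OpenVINO?",
                               ov::genai::max_new_tokens(64)) << std::endl;
    return 0;
}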