
[Good First Issue]: Create a GGUF reader #1665

Open
AlexKoff88 opened this issue Feb 3, 2025 · 14 comments
Assignees
Labels
good first issue (Good for newcomers)

Comments

@AlexKoff88
Collaborator

The idea is to have functionality that allows reading the GGUF format and creating an OpenVINO GenAI compatible representation that can be used to instantiate LLMPipeline() from it.
This task includes:

  • The initial scope can include support of llama-based LLMs (e.g. Llama-3.2 and SmolLMs) and FP16, Q8_0, Q4_0, Q4_1 models.
  • All the code should be written in C++.
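
For reference, a GGUF file starts with a small fixed header defined by the GGUF specification: a 4-byte magic ("GGUF"), a uint32 version, a uint64 tensor count, and a uint64 metadata key/value count, all little-endian; the metadata key/value pairs and tensor infos that follow carry the model hyperparameters, tokenizer data, tensor shapes and quantization types. Below is a minimal, illustrative sketch of reading and validating that header; GGUFHeader and read_gguf_header are hypothetical names, not part of any existing OpenVINO GenAI API.

#include <cstdint>
#include <cstdio>
#include <fstream>
#include <stdexcept>
#include <string>

// Fixed-size GGUF file header as defined by the GGUF specification.
struct GGUFHeader {
    uint32_t magic;        // "GGUF" = 0x46554747 when read little-endian
    uint32_t version;      // format version (v3 at the time of writing)
    uint64_t tensor_count; // number of tensor infos following the metadata
    uint64_t kv_count;     // number of metadata key/value pairs
};

// Illustrative helper: read and validate the header of a .gguf file.
GGUFHeader read_gguf_header(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) throw std::runtime_error("cannot open " + path);

    GGUFHeader header{};
    file.read(reinterpret_cast<char*>(&header.magic), sizeof(header.magic));
    file.read(reinterpret_cast<char*>(&header.version), sizeof(header.version));
    file.read(reinterpret_cast<char*>(&header.tensor_count), sizeof(header.tensor_count));
    file.read(reinterpret_cast<char*>(&header.kv_count), sizeof(header.kv_count));
    if (!file || header.magic != 0x46554747u)
        throw std::runtime_error(path + " is not a GGUF file");
    return header;
}

int main(int argc, char* argv[]) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s model.gguf\n", argv[0]); return 1; }
    GGUFHeader h = read_gguf_header(argv[1]);
    std::printf("GGUF v%u: %llu tensors, %llu metadata entries\n",
                h.version,
                (unsigned long long)h.tensor_count,
                (unsigned long long)h.kv_count);
    return 0;
}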

@AlexKoff88 AlexKoff88 converted this from a draft issue Feb 3, 2025
@ilya-lavrenov ilya-lavrenov added the good first issue Good for newcomers label Feb 3, 2025
@ilya-lavrenov ilya-lavrenov changed the title Create a GGUF reader [Good First Issue]: Create a GGUF reader Feb 3, 2025
@Geeks-Sid

Can this be broken down into smaller, more concrete tasks? That would allow contributors to pick tasks off one by one and build things up gradually instead of all at once.

@AlexKoff88
Collaborator Author

AlexKoff88 commented Feb 4, 2025

It can be, for sure, but the way I see it, these tasks should be executed sequentially. For example:

  • One can start by enabling llama-3.2-1b in FP16.
  • Parsing and converting the tokenizer from GGUF format to OpenVINO (tokenizer/detokenizer models). After that, we will have the core functionality in place.
  • Then, a few tasks can be executed in parallel:
    • Enable Q8_0 llama
    • Enable Q4_0 and Q4_1 llama (see the dequantization sketch after this list for the block layouts)
    • Enable and verify other llama-based models such as Llama-3.1-8B, SmolLMs
    • Enable the most popular quantization schemes such as Q4_K_M
    • Enable Qwen model family
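
For the Q8_0 and Q4_0/Q4_1 items above: in the ggml/GGUF layout, Q8_0 stores blocks of 32 weights as an FP16 scale followed by 32 int8 quants, while Q4_0 stores an FP16 scale followed by 16 bytes of packed 4-bit quants (low nibbles are elements 0..15, high nibbles 16..31, each offset by 8); Q4_1 additionally carries a per-block minimum. A minimal dequantization sketch along those lines is below; it is illustrative only and not taken from the POC.

#include <cstdint>
#include <cstring>
#include <vector>

// Minimal IEEE half -> float conversion (normals, zero, subnormals, inf/NaN).
static float fp16_to_fp32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FF;
    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) { bits = sign; }              // signed zero
        else {                                        // subnormal: renormalize
            exp = 113;                                // 127 - 15 + 1
            while ((mant & 0x400) == 0) { mant <<= 1; --exp; }
            bits = sign | (exp << 23) | ((mant & 0x3FF) << 13);
        }
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000 | (mant << 13);      // inf / NaN
    } else {
        bits = sign | ((exp + 112) << 23) | (mant << 13);
    }
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Q8_0: blocks of 32 weights, each block = FP16 scale + 32 int8 quants.
static void dequant_q8_0(const uint8_t* src, size_t n_blocks, std::vector<float>& dst) {
    for (size_t b = 0; b < n_blocks; ++b) {
        const uint8_t* blk = src + b * (2 + 32);
        uint16_t d_bits; std::memcpy(&d_bits, blk, 2);
        float d = fp16_to_fp32(d_bits);
        const int8_t* q = reinterpret_cast<const int8_t*>(blk + 2);
        for (int i = 0; i < 32; ++i) dst.push_back(d * q[i]);
    }
}

// Q4_0: blocks of 32 weights, each block = FP16 scale + 16 bytes of packed
// 4-bit quants (low nibbles are elements 0..15, high nibbles 16..31, offset by 8).
static void dequant_q4_0(const uint8_t* src, size_t n_blocks, std::vector<float>& dst) {
    for (size_t b = 0; b < n_blocks; ++b) {
        const uint8_t* blk = src + b * (2 + 16);
        uint16_t d_bits; std::memcpy(&d_bits, blk, 2);
        float d = fp16_to_fp32(d_bits);
        const uint8_t* q = blk + 2;
        float lo[16], hi[16];
        for (int i = 0; i < 16; ++i) {
            lo[i] = d * (int(q[i] & 0x0F) - 8);
            hi[i] = d * (int(q[i] >> 4) - 8);
        }
        dst.insert(dst.end(), lo, lo + 16);
        dst.insert(dst.end(), hi, hi + 16);
    }
}

A real reader could also keep the data quantized and map it onto OpenVINO's compressed-weight representation rather than expanding everything to float32.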

...

@11happy

11happy commented Feb 12, 2025

.take


Thank you for looking into this issue! Please let us know if you have any questions or require any help.

@Captain-MUDIT

Hello @AlexKoff88, I would like to work on this. Can it be assigned to me?

@11happy

11happy commented Feb 15, 2025

Hi @AlexKoff88, @ilya-lavrenov, I am able to parse the GGUF model using the gguf tools you provided above; however, they are written in C (I had to do some linking on my end). In the POC you provided, in the load_gguf_model function,
the weights is a very complex map containing vector and vector<vector> entries (layers). Should I continue with a similar implementation on my end? Please point me in the right direction. Also, I had a doubt about what you mentioned earlier:

Parsing and converting tokenizer from GGUF format to OpenVINO (tokenizer/detokenizer models). After that, we will have core functionality in place.

Do we only need to parse the tokenizer, not the whole model?

Thank you

This is the update on my end:

main.cpp

extern "C" {
    #include "gguf-tools/gguflib.h"
}
#include "openvino/op/concat.hpp"
#include "openvino/op/constant.hpp"
#include "openvino/op/convert.hpp"
#include "openvino/op/gather_elements.hpp"
#include "openvino/op/unsqueeze.hpp"
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <assert.h>
#include <errno.h>
#include <math.h>
#include <inttypes.h>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct {
    int verbose;        // --verbose option
    int diffable;       // --diffable option
} Opt = {0};


// Reads GGUF metadata, derives the llama config from it and (eventually)
// returns the weights as well.
std::pair<std::map<std::string,int>,std::map<std::string,std::vector<double>>> load_gguf_model(const char *model_path){

    std::map<std::string,int> config, meta_data;
    std::map<std::string,std::vector<double>> weights;  // weight loading not implemented yet
    gguf_ctx *ctx = gguf_open(model_path);
    if (ctx == NULL) {
        perror(model_path);
        exit(1);
    }
    gguf_key key;
    while (gguf_get_key(ctx,&key)) {
        // NOTE: every value is read as uint32 here, so float metadata such as
        // llama.attention.layer_norm_rms_epsilon and llama.rope.freq_base is
        // not decoded correctly yet and the corresponding config entries below
        // are truncated to int.
        meta_data[std::string(key.name,key.namelen)] = key.val->uint32;
        gguf_next(ctx,key.type,key.val,Opt.verbose);
    }
    config.emplace("layer_num",meta_data["llama.block_count"]);
    config.emplace("head_num",meta_data["llama.attention.head_count"]);
    config.emplace("head_size",meta_data["llama.embedding_length"]/meta_data["llama.attention.head_count"]);
    config.emplace("head_num_kv",(meta_data.count("llama.attention.head_count_kv")?meta_data["llama.attention.head_count_kv"]:meta_data["llama.attention.head_count"]));
    config.emplace("max_position_embeddings",((meta_data.count("llama.context_length")?meta_data["llama.context_length"]:2048)));
    config.emplace("rotary_dims",meta_data["llama.rope.dimension_count"]);
    config.emplace("rms_norm_eps",meta_data["llama.attention.layer_norm_rms_epsilon"]);
    config.emplace("rope_freq_base",((meta_data.count("llama.rope.freq_base")?meta_data["llama.rope.freq_base"]:10000.0)));
    
    for(auto x : config){
        std::cout<<x.first<<" : "<<x.second<<std::endl;
    }
    return{config,weights};
}

int main(int argc, char* argv[]){
    if (argc < 2) {
        std::cerr << "usage: " << argv[0] << " <model.gguf>\n";
        return 1;
    }
    std::pair<std::map<std::string,int>,std::map<std::string,std::vector<double>>> model = load_gguf_model(argv[1]);

    return 0;
}

Makefile:

CC=gcc
CXX=g++
CFLAGS=-march=native -ffast-math -g -ggdb -Wall -W -pedantic -O3
INCLUDES=-I./gguf-tools

OBJECTS=gguf-tools/gguflib.o gguf-tools/sds.o gguf-tools/fp16.o

main: $(OBJECTS) main.cpp
	$(CXX) $(CFLAGS) $(INCLUDES) main.cpp $(OBJECTS) -o main

%.o: %.c
	$(CC) $(CFLAGS) $(INCLUDES) -c $< -o $@

clean:
	rm -f main $(OBJECTS)

I am also implementing the model conversion logic as outlined in the POC; I will update once I finish it.
Thank you
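
For the model-conversion step mentioned above, one building block is wrapping a parsed (and, for quantized types, dequantized) GGUF tensor into an OpenVINO constant node, which can then be combined with the ops already included in main.cpp (Concat, Convert, GatherElements, Unsqueeze) to assemble the graph. A minimal sketch follows; make_weight_node is an illustrative helper, not part of the POC.

#include <memory>
#include <vector>

#include "openvino/core/shape.hpp"
#include "openvino/op/constant.hpp"

// Illustrative helper: turn one GGUF tensor (already expanded to float32)
// into an OpenVINO Constant node for later use in the LLM graph.
std::shared_ptr<ov::Node> make_weight_node(const std::vector<float>& data,
                                           const ov::Shape& shape) {
    // ov::op::v0::Constant copies the buffer, so `data` may be freed afterwards.
    return std::make_shared<ov::op::v0::Constant>(ov::element::f32, shape, data);
}

An alternative is to keep the weights in FP16 in the Constant and insert a Convert node, which the already-included openvino/op/convert.hpp would cover.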

@11happy

11happy commented Feb 15, 2025

@Captain-MUDIT would love to collaborate with you; let's wait for Alex's response and then we can decide how to proceed.

@Captain-MUDIT

Captain-MUDIT commented Feb 16, 2025

Sure @11happy

@Captain-MUDIT

@AlexKoff88
.take


Thanks for being interested in this issue. It looks like this ticket is already assigned to a contributor. Please communicate with the assigned contributor to confirm the status of the issue.

@AlexKoff88
Collaborator Author

Hi @11happy and @Captain-MUDIT, thank you for your interest.

@11happy, regarding your question about how to load the GGUF file in the right way: you can follow how MLX does it, i.e. add "gguf-tools" to the submodules and borrow the code from MLX that parses GGUF with it. Details are here: https://github.com/ml-explore/mlx/blob/main/mlx/io/gguf.cpp

@Captain-MUDIT, you can take the tokenizer conversion part. The task is to transform the GGUF tokenizer data into an OpenVINO tokenizer. OpenVINO has a dedicated project for converting tokenizers from HF Transformers to the OpenVINO format: https://github.com/openvinotoolkit/openvino_tokenizers. The idea is to take the tokenizer config, vocab and metadata and use part of the openvino_tokenizers lib to do the conversion. Adding @apaniukov for consultations.

@janviisonii23

@AlexKoff88 can I also work on this issue?

@AlexKoff88
Collaborator Author

@janviisonii23, you can, as there are a few subtasks in it, but you will have to wait a bit until the core part is implemented.

@apaniukov
Contributor

@11happy @Captain-MUDIT

There is a TokenizerPipeline class for building tokenizer/detokenizer models. The easiest way is to parse the tokenizer data from the .gguf file, build such a pipeline, and get the models from it; see the HF-tiktoken tokenizer example.
You can get an example of what steps are created by checking the steps attribute of the pipeline object that is created here by converting the GGUF tokenizer using the HF AutoTokenizer class. Note that the resulting tokenizer might not accurately represent the GGUF tokenizer, because each conversion step (GGUF → HF → OV) might introduce some errors.

The other way is to build the tokenizer by creating the model graph directly, like in this RWKV example.

You might also have to create several base pipelines for different tokenizer types (a diagram of the tokenizer types is attached in the original comment).
