
Issue with integrating with lm-eval harness #97

Open

sriyachakravarthy opened this issue Jul 23, 2024 · 7 comments

Comments

sriyachakravarthy commented Jul 23, 2024

Hi! I tried evaluating 1bitLLM/bitnet_b1_58-3B from Hugging Face and am getting the error ValueError: Tokenizer class BitnetTokenizer does not exist or is not currently imported.
Kindly help!
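[Editor's note, a hedged suggestion not confirmed in this thread: this error usually means the checkpoint ships a custom tokenizer class that transformers refuses to load unless the repository's code is trusted. Passing trust_remote_code=True through --model_args may resolve it:]

```shell
# Hypothetical workaround: let transformers load the custom BitnetTokenizer
# class shipped with the checkpoint by trusting the repo's remote code.
lm_eval --model hf \
    --model_args pretrained=1bitLLM/bitnet_b1_58-3B,trust_remote_code=True \
    --tasks hellaswag \
    --device cuda:0 \
    --batch_size 8
```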

LeiWang1999 (Contributor) commented

Hi @sriyachakravarthy, would you mind providing the scripts to reproduce?

LeiWang1999 (Contributor) commented

You may also want to check out https://github.com/LeiWang1999/vllm-bitblas/tree/bitblas-intg with:

import torch
from vllm import SamplingParams

with VllmRunner(
    "BitBLASModel/open_llama_3b_1.58bits_bitblas",
    dtype="half",
    quantization="bitblas",
    enforce_eager=False,
) as bitnet_model:
    # Build a dummy input prompt of in_seq_len tokens.
    prompt = "a " * in_seq_len
    prompts = [prompt] * batch_size

    sampling_params = SamplingParams(max_tokens=out_seq_len)
    torch.cuda.profiler.start()
    bitnet_outputs = bitnet_model.generate(
        prompts, sampling_params=sampling_params
    )
    torch.cuda.profiler.stop()

This is much faster than the naive integration implementation.

sriyachakravarthy (Author) commented

> Hi @sriyachakravarthy, would you mind providing the scripts to reproduce?

Sure, here is the script.

%pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git@big-refactor

!lm_eval --model hf \
    --model_args pretrained=BitBLASModel/open_llama_3b_1.58bits_bitblas \
    --tasks hellaswag \
    --device cuda:0 \
    --batch_size 8

and when I try to follow the instructions from the model card (https://huggingface.co/1bitLLM/bitnet_b1_58-3B), I get the following:
$ python3 eval_ppl.py --hf_path 1bitLLM/bitnet_b1_58-3B --seqlen 2048
Traceback (most recent call last):
  File "eval_ppl.py", line 7, in <module>
    from modeling_bitnet import BitnetForCausalLM
  File "/home/sriyar/bitnet/modeling_bitnet.py", line 51, in <module>
    from .configuration_bitnet import BitnetConfig
ImportError: attempted relative import with no known parent package
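[Editor's note: this ImportError is not specific to bitnet. modeling_bitnet.py uses a relative import, so it cannot be executed as a standalone script; it has to be imported as part of a package (e.g. run with python -m from the package's parent directory), or the relative import changed to an absolute one. A minimal, self-contained reproduction, with made-up package and file names:]

```python
import os
import subprocess
import sys
import tempfile

# Build a throwaway package whose module uses a relative import, mirroring
# "from .configuration_bitnet import BitnetConfig" in the traceback above.
with tempfile.TemporaryDirectory() as d:
    pkg = os.path.join(d, "bitnetpkg")
    os.makedirs(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "configuration_bitnet.py"), "w") as f:
        f.write("CONFIG = 'ok'\n")
    with open(os.path.join(pkg, "modeling_bitnet.py"), "w") as f:
        f.write("from .configuration_bitnet import CONFIG\nprint(CONFIG)\n")

    # Running the file directly reproduces the error from the comment.
    direct = subprocess.run(
        [sys.executable, os.path.join(pkg, "modeling_bitnet.py")],
        capture_output=True, text=True,
    )
    assert "attempted relative import" in direct.stderr

    # Running it as a module from the package's parent directory works.
    as_module = subprocess.run(
        [sys.executable, "-m", "bitnetpkg.modeling_bitnet"],
        capture_output=True, text=True, cwd=d,
    )
    print(as_module.stdout.strip())  # prints "ok"
```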

sriyachakravarthy (Author) commented

> You may also want to check out https://github.com/LeiWang1999/vllm-bitblas/tree/bitblas-intg ... which is much faster than the naive integration implementation.

Sure, will do


LeiWang1999 commented Jul 23, 2024

> when I try to follow the instructions from the model card, I get: ImportError: attempted relative import with no known parent package

That code is not provided by BitBLAS; check out the integration under https://github.com/microsoft/BitBLAS/tree/main/integration/BitNet.

By the way, here are some benchmark numbers for the 1.58-bit vLLM integration:

Token Per Second (tok/s)

| model | framework | BS16IN32OUT128 | BS1IN512OUT1024 |
| --- | --- | --- | --- |
| openllama-3b-1.58bits | pytorch | 106.83 | 49.34 |
| openllama-3b-1.58bits | pytorch-bitblas | 240.33 | 103.09 |
| openllama-3b-1.58bits | vllm-bitblas | 379.25 | 117.43 |
| openllama-3b-1.58bits | vllm-bitblas-cuda-graph | 2543.58 | 1621.08 |
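[Editor's note: the table implies roughly a 24x and 33x end-to-end speedup of the CUDA-graph vLLM path over the naive PyTorch baseline; a quick sanity check with the throughput numbers copied verbatim from the table:]

```python
# Throughput (tok/s) from the benchmark table above.
pytorch = {"BS16IN32OUT128": 106.83, "BS1IN512OUT1024": 49.34}
vllm_cuda_graph = {"BS16IN32OUT128": 2543.58, "BS1IN512OUT1024": 1621.08}

# Speedup of vllm-bitblas-cuda-graph over the naive pytorch baseline.
speedups = {k: vllm_cuda_graph[k] / pytorch[k] for k in pytorch}
for setting, s in speedups.items():
    print(f"{setting}: {s:.1f}x faster than naive pytorch")
```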

sriyachakravarthy (Author) commented

Thanks! Also, is the transformers Trainer compatible with fine-tuning this model?

LeiWang1999 (Contributor) commented

@sriyachakravarthy Sorry, I have no experience with that.
