
Questions about int8 quantization #39

Open
zzczzc20 opened this issue Sep 12, 2023 · 2 comments

Comments

@zzczzc20

Dear authors,
I am new to the project, and I have been reading the code in "include/rwkv/cuda/rwkv.cu". If my understanding is correct, only the computation inside the functions cudac_mm8_one and cuda_mm8_threec is related to int8 quantization, and their results are floating-point numbers, while the calculations in sigmoid and kernel_wkvc_forward are done entirely in floating point.

My question is: why are these parts not quantized? I have heard of methods that quantize non-linear functions with a lookup table. Also, considering the low speed of the exp() function, is there any way to replace it with a fast substitute?

Best,
zzczzc20
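For illustration, the lookup-table idea mentioned above could be sketched like this (a Python sketch, not code from this repo; the table size of 256 and the clamp range of [-8, 8] are arbitrary assumptions):

```python
import math

# Hypothetical sketch: approximate sigmoid with a 256-entry lookup table
# so that no exp() is evaluated at inference time. Inputs are clamped to
# [-8, 8], outside of which sigmoid is effectively saturated.
LUT_SIZE = 256
LO, HI = -8.0, 8.0
STEP = (HI - LO) / (LUT_SIZE - 1)
SIGMOID_LUT = [1.0 / (1.0 + math.exp(-(LO + i * STEP))) for i in range(LUT_SIZE)]

def sigmoid_lut(x: float) -> float:
    """Table-based sigmoid: clamp, compute the index, read the table."""
    x = max(LO, min(HI, x))
    idx = round((x - LO) / STEP)
    return SIGMOID_LUT[idx]
```

With this table spacing the worst-case error is roughly half a step times the maximum slope of sigmoid (0.25), i.e. below 1e-2; a larger table or linear interpolation between entries would tighten that further.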

@harrisonvanderbyl
Owner

Hi!
To answer your question: it's a matter of scale.
The mm8 matvecs are O(n^2) operations, so int8 is used there to minimise CUDA memory usage.

There's no appreciable memory advantage to quantizing other operations.

If you can get faster inference using the methods outlined, please submit a PR :)
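To make the scale argument concrete, here is a minimal Python sketch (not the repo's CUDA kernel) of an int8 matvec with per-row symmetric scales: the n^2 weight entries are stored as int8, while the accumulation and rescaling stay in floating point, which is why the outputs of the mm8 functions are floats:

```python
# Hypothetical sketch of per-row symmetric int8 weight quantization and
# the corresponding matvec. The weight matrix (n^2 values) is stored as
# int8 plus one float scale per row; the vector and output stay float.
def quantize_rows(w):
    """w: list of rows of floats -> (int8 rows, per-row float scales)."""
    q_rows, scales = [], []
    for row in w:
        scale = max(abs(v) for v in row) / 127.0 or 1.0  # avoid 0 scale
        q_rows.append([round(v / scale) for v in row])   # values in [-127, 127]
        scales.append(scale)
    return q_rows, scales

def mm8_one(q_rows, scales, x):
    """int8 weights times a float vector, rescaled per row to float."""
    return [s * sum(q * xi for q, xi in zip(row, x))
            for row, s in zip(q_rows, scales)]
```

Only the weights shrink 4x versus float32 here; activations and the wkv state are small vectors, which is the "no appreciable memory advantage" point above.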

@zzczzc20
Author

Hi, I am willing to look into it, but I need to benchmark the model so that I can measure the quantization loss. Is there an easy way to benchmark the model? Which benchmark do you use? Thanks very much.
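One common way to measure quantization loss for language models (not specific to this repo) is to compare perplexity on a fixed text corpus between the float and quantized models. A minimal sketch, assuming the runtime can give per-token log-probabilities of a reference text:

```python
import math

# Hypothetical sketch: perplexity from per-token natural-log
# probabilities. How those logprobs are obtained depends on the
# runtime's scoring interface, which is not shown here.
def perplexity(logprobs):
    """logprobs: list of ln p(token | context) over a reference text."""
    nll = -sum(logprobs) / len(logprobs)  # average negative log-likelihood
    return math.exp(nll)
```

Running this on the same corpus before and after quantization gives a single number whose increase quantifies the accuracy lost.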
