Dear authors,
I am new to the project and have been reading the code in "include/rwkv/cuda/rwkv.cu". If my understanding is correct, only the computation inside the functions cudac_mm8_one and cuda_mm8_threec involves int8 quantization, and even there the results are floating-point numbers. The calculations in sigmoid and kernel_wkvc_forward, however, are done entirely in floating point.
My question is: why are these parts not quantized? I have heard of methods that quantize non-linear functions with a look-up table. Given how slow the exp() function is, is there any way to replace it with a faster substitute?
Best,
zzczzc20
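For readers unfamiliar with the look-up-table idea mentioned in the question, here is a minimal host-side C++ sketch of it. It is not taken from rwkv.cu: the names `make_sigmoid_table`, `lut_sigmoid`, and `max_abs_error`, the table size of 256, and the clamp range [-8, 8] are all assumptions chosen for illustration. It only shows how a table with linear interpolation can stand in for sigmoid and how the approximation error could be measured; whether this is actually faster than exp() on a GPU would need to be benchmarked.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// All names and constants below are hypothetical, not part of rwkv.cu.
constexpr std::size_t kTableSize = 256;  // assumed table resolution
constexpr float kLo = -8.0f;             // assumed lower clamp bound
constexpr float kHi = 8.0f;              // assumed upper clamp bound

// Precompute sigmoid at evenly spaced points in [kLo, kHi].
std::array<float, kTableSize> make_sigmoid_table() {
    std::array<float, kTableSize> table{};
    const float step = (kHi - kLo) / static_cast<float>(kTableSize - 1);
    for (std::size_t i = 0; i < kTableSize; ++i) {
        const float x = kLo + step * static_cast<float>(i);
        table[i] = 1.0f / (1.0f + std::exp(-x));
    }
    return table;
}

// Approximate sigmoid(x): clamp to the table range, then interpolate
// linearly between the two nearest table entries.
float lut_sigmoid(const std::array<float, kTableSize>& table, float x) {
    if (x <= kLo) return table.front();
    if (x >= kHi) return table.back();
    const float pos =
        (x - kLo) / (kHi - kLo) * static_cast<float>(kTableSize - 1);
    std::size_t idx = static_cast<std::size_t>(pos);
    if (idx >= kTableSize - 1) idx = kTableSize - 2;  // guard against rounding
    const float frac = pos - static_cast<float>(idx);
    return table[idx] + frac * (table[idx + 1] - table[idx]);
}

// Largest absolute deviation from the exact sigmoid over an even sweep
// of `samples` points in [lo, hi].
float max_abs_error(const std::array<float, kTableSize>& table,
                    float lo, float hi, int samples) {
    assert(samples > 1);
    float worst = 0.0f;
    for (int i = 0; i < samples; ++i) {
        const float x = lo + (hi - lo) * static_cast<float>(i) /
                                 static_cast<float>(samples - 1);
        const float exact = 1.0f / (1.0f + std::exp(-x));
        const float err = std::fabs(lut_sigmoid(table, x) - exact);
        if (err > worst) worst = err;
    }
    return worst;
}
```

Calling `max_abs_error(make_sigmoid_table(), -8.0f, 8.0f, 10001)` gives a quick estimate of the approximation error inside the table range. Inputs outside the range are clamped to the end entries, so the error there is bounded by how close sigmoid already is to 0 or 1 at the chosen bounds.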
Hi, I am willing to look into it, but I first need to benchmark the model so that I can measure the quantization loss. Is there an easy way to benchmark the model? Which benchmark do you use? Thanks very much.