PrepareB but take integers instead of float #42

Open
kpu opened this issue Nov 28, 2019 · 6 comments

kpu commented Nov 28, 2019

The current PrepareB function combines quantization and rearrangement. The rearrangement depends on register length. We're going to want to distribute int8 models in an architecture-independent fashion (probably as row major), then have them rearranged at load. The Quantize function already converts to int8 format without rearranging, so what's needed is an int8 rearrangement function.

Possibly with a preprocessing template, though that sounds complicated.

Also worth considering whether this should be done in place or as a copy.
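
For concreteness, a minimal scalar sketch of such a rearrangement, assuming a hypothetical name and a panel layout chosen purely for illustration; intgemm's actual packed layout is architecture-specific and interleaves differently:

```cpp
#include <cstddef>
#include <cstdint>

// Illustration only: repack a row-major int8 B (rows x cols) into panels of
// REGISTER_WIDTH columns so a SIMD load would read consecutive bytes from the
// same column panel. Assumes cols is a multiple of REGISTER_WIDTH, mirroring
// intgemm's requirement that dimensions be register-width multiples.
template <std::size_t REGISTER_WIDTH>
void RearrangeBInt8(const std::int8_t *input, std::int8_t *output,
                    std::size_t rows, std::size_t cols) {
  std::size_t out = 0;
  for (std::size_t c0 = 0; c0 < cols; c0 += REGISTER_WIDTH) {
    for (std::size_t r = 0; r < rows; ++r) {
      for (std::size_t c = c0; c < c0 + REGISTER_WIDTH; ++c) {
        output[out++] = input[r * cols + c];
      }
    }
  }
}
```

Since the loop only permutes bytes, an in-place variant would need a scratch buffer or cycle-following, which is one reason the in-place-versus-copy question above matters.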

kpu self-assigned this Nov 28, 2019
kpu removed their assignment Dec 10, 2019
mateuszchudyk self-assigned this Dec 10, 2019

mateuszchudyk commented Jan 20, 2020

Prepare B if B is quantized and transposed:
https://github.com/kpu/intgemm/tree/prepare-b-quantized-transposed

Prepare B if B is transposed:
https://github.com/kpu/intgemm/tree/prepare-b-transposed

I think we can merge them into master first and then try some optimizations.
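
For reference, a sketch of how the two branches' entry points would be used at load time. The signatures are paraphrased from the branches and the header path is assumed; treat parameter names and order as approximations:

```cpp
#include <cstdint>

#include "intgemm.h"  // header location assumed

// B shipped architecture-independently as quantized, transposed int8:
// only the register-width rearrangement remains to be done at load.
void LoadQuantizedB(const std::int8_t *b_quantized_transposed,
                    std::int8_t *prepared,
                    intgemm::Index inner, intgemm::Index b_cols) {
  intgemm::Int8::PrepareBQuantizedTransposed(b_quantized_transposed, prepared,
                                             inner, b_cols);
}

// B shipped transposed but still float: quantize and rearrange in one call.
void LoadFloatB(const float *b_transposed, std::int8_t *prepared,
                float quant_mult,
                intgemm::Index inner, intgemm::Index b_cols) {
  intgemm::Int8::PrepareBTransposed(b_transposed, prepared, quant_mult,
                                    inner, b_cols);
}
```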

kpu commented Jan 20, 2020

Ooh

kpu commented Jan 21, 2020

Merged prepare-b-quantized-transposed in 03a4a9d

XapaJIaMnu commented

We need prepareB if B is only quantized too.

XapaJIaMnu commented

Also, as a slight enhancement, it would be nice (and probably more important from a performance point of view) to have a combined transpose and Quantize for prepareA. The affine and dot operators take transA and transB as parameters. B is cached, so it's not a big deal, but A is not, which means there would be two memory passes over A. If we had quantizeAndTranspose, that would solve it.
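
A scalar sketch of that idea, assuming round-to-nearest with saturation; quantizeAndTranspose is the name proposed above, not an existing intgemm function, and a real version would be vectorized:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

// One pass over A instead of two: quantize and transpose together so each
// float is read from memory exactly once.
void QuantizeAndTransposeA(const float *a, std::int8_t *out, float quant_mult,
                           std::size_t rows, std::size_t cols) {
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      // Round to nearest, then saturate to the int8 range.
      float q = std::nearbyint(a[r * cols + c] * quant_mult);
      q = std::min(127.0f, std::max(-128.0f, q));
      // Write in transposed (cols x rows) order.
      out[c * rows + r] = static_cast<std::int8_t>(q);
    }
  }
}
```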

mateuszchudyk commented

So do we need all of these combinations?

  • PrepareB if B is quantized and transposed
  • PrepareB if B is only transposed
  • PrepareB if B is only quantized
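
Put together, the B-preparation family would look roughly like the declarations below. The first three exist or are in the branches above (signatures paraphrased); PrepareBQuantized is the missing combination, and its name and signature are assumptions:

```cpp
#include <cstdint>

using Index = unsigned int;  // stand-in for intgemm::Index

// Float, untransposed B: quantize + rearrange (the existing PrepareB).
void PrepareB(const float *in, std::int8_t *out, float quant_mult,
              Index rows, Index cols);

// int8, transposed B: rearrange only (merged in 03a4a9d).
void PrepareBQuantizedTransposed(const std::int8_t *in, std::int8_t *out,
                                 Index inner, Index b_cols);

// Float, transposed B: quantize + rearrange (prepare-b-transposed branch).
void PrepareBTransposed(const float *in, std::int8_t *out, float quant_mult,
                        Index inner, Index b_cols);

// int8, untransposed B: rearrange only. The missing combination discussed here.
void PrepareBQuantized(const std::int8_t *in, std::int8_t *out,
                       Index rows, Index cols);
```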
