Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiply function could be 20% faster and smaller #194

Open
nickovs opened this issue Jun 27, 2021 · 2 comments
Open

Multiply function could be 20% faster and smaller #194

nickovs opened this issue Jun 27, 2021 · 2 comments

Comments

@nickovs
Copy link

nickovs commented Jun 27, 2021

All of the constants needed by AES for the inversion of the MixColumns function fit in 4 bits, but both the function and macro versions of the Multiply() function support 5 bits. The size and performance of this could be improved by removing the case when ((y >>4) & 1) is set, since this can never happen with the constants used.

@nickovs
Copy link
Author

nickovs commented Jun 27, 2021

I just noticed the comment about vectorisation and the last xtime() call making the code smaller. That said, given that the constants used in the multiplication are all small, you can efficiently separate the polynomial multiplication and polynomial reduction and handle the reduction using a tiny, 8 value lookup table. The code below significantly reduces the code size when MULTIPLY_AS_A_FUNCTION is defined (by more than 350 bytes on an Intel x64 CPU using gcc).

static const uint8_t poly_reduce_mask[8] = {0, 27, 54, 45, 108, 119, 90, 65};
static uint8_t Multiply(uint8_t x, uint8_t y)
{
  uint16_t r = ((y & 1) * x) ^ ((y & 2) * x) ^ ((y & 4) * x) ^ ((y & 8) * x);
  return (uint8_t)(r ^ poly_reduce_mask[r>>8]);
}

@skater-boy
Copy link

Tested on Cortex M0+ 32-bit microcontroller, gcc with -Os switch and got a 24 bytes reduction in the code size. Not huge reduction but I think to keep your suggestion.
Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants