-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avx512 Validation #45
Conversation
@Nick-Nuon Obviously, we can fold this in if it is ready. :-) |
int start_point = processedLength; | ||
|
||
// The block goes from processedLength to processedLength/16*16. | ||
int asciibytes = 0; // number of ascii bytes in the block (could also be called n1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the other PR, we do not need asciibytes
. ;-)
// important: we just update asciibytes if there was no error. | ||
// We count the number of ascii bytes in the block using just some simple arithmetic | ||
// and no expensive operation: | ||
asciibytes += (int)(64 - Popcnt.X64.PopCount(mask)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't actually need to update asciibytes
. :-)
return invalidBytePointer; | ||
} | ||
prevIncomplete = Vector512<byte>.Zero; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at the other PR, I do some extra work right after...
prevIncomplete = Vector512<byte>.Zero;
... because in a lot of data files, we might have streams of ASCII. So I think we need to grab the twitter.json file and run tests to see whether my optimization applies here. I suspect it does.
(It is quite simple.)
|
||
// We skip any ASCII characters at the start of the buffer | ||
int asciirun = 0; | ||
for (; asciirun + 128 <= inputLength; asciirun += 128) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you unroll in blocks of 128 bytes which might be the right choice, but we would like to run some checks. We should be certain that 128 bytes is better than 64 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow. I was giving up on the AVX-512 because I was concerned we would need to mess around and we would get bad results, but we do not. The results are actually quite good.
Very good work. I am impressed !!!
Ok. Now we have a good problem in that we have two PRs to merge. But that's not overly difficult.
❤️ |
I am merging your PR. Let us rebase the other one. |
Glad to be of service! ^_^ |
Here is the Avx512 Validation.
there are things that probably could be further polished ( the C++ code uses the ternary_logic instruction which could save us a further few instructions, there are also instructions that the API doesn't seem to expose ) but in the spirit of smaller more frequent PRs as well as recent comments:it works.
EDIT: I saw #44 shortly after publishing this one.
These are the benchmarks on the server:
Almost there!