-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xorgxrdp uses high CPU in CRC #300
Comments
I found a benchmark comparing CRC algorighm. |
Yup this has been a known issue with RFX for a while. The CRC hashing algorithm is slow. One way to address this has been #167, (using other hashing algorithms) which I incorporated into We need to investigate other hashing codecs that will be better, and I think there is also room to use SIMD optimized hashes for even more performance. |
If you'd like I can get the relevant bits from #167 rebased again off of the latest and we can see if it helps? |
When I run that page with Xorg glamor, it's using almost no CPU, hehe. |
I got working SSE2 accel for librfxcodec progressive. I still have to accel the RGB to YUV in Xorg. I think the CRC change to hash and SSE improvements should really help. |
Thank you both. I remembered #167 now. I thought CRC is required by protocol specification but it sounds no. We can try whatever fast hash if there is no bound to use CRC. I think we should use libraries which has built-in SIMD accelerations. I'll try some hash algorithms including wyhash. |
I see CRC checking is skipped using glamor. Lines 608 to 621 in af43a8b
I have raised #301. Wyhash reduces CPU usage by 1/2 to 1/3. |
@jsorg71 It turned out this is not a regression. This issue is reported by an enterprise user. They reported this as a regression between v0.9 and v0.10 however they're using xorgxrdp with the wyhash patch applied in their internal build. The difference of CPU usage is almost caused by wyhash VS CRC. |
The CRC is still done in rdpEGL.c, it just done in the GPU. One reason I didn't like wyhash algro is that it produced a 64 bit hash. The largest you can return with GPU is 32 bit(one pixel). I was looking at alder32 but that was getting too many false hits. Then I kinda ran out of steam on the topic. But it's ok, we can do one thing with CPU and another with gpu. Speed is most important here. |
We can also try xxhash. It is also a fast hash algorithm. and there is XXH32 variant that returns 32-bit hash. |
Quick test, XXH32 is slower than XXH64 and XXH128. I ran
My processor is Ryzen 7 5700G.
|
I did a benchmark to find out which hash algorithm is faster. wyhash with lazy color convert is a winner. Howver, wyhash and xxhash64 are very close. The result may be different if I implemented lazy color convert to xxhash64. The result is:
Environment
Test methodTesting #301, #302 and vanilla v0.10.0.
See CPU usage with The screen that I'm testing looks like this: Result
|
Here's an additional result xxhash64 w/lazy.
|
wyhashpros
cons
xxhash64pros
cons
|
Wyhash has been merged (#301). |
GFX (v0.10) uses CPU significantly higher compared to v0.9.
Sample web page to reproduce:
The page draws random rectangles to a canvas, then does the same after clicking "Press to toggle" but draws white on white so it is invisible screen updates. The hotspot is
crc_process_data
.I'm still investigating why GFX does more CRC calculation and if we can reduce CRC calculation or accelerate using a faster algorithm or library such as crc32c.
@jsorg71 @Nexarian Any thoughts?
The text was updated successfully, but these errors were encountered: