We have pretty much reached peak current-gen SIMD-ifying of all the blitting in pygame-ce now.
The next obvious place to take these efforts is the transform submodule, as it also tends to make pixel-by-pixel changes to large blocks of pixels. SIMD speeds up this kind of work, and pretty much every platform we are on has SSE2-level SIMD capability (or the NEON equivalent on Arm). That often lets us do math operations on at least one entire pixel at a time (4 channels) rather than per channel of a pixel, and sometimes on multiple pixels at the same time.
There are two current-gen tiers of SIMD we are pursuing:
Base level - SSE2/NEON - Available almost everywhere pygame-ce is deployed (100% on the steam hardware survey).
Enhanced - AVX2 - 91.24% overall availability on the steam hardware survey (majority Windows (92%) & Linux (93%), only at 31% on Mac).
Basic support for adding SIMD versions of transform functions has already been added.
These are the current possibly SIMD-able transforms available in the sub-module as a checklist:
pygame.transform.rotate | rotate an image
pygame.transform.rotozoom | filtered scale and rotation
pygame.transform.scale2x | specialized image doubler (Is this still relevant? - needs performance metrics, looks like vendored scalar code)
pygame.transform.threshold | finds which, and how many pixels in a surface are within a threshold of a 'search_color' or a
These functions are probably not SIMD-able (please tell me if you think I'm wrong):
pygame.transform.flip | flip vertically and horizontally - no math, just mem copying.
pygame.transform.scale | resize to new resolution - directly uses a single SDL function - SDL_SoftStretch() to do the scale.
pygame.transform.scale_by | resize to new resolution, using scalar(s) - directly uses a single SDL function - SDL_SoftStretch() to do the scale.
I suspect invert would be the easiest one to SIMD first, as it is an alpha mask plus a single subtract-from-255 operation across the RGB channels, which should scale up very smoothly to multiple pixels at a time.
Good references for invert would be the new grayscale, whose general structure has been optimised for a classic surface transform: in most cases all the pixels in the surface being transformed are contiguous in memory, and the output goes to a new surface that is also contiguous in memory. That is unlike a blit, which most often changes discontinuous rows in the middle of a surface by blitting a contiguous chunk of pixels on top of them. Then see blit_blend_rgb_sub_avx2() for the alpha masking and the actual subtraction operation.
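To make the invert idea concrete, here is a rough SSE2 sketch of "mask off alpha, subtract RGB from 255" over one contiguous run of 32-bit pixels. This is not the pygame-ce implementation: the function name `invert_pixels`, the flat `uint32_t` buffers, and the assumption that alpha lives in the top byte of each pixel are all hypothetical, and a scalar loop handles the tail (and the whole buffer on non-SSE2 builds).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

/* Hypothetical helper: invert RGB, keep alpha, for n contiguous pixels.
 * Assumes alpha is the top byte of each 32-bit pixel (0xAARRGGBB). */
void invert_pixels(const uint32_t *src, uint32_t *dst, size_t n)
{
    size_t i = 0;
    assert(src != NULL && dst != NULL);

#ifdef SKETCH_HAVE_SSE2
    /* 0xFF in every RGB byte, 0x00 in every alpha byte. */
    const __m128i rgb_mask = _mm_set1_epi32(0x00FFFFFF);

    /* Four pixels (16 channels) per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128i px = _mm_loadu_si128((const __m128i *)(src + i));
        __m128i rgb = _mm_and_si128(px, rgb_mask);      /* RGB only */
        __m128i alpha = _mm_andnot_si128(rgb_mask, px); /* alpha only */
        /* Per-byte 255 - c on RGB; the alpha byte is 0 - 0 = 0. */
        __m128i inverted = _mm_sub_epi8(rgb_mask, rgb);
        _mm_storeu_si128((__m128i *)(dst + i),
                         _mm_or_si128(inverted, alpha));
    }
#endif

    /* Scalar tail: same math, one pixel at a time. No borrow can cross
     * a byte boundary because every minuend byte is 0xFF. */
    for (; i < n; i++) {
        uint32_t rgb = src[i] & 0x00FFFFFFu;
        dst[i] = (src[i] & 0xFF000000u) | (0x00FFFFFFu - rgb);
    }
}
```

An AVX2 version would presumably follow the same shape with 256-bit loads and eight pixels per iteration, but that needs benchmarking against this baseline before drawing any conclusions.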
These functions are probably not SIMD-able (please tell me if you think I'm wrong):
pygame.transform.flip | flip vertically and horizontally - no math, just mem copying.
This is kinda similar to what I did when I tried to SIMD the draw module, where replacing memcpy with _mm256_storeu_si256 gave me a solid performance boost. I am not sure flip is the same situation, but it could have potential.
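For what it's worth, a horizontal flip of one row could be sketched with SSE2 along these lines: load four 32-bit pixels, reverse their order inside the register, and store them at the mirrored position. The function name `flip_row_horizontal` and the flat non-overlapping `uint32_t` row buffers are assumptions for illustration, not anything from pygame-ce, and whether this actually beats the existing copy loop is untested.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

/* Hypothetical helper: write the mirror image of one row of 32-bit
 * pixels from src into dst. src and dst must not overlap. */
void flip_row_horizontal(const uint32_t *src, uint32_t *dst, size_t width)
{
    size_t i = 0;

#ifdef SKETCH_HAVE_SSE2
    for (; i + 4 <= width; i += 4) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        /* Reverse the four 32-bit lanes: [p0 p1 p2 p3] -> [p3 p2 p1 p0]. */
        v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));
        /* Pixels i..i+3 land at width-i-4..width-i-1 in the output. */
        _mm_storeu_si128((__m128i *)(dst + width - i - 4), v);
    }
#endif

    /* Scalar tail for the leftover pixels (or everything without SSE2). */
    for (; i < width; i++) {
        dst[width - 1 - i] = src[i];
    }
}
```

A vertical flip only moves whole rows, so there the question is just whether wide loads and stores beat memcpy, which would need measuring per platform.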