SIMD surface.fill #2657

Closed
wants to merge 2 commits into from

MightyJosip
Contributor

This should improve the performance of simple surface.fill calls by using SIMD.

Test code:

from timeit import repeat
import pygame

pygame.init()
surf = pygame.Surface((800, 600))
G = globals()

# Each timing runs the fill 1000 times; keep the best of 10 repeats,
# then average 5 such best-of-10 values.
teststr = "surf.fill((24, 24, 24))"
times = [min(repeat(teststr, globals=G, number=1000, repeat=10)) for _ in range(5)]
print(f"fill: {sum(times) / len(times)}")

My results (seconds per 1000 fills):

pygame-ce 2.5.0.dev1 (SDL 2.28.5, Python 3.11.0)
fill: 0.03824264000868425

pygame-ce 2.4.0 (SDL 2.28.5, Python 3.11.0)
fill: 0.08401216000784188

MightyJosip requested a review from a team as a code owner on January 5, 2024, 19:19
@itzpr3d4t0r
Member

Cool to see you venture into SIMD! Unfortunately, I think you just ran into something I discovered a while ago: going for a naive AVX/SSE strategy to optimize the fill algorithm (see some of my findings in #2390) is fine for small blits, but it doesn't scale well with surface size due to cache pollution. In the graph below, the current best version is the SDL (SSE) one, which uses streaming store instructions to minimize cache pollution, and what you are going for is the naive AVX2 one. This should be investigated further, but I'm pretty sure this is the case.

[Graph: all_together]
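
One way to check the scaling claim locally is to repeat the PR's timing at several surface sizes and normalize by pixel count. The sketch below is only an illustration: the sizes, the `ns_per_pixel` helper, and its defaults are hypothetical and not taken from the PR or from #2390. If the cost per pixel climbs as the surface outgrows the CPU caches on one build but not on another, that would be consistent with the cache pollution explanation, though it would not prove it.

```python
from timeit import repeat

import pygame

# Hypothetical sizes chosen for illustration; they are not from the PR.
SIZES = [(64, 64), (256, 256), (800, 600), (1920, 1080)]


def ns_per_pixel(size, number=200, repeats=5):
    """Best-case time of surf.fill for one surface size, in ns per pixel."""
    surf = pygame.Surface(size)
    # Take the minimum over the repeats to reduce scheduling noise.
    best = min(
        repeat(lambda: surf.fill((24, 24, 24)), number=number, repeat=repeats)
    )
    width, height = size
    # best is the time for `number` fills; convert to ns per pixel per fill.
    return best / number / (width * height) * 1e9


if __name__ == "__main__":
    for size in SIZES:
        print(f"{size[0]}x{size[1]}: {ns_per_pixel(size):.3f} ns/pixel")
```

Running the same script against both builds (for example 2.4.0 and the PR branch) gives a per-size comparison instead of the single 800x600 data point above.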

@MightyJosip
Contributor Author

oh yea you are right

MightyJosip closed this on Jan 5, 2024