-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rmkit][fb] speed up draw_rect functions #78
Conversation
e855899
to
0527b4c
Compare
Picking up where #73 (comment) left off
I did 3 "7x7 easy" lightup puzzles each basically, so manual interaction. I just added saves, so it might be possible to automate it by loading a save file and redrawing a couple times, or something like that. I just tried running a perf build with these changes -- I'm building with -O2, so the biggest difference I'm seeing is that do_dithering isn't inlined with this change. Honestly I'm not totally sure how to interpret these results since the draw_rect time got split up b/c of the inlining change, but it looks like performance is similar, (or maybe a little worse?) based on the % time column. without this diff
with this diff
|
src/rmkit/fb/fb.cpy
Outdated
update_dirty(dirty_area, o_x, o_y) | ||
update_dirty(dirty_area, o_x+w, o_y+h) | ||
|
||
_draw_rect_fast(o_x, o_y, w, h, color, fill, dither) | ||
|
||
inline void _draw_rect_fast(int o_x, o_y, w, h, color, fill=true, float dither=1.0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do see a huge reduction in the number of times update_dirty is called, which seems like a net benefit in any case, so this looks like a helpful part for sure.
src/rmkit/fb/fb.cpy
Outdated
@@ -256,19 +265,30 @@ namespace framebuffer: | |||
if o_y >= self.height || o_x >= self.width || o_y < 0 || o_x < 0: | |||
return | |||
|
|||
update_dirty(dirty_area, o_x, o_y) | |||
update_dirty(dirty_area, o_x+w, o_y+h) | |||
if likely(fill): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like almost every function calls this with fill=true
. I think the only place where fill=false
would happen is in the original draw_rect
itself right?
What about making draw_rect
include something like this? (I haven't tested, just a thought)
if fill:
_draw_rect_fast(x, y, w, h, ...)
else:
_draw_rect_fast(x, y, w, 1, ...) // top
_draw_rect_fast(x, y+h-1, w, 1, ...) // bottom
_draw_rect_fast(x, y, 1, h, ...) // left
_draw_rect_fast(x+w-1, y, 1, h, ...) // right
The main difference I think is that do_dithering could still be inlined in _draw_rect_fast
since it would only have one call site.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that looks good to me
src/rmkit/fb/fb.cpy
Outdated
if i+o_x >= self.width: | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder how expensive these comparisons are (they might not be at all) compared to:
w = min(w, self.width-x)
h = min(h, self.height-y)
for j 0 h:
for i 0 w:
do_dithering(self.fbmem, i+o_x, j+o_y, color, dither)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i had a little trouble with this just now (and had trouble the first couple times i tried it a while back)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I'm seeing a modest speedup with this now. I switched my profiling build to -O1 since -O2 was giving me gibberish, idk if it's from inlining or arm branch prediction or what.
In any case, I'm seeing that update_dirty has dropped off from about 3% total time to 0% since it's called over 2 orders of magnitude less. And I'm seeing about a 20-25% speedup in draw_rect which is pretty good!
FWIW I also tried a build where I replaced the do_dithering call with _set_pixel directly, and that saved another 50% or so. We probably aren't at a point where we can do that just yet, but since draw_rect is still by far the most expensive function, it might be worth seeing if the dithering code could be adapted to support more generic patterns/brushes. I've thought more about that and I think I have a decent plan in mind.
No description provided.