Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[rmkit][fb] speed up draw_rect functions #78

Merged
merged 2 commits into from
Feb 4, 2021
Merged

Conversation

raisjn
Copy link
Member

@raisjn raisjn commented Feb 3, 2021

No description provided.

@mrichards42
Copy link
Collaborator

Picking up where #73 (comment) left off

i am unsure of how you did your profiling cases above - is it just a single render? (or X renders?) of a game scene? or does it involve manual interaction with the game?

I did 3 "7x7 easy" lightup puzzles each basically, so manual interaction. I just added saves, so it might be possible to automate it by loading a save file and redrawing a couple times, or something like that.

I just tried running a perf build with these changes -- I'm building with -O2, so the biggest difference I'm seeing is that do_dithering isn't inlined with this change. Honestly I'm not totally sure how to interpret these results since the draw_rect time got split up b/c of the inlining change, but it looks like performance is similar, (or maybe a little worse?) based on the % time column.

without this diff
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 50.67      1.13     1.13  1079395     0.00     0.00  framebuffer::FB::draw_rect(int, int, int, int, int, int, float)
 27.80      1.75     0.62 27739995     0.00     0.00  framebuffer::DITHER::BAYER_2(int, int, unsigned short)
  6.28      2.04     0.14      139     1.01     1.04  stbtext::render_text(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, image_data&, int)
  4.04      2.13     0.09 32927523     0.00     0.00  framebuffer::DITHER::NONE(int, int, unsigned short)
  1.35      2.16     0.03  2160376     0.00     0.00  framebuffer::FB::update_dirty(framebuffer::FBRect&, int, int)
  0.90      2.18     0.02       75     0.27     0.27  PuzzleDrawer::draw_circle(int, int, int, int, int)
  0.90      2.20     0.02       51     0.39     1.44  PuzzleDrawer::draw_text(int, int, int, int, int, int, char const*)
  0.45      2.21     0.01     2132     0.00     0.00  framebuffer::FB::draw_line(int, int, int, int, int, int, float)
with this diff
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 50.54      1.87     1.87 65088970     0.00     0.00  framebuffer::FB::do_dithering(unsigned short*, int, int, int, float)
 22.16      2.69     0.82  1400926     0.00     0.00  framebuffer::FB::_draw_rect_fast(int, int, int, int, int, int, float)
 20.27      3.44     0.75 32321456     0.00     0.00  framebuffer::DITHER::BAYER_2(int, int, unsigned short)
  3.51      3.57     0.13      155     0.84     0.84  stbtext::render_text(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, image_data&, int)
  1.62      3.63     0.06 32910037     0.00     0.00  framebuffer::DITHER::NONE(int, int, unsigned short)
  0.27      3.64     0.01      106     0.09     0.09  framebuffer::FB::draw_circle_filled(int, int, int, int, int)
  0.27      3.66     0.01       69     0.14     0.98  PuzzleDrawer::draw_text(int, int, int, int, int, int, char const*)
  0.00      3.70     0.00     8743     0.00     0.00  framebuffer::FB::update_dirty(framebuffer::FBRect&, int, int)
  0.00      3.70     0.00     2200     0.00     0.00  framebuffer::FB::draw_line(int, int, int, int, int, int, float)

Comment on lines 254 to 259
update_dirty(dirty_area, o_x, o_y)
update_dirty(dirty_area, o_x+w, o_y+h)

_draw_rect_fast(o_x, o_y, w, h, color, fill, dither)

inline void _draw_rect_fast(int o_x, o_y, w, h, color, fill=true, float dither=1.0):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see a huge reduction in the number of times update_dirty is called, which seems like a net benefit in any case, so this looks like a helpful part for sure.

@@ -256,19 +265,30 @@ namespace framebuffer:
if o_y >= self.height || o_x >= self.width || o_y < 0 || o_x < 0:
return

update_dirty(dirty_area, o_x, o_y)
update_dirty(dirty_area, o_x+w, o_y+h)
if likely(fill):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like almost every function calls this with fill=true. I think the only place where fill=false would happen is in the original draw_rect itself right?

What about making draw_rect include something like this? (I haven't tested, just a thought)

if fill:
  _draw_rect_fast(x, y, w, h, ...)
else:
  _draw_rect_fast(x, y, w, 1, ...)     // top
  _draw_rect_fast(x, y+h-1, w, 1, ...) // bottom
  _draw_rect_fast(x, y, 1, h, ...)     // left
  _draw_rect_fast(x+w-1, y, 1, h, ...) // right

The main difference I think is that do_dithering could still be inlined in _draw_rect_fast since it would only have one call site.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that looks good to me

Comment on lines 274 to 275
if i+o_x >= self.width:
break
Copy link
Collaborator

@mrichards42 mrichards42 Feb 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how expensive these comparisons are (they might not be at all) compared to:

w = min(w, self.width-x)
h = min(h, self.height-y)
for j 0 h:
  for i 0 w:
    do_dithering(self.fbmem, i+o_x, j+o_y, color, dither)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i had a little trouble with this just now (and had trouble the first couple times i tried it a while back)

Copy link
Collaborator

@mrichards42 mrichards42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'm seeing a modest speedup with this now. I switched my profiling build to -O1 since -O2 was giving me gibberish, idk if it's from inlining or arm branch prediction or what.

In any case, I'm seeing that update_dirty has dropped off from about 3% total time to 0% since it's called over 2 orders of magnitude less. And I'm seeing about a 20-25% speedup in draw_rect which is pretty good!

FWIW I also tried a build where I replaced the do_dithering call with _set_pixel directly, and that saved another 50% or so. We probably aren't at a point where we can do that just yet, but since draw_rect is still by far the most expensive function, it might be worth seeing if the dithering code could be adapted to support more generic patterns/brushes. I've thought more about that and I think I have a decent plan in mind.

@raisjn raisjn merged commit 3bceb7d into master Feb 4, 2021
@raisjn raisjn deleted the faster_draw_rect branch February 7, 2021 13:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants