Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plotter and waterfall performance enhancements #1383

Merged
merged 17 commits into from
Oct 19, 2024

Conversation

yuzawa-san
Copy link
Contributor

i run this application on my laptop and find that the CPU is quite high, so i did some flame graph analysis to see if i could reduce CPU (to have longer battery life). i had already run volk_profile. i found that actually the rendering was taking up more CPU than the actual GNUradio and DSP operations. i did the following operations:

  • cache meter font so it can be reused
  • update meter less often (only when the 0.1's place needed updating). the extra renders caused excessive blit operations when the CPlotter was drawing but the CMeter was updating the back buffer. they will still happen, but ideally significantly less often. a better solution could be to sync up the CMeter refreshes with the CPlotter, but i held off on that.
  • cache peak circles as QImage so we dont need to calculate ellipse rasters on each frame for each ellipse occurence. the raster is the same and identical so a simple QImage draw will be faster.
  • sliding window waterfall implementation. this eliminates a large memmove on each frame. instead the line is written across the top of a QImage which is 2x height in size and the "window" to be copied is slided up. once the window hits the top, then the top is copied to the bottom and the window sliding starts from the bottom back to the top. see diagram.
  • use polygons for fills (instead of single line fills). each fill was attempted to be done in parallel. but the solution was to do fewer fills.
  • prevent rounding errors. the plotter vs waterfall slider ratio was yielding fractional values sometimes. instead make sure the split is even. this make the drawing simple memmoves instead of a more expensive scale operation. i did this by flipping the math operations to ensure we are working with full pixel values before multiplying by the device pixel ratio.
  • avoid unnecessary fills on mac: unifiedTitleAndToolBarOnMac makes the window translucent which means that the window must be cleared before each render. this is unnecessary since the painting is opaque.
  • use solid colors. the alpha composition was more expensive. by precomputing the colors to be solid colors the composition is faster since we are copying and avoiding a bunch of extra math to do the alpha blending. i made the end result looks basically identical.
  • store QPoint buffers on CPlotter object. allocate them once and reuse them.
  • use opaque flag on widget. we are indeed painting every pixel so this flag should be set. disable autofill background.

sliding window implementation:
image

i ran a test with these settings and docks open:
Screen Shot 2024-10-15 at 20 19 52

before:
2024-10-15_20 19 01-WindowServer_sample txt

after:
2024-10-15_20 50 44-WindowServer_sample txt

notable differences:
savings

i was doing 10fps at 110%, but now that is around 60%.
i was topping out at around 20 fps with around 130% cpu, but now that is around 100% cpu.

now i'm topping out around 40fps with around 110% cpu, so these changes provide a significant improvement.

open question: should the dock audio CPlotter share the same fps as the main CPlotter? it seems to be preset to 25fps (1000/40ms).

@argilo
Copy link
Member

argilo commented Oct 16, 2024

Thanks for these improvements! I'll have a look over them soon.

open question: should the dock audio CPlotter share the same fps as the main CPlotter? it seems to be preset to 25fps (1000/40ms).

As far as I know, the audio plotter has always been fixed at 25 fps, and I haven't really seen much reason to change it or make it adjustable. Perhaps others have opinions on it.

@willcode
Copy link
Contributor

willcode commented Oct 16, 2024

I think the audio plotter rate is meant to be the minimum rate where it looks smooth to the eye.

Nice analysis!

Do you see any effects of the DPR related changes with HiDPI displays?

@yuzawa-san
Copy link
Contributor Author

@willcode i am developing on a high DPR display so I verified that still works. by reordering the maths to do the DPR multiplication last, i can ensure the alignment in device independent pixels in the drawPixmap operation. this means a straight copy should always be done instead of a scale (in those odd cases where rounding made them misaligned).

@willcode
Copy link
Contributor

OK. We spent a lot of time dealing with "line too thin" or "looks fuzzy" kind of things when reworking the plotter for real coordinates, so we want to look at the results carefully.

@yuzawa-san
Copy link
Contributor Author

@willcode here are some screenshots. note the version in the window's title
Screen Shot 2024-10-16 at 08 49 41
Screen Shot 2024-10-16 at 08 50 23

@argilo
Copy link
Member

argilo commented Oct 16, 2024

Peak circles appear to be drawn one or two pixels too far to the right.

Before:

Screenshot from 2024-10-16 18-53-53

After:

Screenshot from 2024-10-16 18-52-47

@argilo
Copy link
Member

argilo commented Oct 17, 2024

the line is written across the top of a QImage which is 2x height in size and the "window" to be copied is slided up. once the window hits the top, then the top is copied to the bottom

Is it necessary to use 2x height and copy? It might be possible to get the same performance benefit by splitting the drawImage call into two drawImage calls, one to draw from the "current" line to the bottom of the QImage, and a second to draw from the top of the QImage to the line above the "current" line. That would avoid the extra memory usage and eliminate the need for a copy.

@argilo
Copy link
Member

argilo commented Oct 17, 2024

The fill appears to be drawn one pixel lower than the line:

Screenshot from 2024-10-16 20-40-10

@yuzawa-san
Copy link
Contributor Author

i have fixed the peak alignment and off by one fills. i have rebased off of the working master and the checks are all passing now.

i actually thought about doing the waterfall the way you mentioned and thought that the device pixel ratio stuff was going to be a pain around around rounding (when the window slides but not to a full pixel). however, i went ahead and changed it to your way. i found that the device pixel ratio maths in drawPixmap is smart enough to do subpixel level things. e.g. 2.5 pixels (with a device pixel ratio of 2) does indeed resolve to 5 pixels. so this way will work and does work on my retina mac. i verified that the efficient memmove is being used.

@argilo
Copy link
Member

argilo commented Oct 18, 2024

I'm glad to hear that it worked!

In my testing so far, I have observed a decent performance improvement (tested on a Linux laptop and a Raspberry Pi, neither with high-DPI displays). I'll do a bit more testing and have a look over the code as well.

@argilo
Copy link
Member

argilo commented Oct 18, 2024

Peak circles still seem to be drawn too far to the right:

Screenshot from 2024-10-18 17-17-25

@argilo
Copy link
Member

argilo commented Oct 18, 2024

If the "Split Plot/WF" slider (in the "FFT Settings" panel) is dragged all the way to the left (i.e. waterfall only), then the plot remains visible and quickly cycles through two or more sizes in an infinite loop.

@argilo
Copy link
Member

argilo commented Oct 18, 2024

If the "Split Plot/WF" slider (in the "FFT Settings" panel) is dragged all the way to the left (i.e. waterfall only), then the plot remains visible and quickly cycles through two or more sizes in an infinite loop.

It looks like this bug happens because plotWidthT is zero in this case (because the if (!m_2DPixmap.isNull()) block doesn't execute). The result is that the waterfall is not painted at all.

@yuzawa-san
Copy link
Contributor Author

I have fixed both items. I actually was able to test in two device pixel ratios (1 and 2) with an external monitor. The DPR 1 was definitely broken as you reported. The DPR 2 I thought generally looked ok, but I was able to apply a fix to both of them. It had to do with some rounding and the fact that the the center of the circle is technically in the top left of the center pixel, so but nudging the pixmap over by one pixel we are back on alignment.

Here are my screenshots (note i enabled a background rect just for debugging, that has been removed):

DPR 1:
Screen Shot 2024-10-18 at 20 56 15

DPR 2:
Screen Shot 2024-10-18 at 20 56 41

@argilo
Copy link
Member

argilo commented Oct 19, 2024

Thanks. It looks good now.

I'll get this merged soon, as long as no other bugs come up.

@argilo argilo merged commit bf40e76 into gqrx-sdr:master Oct 19, 2024
17 checks passed
@argilo
Copy link
Member

argilo commented Oct 20, 2024

It appears this may have caused a performance regression: #1386

Any idea why that might be @yuzawa-san ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants