Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate possible Flutter engine cause of jitter #45

Open
matthew-carroll opened this issue May 23, 2023 · 77 comments
Open

Investigate possible Flutter engine cause of jitter #45

matthew-carroll opened this issue May 23, 2023 · 77 comments
Labels
bug Something isn't working time_5 Up to 5 hours

Comments

@matthew-carroll
Copy link
Contributor

We think that the cause of panning jitter might be due to behaviors within the Flutter engine. This ticket is to investigate possible causes and explore solutions.

The jitter seems to be more noticeable on higher refresh displays, such as the 90Hz Xiaomi Redmi tablet.

To recreate the jitter in a reproducible manner, use the floating action buttons at the lower right of the screen to animate panning up/down and left/right. You should see fairly frequent jitter, which looks a bit like rapid starting and stopping while panning.

@matthew-carroll matthew-carroll added bug Something isn't working time_5 Up to 5 hours labels May 23, 2023
@RossComputerGuy
Copy link
Collaborator

Which particular example Dart file should I run to investigate this?

@matthew-carroll
Copy link
Contributor Author

There are multiples. But you can start with main_page_list_images_inspector

@RossComputerGuy
Copy link
Collaborator

Alright, I tried main_list and noticed it's not affected but main_minimal_viewport is.

@RossComputerGuy
Copy link
Collaborator

I did some testing and this looks to be related to shaders. Following Flutter's documentation on shader compilation jank seems to have mitigated the problem mostly. I am not sure if that is exactly the issue but it's a lead. But even with using a precompiled shader, I notice in the Flutter DevTools that there's still some jank rendering.

image
image

@matthew-carroll
Copy link
Contributor Author

The jitter problem isn't visible in the profiler, in my experience. That occasional jank frame isn't good, but that's a different problem.

If you watch the UI as the panning animation runs, you'll see numerous jitter frames. And yet you won't see a bunch of back to back jank frames in the profiler. That's part of why this problem is difficult. Flutter doesn't seem to recognize that something isn't right.

@RossComputerGuy
Copy link
Collaborator

RossComputerGuy commented May 24, 2023

Interesting, so even with precompiled shaders it's jittery? I'll look more into it tomorrow by enabling the Skia tracing. Have you tried with running with Impeller? If not, I'll try tomorrow since I saw there seems to be others with this issue on Android and Impeller seems to be unaffected.

@matthew-carroll
Copy link
Contributor Author

I haven't tried this experience with Impeller. We tried our direct PDF renderer with Impeller and it was way worse than Skia. The Android version of Impeller is still considered unstable and experimental. But you're welcome to give it a try.

You'll wanna keep a close eye on the screen as it pans. You might also want to download the Xodo app, which is a PDF renderer. You can pan around in there and feel the difference.

@RossComputerGuy
Copy link
Collaborator

Alright will do, I also came across this page which references a method called saveLayer. Might be worth looking into using that.

@matthew-carroll
Copy link
Contributor Author

In the bitmap PDF renderer we're pushing texture layers, so there's nothing for us to save. That experience has the same jitter as this image based tiling experience.

@RossComputerGuy
Copy link
Collaborator

RossComputerGuy commented May 24, 2023

It appears the jitter is gone or reduced significantly under Impeller.

Edit: looks like there's just a slight bit of jank under Impeller as well but I did notice GC is running every time it scrolls to the next page so that might be worth looking into.

@matthew-carroll
Copy link
Contributor Author

Is there a path forward where you could discover the relevant differences between Skia and Impeller, which lead to the elimination of the jitter? At least to the extent that we might point the Flutter team in that direction?

Eventually Impeller will be the norm, and we'll work under the expectation of using Impeller, but that time hasn't arrived yet for Android. So we're interested in a path that would result in fixes to Flutter's integration with Skia such that this problem goes away.

Do you think there's a path forward for you to help with that?

@RossComputerGuy
Copy link
Collaborator

The biggest change I've read for Impeller is that it compiles shaders before execution. I think if GC could not be executed during scrolling then it could lead to better performance.

@matthew-carroll
Copy link
Contributor Author

Those sound like two very different problems. Is the problem repeated shader compilation? Or is the problem repeated GCs?

@RossComputerGuy
Copy link
Collaborator

RossComputerGuy commented May 24, 2023

I can't trace whether it is the shaders affecting it more than GC. I did tracing and every time it starts doing a large paint, GC is running. GC is likely the large issue but shader recompilation can contribute somewhat.

@matthew-carroll
Copy link
Contributor Author

Can you check the PageListViewport and see if we used the new API to disable GC? I know we used it somewhere, but I can't remember if we put it directly into PageListViewport, or into the bitmap PDF renderer.

If it's not already in PageListViewport, or in the controller, or something, then you could try using that new API to disable GC when panning begins, and then re-enable it a couple seconds after panning stops.

@RossComputerGuy
Copy link
Collaborator

Yeah, I don't know what the new GC API is.

@matthew-carroll
Copy link
Contributor Author

@RossComputerGuy
Copy link
Collaborator

Just checked and the new GC API is not being used at all. There isn't really any bindings to it from what I can see but adding the external references from here should provide access to the right methods so it'll just be a matter of calling those.

@RossComputerGuy
Copy link
Collaborator

Just tried that and those methods aren't available, Flutter is likely using a custom Dart build with specific changes so it's likely they didn't expose the methods. I also tried with Impeller in release mode, it's smooth enough that I can barely tell there's any jitter. Unfortunately, it is hard to determine whether or not the actual FPS is an exact 90FPS consistently.

@matthew-carroll
Copy link
Contributor Author

I'm looking for where we used that GC behavior. It's definitely there. It's just not easy to figure out.

WRT FPS - there's a dev option setting on the tablet where you can display the current frame rate on the screen, however that won't show the jitter, either.

The jitter is something that you have to see with your eyes. There's no other signal that we've found to confirm it's happening.

it's smooth enough that I can barely tell there's any jitter

Ozzie said he tried with Impeller and the jitter is still there. When you run the vertical and horizontal animations, together, and it starts panning around the screen in an oval, you don't see recurring situations where you get a bunch of rapid, small visual stutters?

@RossComputerGuy
Copy link
Collaborator

I'm looking for where we used that GC behavior. It's definitely there. It's just not easy to figure out.

Gotcha, I'm not too familiar with GC so I wasn't sure.

Ozzie said he tried with Impeller and the jitter is still there.

Yeah if I stare long enough at the screen, I can see some jitter but it's not a ton. It seems to be at certain frames. Debug and profile have the most jitter with Impeller, release has it more difficult for me to notice.

@matthew-carroll
Copy link
Contributor Author

If it's easier to reproduce and recognize with Skia, then let's go back to that. Impeller isn't really relevant to us, unless Impeller has no jitter, and you're able to learn something from Impeller.

But we're building against the Skia version. That's where we're suffering the issue, and that's what customers will be stuck with. So we're trying to achieve one of the following:

  1. Ideally, find and fix the engine/skia interaction that's causing the jitter, or
  2. Accumulate sufficient contextual information so that we can point the Flutter team to a narrow problem definition in the hope that they can fix it

Do you feel that you have possible paths to reach those goals, do you feel like it's completely unknown what to do, or where to look?

@RossComputerGuy
Copy link
Collaborator

Alright, I'll see what I can do to figure out how GC is affecting rendering through Skia. If it isn't GC then I'm not sure what else it could be since I haven't been able to get useful information from enabling Skia tracing.

@matthew-carroll
Copy link
Contributor Author

I don't know the structure of the engine at that level, but another thing you might try to look into is when/where/how textures and images are positioned for final rasterization. There's a chance that they might be placed with some kind of rounding error, which could be perceived as jitter. A low chance, but a chance that's what's going on.

WRT GCs, if there are a bunch of GCs happening, it's probably worth understanding why they're happening. If we're panning around in a circle, we're not creating a bunch of new widgets, and we're not loading new images. What's getting recycled?

@RossComputerGuy
Copy link
Collaborator

Yeah, that's ultimately what I'm trying to understand with the GC. I see the memory usage rise from 69MB to 80MB and repeats for every frame. I'll look and see if there's some sort of way of doing verbose GC so I can figure where things are allocating and where things are being disposed of.

@matthew-carroll
Copy link
Contributor Author

Here's where we utilized the GC controls: https://github.com/Flutter-Bounty-Hunters/page_list_viewport/pull/14/files

@matthew-carroll
Copy link
Contributor Author

Also, when it comes to root causing, even if you do something like hack Dart or Flutter, itself, by raising the max memory threshold way up high, that's fine too. We need proof about where this problem is coming from....

@RossComputerGuy
Copy link
Collaborator

Alright, I'll use that. I decided to add a break here and discovered this is called right before scrolling to the next page. Is the PageListViewport class not caching the entries?

@matthew-carroll
Copy link
Contributor Author

BTW, if you haven't already, you should download the Xodo PDF app and pan around in there. We've found that Xodo tends to be extremely smooth, at least in our experience. You might get a better feel for the jitter effect if you experience PDF panning without the jitter.

@RossComputerGuy
Copy link
Collaborator

Hmm alright, I'll try analyzing frames and compare it with Xodo. Slowing the video down to 1/4 speed, I can see that slight jitter you're talking about now. If you record a video using the same command on your device, is the jitter the same or worse than it is for me?

@RossComputerGuy
Copy link
Collaborator

RossComputerGuy commented Jun 2, 2023

Looking into this more, seems like raster is causing the issues. I couldn't get the integrated Perfetto UI to work but I was able to capture a trace.
image
The large DrawFrames event is 10ms, could that be a lead onto this issue?

@matthew-carroll
Copy link
Contributor Author

Your guess is as good as mine. My first question would be how often does this situation happen in the trace? We know that we're seeing jitter that happens over and over, for many frames. Do you see an equivalent number of these trace results?

If you do, then the question goes to the root cause. You've identified a long draw call, but it's not clear to me what's actually causing that draw call to take so long. I think we'd need to figure out why.

@dnfield
Copy link

dnfield commented Jun 5, 2023

So as a disclaimer, I'm not able to super reliably reproduce the jittery issue and when I can reproduce it I'm not necessarily the best at visually detecting it.

That said, I tried the following patch locally and it seems to help:

diff --git a/shell/platform/android/platform_view_android.cc b/shell/platform/android/platform_view_android.cc
index b89c23a368..2ff8fc63ae 100644
--- a/shell/platform/android/platform_view_android.cc
+++ b/shell/platform/android/platform_view_android.cc
@@ -204,6 +204,12 @@ void PlatformViewAndroid::NotifyChanged(const SkISize& size) {
   latch.Wait();
 }
 
+PointerDataDispatcherMaker PlatformViewAndroid::GetDispatcherMaker() {
+  return [](DefaultPointerDataDispatcher::Delegate& delegate) {
+    return std::make_unique<SmoothPointerDataDispatcher>(delegate);
+  };
+}
+
 void PlatformViewAndroid::DispatchPlatformMessage(JNIEnv* env,
                                                   std::string name,
                                                   jobject java_message_data,
diff --git a/shell/platform/android/platform_view_android.h b/shell/platform/android/platform_view_android.h
index 300f3ab9b6..59d3618990 100644
--- a/shell/platform/android/platform_view_android.h
+++ b/shell/platform/android/platform_view_android.h
@@ -117,6 +117,9 @@ class PlatformViewAndroid final : public PlatformView {
     return platform_message_handler_;
   }
 
+  // |PlatformView|
+  PointerDataDispatcherMaker GetDispatcherMaker() override;
+
  private:
   const std::shared_ptr<PlatformViewAndroidJNI> jni_facade_;
   std::shared_ptr<AndroidContext> android_context_;

@matthew-carroll are you able to apply that to an engine checkout and see if it fixes things for you? If so I could look into making it landable upstream.

@dnfield
Copy link

dnfield commented Jun 5, 2023

(that patch is meant to help with irregularities when the refresh rate on the screen and the touch input device are different and may get out of sync)

@matthew-carroll
Copy link
Contributor Author

@dnfield we can try the patch, but I'm not sure how it's relevant?

We are currently reproducing the jitter with animations that are tied to a couple of buttons in the demo. There aren't any touch input events happening...

@dnfield
Copy link

dnfield commented Jun 5, 2023

Ah, sorry I had missed the buttons. Touch input could be something else on the device I'm looking at.

@dnfield
Copy link

dnfield commented Jun 5, 2023

For the buttons one, disabling the raster cache seems to help. Checking into that a bit more.

@dnfield
Copy link

dnfield commented Jun 5, 2023

Also, your trace is showing eglSwapBuffersWithDamageKHR above - but that call was removed in a recent version of Flutter (flutter/engine#40898) and the commit containing it made it into the 3.10 stable.

@matthew-carroll
Copy link
Contributor Author

@dnfield I'm definitely on 3.10 stable, but @RossComputerGuy can you please confirm that you're on the latest Flutter stable version? If you're not, can you upgrade and then rerun your traces?

@RossComputerGuy
Copy link
Collaborator

I'm on Flutter master, should I be using a tagged version like 3.10?

@matthew-carroll
Copy link
Contributor Author

master should be fine

@dnfield
Copy link

dnfield commented Jun 5, 2023

Ok, it looks like Android internally is calling eglSwapBuffersWithDamageKHR from eglSwapBuffers ... on some devices and versions of Android at least.

Right now I think we're sometimes running into a problem where we're double stuffing the buffer and failing to catch up. Each frame gets delayed. This is a known issue that's not got a really good solution unfortunately.

@matthew-carroll
Copy link
Contributor Author

This is a known issue that's not got a really good solution unfortunately.

Can you elaborate on that? Does this mean there's nothing we can do on our end to fix it? Does this mean that no one is going to address this on the Flutter side?

Our client's goal has always been to deliver a premium reading experience. This bug makes that impossible. A reader that jitters every time the user touches the screen will never be perceived as premium. As a result, this seemingly minor issue is catastrophic for the client. So it's a very big deal if this problem is essentially a shoulder shrug from Flutter.

@dnfield
Copy link

dnfield commented Jun 6, 2023

I mean the thing I think I'm seeing is a known problem that no one has quite complained about loudly enough yet.

There were two or three attempts at landing something that shoudl fix it, but none of them worked and they ended up causing other, bigger problems instead.

We were able to fix this on iOS by using transactions. But I don't know that the same are available on Android (particularly in the GLES backend). I could try to ask around but it's not the first time I've tried to figure out what's going on with this particular problem.

It's possible I'm missing something, that there's one or more other issues contributing to this (I do think the touch input is a problem on at least some android devices at this point for example).

@dnfield
Copy link

dnfield commented Jun 6, 2023

I can't seem to reproduce the jitter on iOS, which further seems to support my hypothesis.

@dnfield
Copy link

dnfield commented Jun 6, 2023

On a Samsung S6 tablet, I can see this example frequently exceeding budget on the raster thread.

If I switch rendering to a SurfaceView instead of a TextureView, things get much better. There's still some occasional jitteriness but over all it looks much better in tracing and visually to me (specifically, don't do this:

override fun getRenderMode(): RenderMode = RenderMode.texture
).

@dnfield
Copy link

dnfield commented Jun 6, 2023

(Basically, you should never use RenderMode.texture unless you absolutely must)

@dnfield
Copy link

dnfield commented Jun 6, 2023

Second big problem:

The images you're drawing in the tiles are huge. As in, every single image your drawing is larger than the physical size of the screen. The framework has debug checking for this, but you're bypassing that by directly using canvas.drawImage.

This is causing tons of GPU work that is completely unnecessary, not to mention lots of memory usage.

@dnfield
Copy link

dnfield commented Jun 6, 2023

You can help address that by using a ResizeImage wrapped around your AssetImage in image_cache.dart.

@matthew-carroll
Copy link
Contributor Author

@RossComputerGuy can you try both of Dan's recommendations? Switch back to SurfaceView in the Android Activity and also resize the images?

@RossComputerGuy
Copy link
Collaborator

Yeah, I'll try those out. If its worth it, I want to get a trace dump in Perfetto.

@RossComputerGuy
Copy link
Collaborator

RossComputerGuy commented Jun 10, 2023

I just tried this out, I switched to surface rendering and removed the * 72 part in the _naturalPageSize variable. Performance is now excellent. I don't see any jitter and the screen recording shows there's no jitter. I decided to check the framerate in different things and this is what I got for average FPS's:

  • Screen recording: 100
  • Android Studio: 88
  • Flutter DevTools: 89

This was in debug mode and I tried it with a prebuilt engine on tag 3.10.3 since I am on my laptop for the next month and I don't want to recompile the engine if needed.

@matthew-carroll
Copy link
Contributor Author

Can you post a PR with those changes so that @kirkbyo and I can run the demo and see what happens?

@RossComputerGuy
Copy link
Collaborator

Pushed a PR

@matthew-carroll
Copy link
Contributor Author

I tried the PR that goes back to SurfaceView and removes the 72 multiplier.

First, the experience is broken when the 72 is removed. I can't pinch to zoom at all. This likely indicates that the reported findings were based on the lowest-load experience, which is one tile per page.

I added 72 back, zoomed in to get a bunch of tiles, ran the automatic animations, and I'm still getting as much jitter as ever.

@dnfield
Copy link

dnfield commented Jun 13, 2023

I'm not sure what the 72 is referring to, but the real kicker is to make sure that you're not decoding your images at full resolution - try to decode them as close as possible to the resolution you'll actually display them at. A little bigger is ok, but right now they're getting decoded as something like 5k resolution to be displayed on a 1080 screen.

@matthew-carroll
Copy link
Contributor Author

@dnfield - Makes sense.

@RossComputerGuy can you update the PR so that this is accomplished? We still need full pinch to zoom capability, and we want to test the jitter with as many tiles on screen as possible.

@RossComputerGuy
Copy link
Collaborator

I've updated the PR, I made it a 5th of what it originally was and that works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working time_5 Up to 5 hours
Projects
None yet
Development

No branches or pull requests

3 participants