Multi-threaded (or even multi-GPU) rendering? #81
About the idea of parallelizing camera rendering: I'm not sure if OGRE supports this, since there are operations that must be done in sequence in OGRE to prepare the scene for rendering. If we try to make concurrent rendering calls, we usually run into problems locking hardware buffers, and OGRE crashes. One workaround would be to distribute the work to multiple OGRE instances, each with its own GL context. But then there would be the tradeoff of syncing data between processes. Before diving into this, I think it would help to profile (e.g. using ign profiler) and see where the bottlenecks are. On the other hand, I've noticed things like the RTF dropping to half when VRAM is full. We also found issues with lights (with a large range) causing a performance hit. So there are a few other places for performance improvement.
I ran the simulator with the EXPLORER_X2_SENSOR_CONFIG_2 robot and enabled Remotery profiling for ign-gazebo and ign-sensors. Unfortunately, it seems that ign-rendering has no profiling support (no IGN_PROFILE calls). The last screenshot, at least, was taken while all of the sensors (4 RGBD cameras + GPU lidar) were subscribed and forced to generate data. Generally, there are 3 time-consuming parts:
I guess 2 and 3 may very well just be waiting until the render thread signals that it has finished. I'm a bit unsure why the start of the timeslots for PostUpdate can be later than the rendering, but it definitely seems that both PostUpdate functions exit at the very moment rendering finishes. Do you have an idea of how to profile the rendering part? I saw that OGRE might have Remotery support too, so I'll try looking in this direction.
The rendering operation itself is actually allowed to run in parallel with the simulation continuing. The only requirement is that the rendering must be complete before the next rendering call can start. The first iteration in ign-gazebo had rendering sensors blocking all simulation, which introduced a huge amount of latency. This is in contrast to gazebo9/10/11, which allowed sensors to run freely in parallel with the simulation, at a potential cost in simulation accuracy. In cases where all of the cameras run at the same frame rate and are synchronized, this provides some speedup by allowing the rendering to happen in parallel, at the cost of a small amount of sensor latency. If the rendering sensors are all at different rates and starting times, the benefit is lost. All that being said, rendering should be profilable if built with the flag enabled. It may not have as many profile points, though.
And for clarity, "in parallel" here means the rendering thread works in parallel with the simulation thread. Each rendering sensor that needs a frame generated runs in series in the rendering thread. Some of the time in post-update may also be spent reading data back from the GPU and serializing it for ign-transport.
Okay, what you write about concurrency between the simulator and the rendering sounds reasonable. However, I was interested in parallelizing the rendering itself. Isn't there really a way to prepare the scene, put X cameras in it, and tell OGRE to render them all? It would seem weird to me if it couldn't do this efficiently somehow... But maybe I'm wrong...
I understand; I was just trying to provide some background information.
It does, but OGRE has to be built with it enabled. If you would like to start with our current packaged OGRE version: https://github.com/ignition-forks/ogre-2.1-release
I don't believe so, offhand. I know OGRE uses the Singleton pattern in a few places, so there is a chance that something lower than sensors/rendering could cause an issue.
I finally got a Remotery-enabled OGRE installation, and here is some data. This should be one sensor update with one EXPLORER_X2_SENSOR_CONFIG_2 (4 RGBD + 1 GPU lidar). A great deal of time is spent in a function called "Forward Clustered Light Collect". That would correspond to your observation that adding lights slows things down. Unfortunately, SubT is very much about a lot of lights :( What somewhat surprised me was that light processing shows up even in the GPU lidar. That shouldn't do any light-related work, should it? Here are some more screenshots at various scales:
If anyone else wants to experiment, here is the profiler-enabled OGRE build for Ubuntu 18.04.5 amd64: libogre.tar.gz. Just unpack it; the Remotery server starts up on port 1500.
I found the GL extension GL_OVR_multiview, which seems applicable. But searching for it in the OGRE forums yielded no results :(
Interesting: we asked it to render a pass with a depth texture only, but it still does light culling and probably other light operations. Maybe that's an optimization that can be done in OGRE. We could also look into a way to disable lighting for that particular pass. Clear passes are taking longer than expected; maybe we can get rid of that pass.
We're doing a few things to minimize scene updates: https://github.com/ignitionrobotics/ign-gazebo/blob/ign-gazebo4/src/systems/sensors/Sensors.cc#L250. But we still need to tell OGRE to render one camera after another.
I recently hit this issue, and I'm wondering if there have been any improvements? The performance degradation in our sim after upgrading to Ignition has been pretty profound. Before I head down a multi-threaded or multi-GPU path: the way OGRE is being used seems pretty simple, but that's probably not helping here.
Hey, are there any improvements on this matter?
It seems to me (with my limited insight into the OGRE/ign-sensors integration) that all sensors get rendered sequentially:
https://github.com/ignitionrobotics/ign-sensors/blob/67dbabc980102b96b2b0b3424c52c86c646a0c2e/src/Manager.cc#L105-L108
Is that right? Could that be the reason why the SubT simulator runs so slowly with multiple models, with all GPUs almost at rest and 4 CPUs spinning like hell (even on a 40-core machine)? It's easy to see that spawning a single EXPLORER_X1 robot decreases the real-time factor to about 10-15%. The model has 4 RGBD cameras and a 3D GPU lidar. A discussion on this topic is here: osrf/subt#680.
Is there a way to parallelize? I guess this would have to be user-configurable somehow, because you can't just put all rendering tasks on the GPU at the same time... An environment variable could let the user say their GPU is performant enough to run 4 rendering tasks in parallel.
Or would it even be possible to extend the parallelization to multiple GPUs? Until Ogre implements EGL rendering, that would mean running multiple X servers, or using VirtualGL (it supports EGL offload since version 3.0, which is currently in beta). I can imagine the user would pass a list of X servers, e.g.
DISPLAY=:0,:1,:2
and the sensor manager would uniformly distribute rendering tasks (or sensors) to the GPUs. Or is there something substantial that would prevent any kind of parallelization (e.g. some scene locks)?