Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add spatial index for objects #14631

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

appgurueu
Copy link
Contributor

@appgurueu appgurueu commented May 9, 2024

Optimizes range queries for objects in order to resolve #14613.

The spatial index is a dynamic forest of static k-d-trees, as outlined in my comment.

To do

This PR is ready for review; there are still a couple nice-to-have TODOs but none of them are necessary.

How to test

There is a randomized unit test which tests this against a naive implementation. I also recommend playing e.g. Shadow Forest or other games with entities to give this some "field testing".

The benchmarks added by sfan show the following range query speedups on my setup:

  • 200 objects: ca. 6x
  • 1450 objects: ca. 22x
  • 10000 objects: ca. 153x (this demonstrates that the new index indeed scales way better)
Data
old
benchmark name                                                                                                       samples       iterations    estimated
                                                                                                                     mean          low mean      high mean
                                                                                                                     std dev       low std dev   high std dev
---------------------------------------------------------------------------------------------------------------------------------------------------------------
inside_radius_200                                                                                                              100          1261    139.593 ms 
                                                                                                                         1.1102 us    1.10106 us    1.12325 us 
                                                                                                                        55.2183 ns    41.7996 ns    73.7618 ns 
                                                                                                                                                               
inside_radius_1450                                                                                                             100           142    140.367 ms 
                                                                                                                        10.0835 us     9.9536 us    10.2648 us 
                                                                                                                        774.521 ns    597.986 ns    1.04772 us 
                                                                                                                                                               
inside_radius_10000                                                                                                            100            11    151.907 ms 
                                                                                                                        174.836 us    168.695 us    181.905 us 
                                                                                                                        33.4375 us    29.0096 us    38.6354 us 
                                                                                                                                                               
in_area_200                                                                                                                    100           850     139.57 ms 
                                                                                                                         1.5991 us    1.59156 us    1.60799 us 
                                                                                                                        41.7553 ns    35.4184 ns    50.8028 ns 
                                                                                                                                                               
in_area_1450                                                                                                                   100            97    140.446 ms 
                                                                                                                        14.5087 us    14.3894 us     14.627 us 
                                                                                                                        606.878 ns    520.745 ns     712.16 ns 
                                                                                                                                                               
in_area_10000                                                                                                                  100             9    144.056 ms 
                                                                                                                        216.703 us     211.43 us    224.314 us 
                                                                                                                        31.9183 us    23.6851 us    53.2383 us 

new
benchmark name                                                                                                       samples       iterations    estimated
                                                                                                                     mean          low mean      high mean
                                                                                                                     std dev       low std dev   high std dev
---------------------------------------------------------------------------------------------------------------------------------------------------------------
inside_radius_200                                                                                                              100          6393    184.118 ms 
                                                                                                                        297.734 ns    295.987 ns    299.511 ns 
                                                                                                                        9.03226 ns    7.87456 ns    10.4505 ns 
                                                                                                                                                               
inside_radius_1450                                                                                                             100          2011     184.61 ms 
                                                                                                                        930.247 ns    925.143 ns    940.841 ns 
                                                                                                                        35.8154 ns    20.3918 ns    69.6263 ns 
                                                                                                                                                               
inside_radius_10000                                                                                                            100           803     184.61 ms 
                                                                                                                        2.53827 us     2.4752 us    2.69643 us 
                                                                                                                        476.573 ns      198.8 ns    917.299 ns 
                                                                                                                                                               
in_area_200                                                                                                                    100          7484    184.106 ms 
                                                                                                                        246.216 ns    244.321 ns    247.968 ns 
                                                                                                                        9.27178 ns    7.80405 ns    11.5285 ns 
                                                                                                                                                               
in_area_1450                                                                                                                   100          2907    184.595 ms 
                                                                                                                        647.267 ns    644.314 ns     649.87 ns 
                                                                                                                         14.137 ns    11.7168 ns    17.8412 ns 
                                                                                                                                                               
in_area_10000                                                                                                                  100          1403    184.635 ms 
                                                                                                                        1.41243 us    1.39912 us    1.44893 us 
                                                                                                                        103.922 ns    46.1786 ns    221.902 ns 

@sfan5
Copy link
Collaborator

sfan5 commented May 9, 2024

I think this will need comprehensive tests for how the spatial index and object lifecycle interact. (basically the check list I had in that issue)

@lhofhansl
Copy link
Contributor

lhofhansl commented May 14, 2024

How does this relate to #14643? Do they solve the same problem? Are they complementary?

@appgurueu
Copy link
Contributor Author

How does this relate to #14643? Do they solve the same problem? Are they complementary?

They solve the same problem in different ways. I am not yet sure which way is better. Some thoughts:

  • Performance on the relatively evenly distributed entities in sfan's benchmark is comparable. Both solutions seem capable of producing speedups large enough to basically solve the problem for now.
  • exe_virus' solution potentially performs poorly in two theoretical cases:
    • Very sparse distributions of entities, large area queries: You would need to check many (empty) buckets for such a query; in this case, performance would be even worse than the current linear search. This is probably relatively irrelevant in practice however due to being dwarfed by the associated costs related to mapblock management. For big query areas, the linear search could also be used as a fallback.
    • Very dense distributions, small area queries: In this case there would be too many entities per bucket, so we would be back to near-linear performance in "miniature" settings (think something like rubenwardy's conquer or a chess game where all pieces are entities). Again I don't know how relevant this is in practice, though I could imagine "swarms" or "clumps" of objects to appear, but I don't know whether (1) these are large enough (probably not); (2) large queries are done anyways, meaning there is hardly a way to optimize these.
    • TL;DR: exe_virus' solution is best suited towards a distribution of entities where there's a low, roughly constant number of entities per bucket. This seems to be what real-world distributions are likely to look like.
  • This data structure can provide better worst-case runtime guarantees for range queries, but they aren't great either, due to "the curse of dimensionality". They aren't of much relevance anyways, since there is a large difference between the worst and average case here.
  • exe_virus' data structure is much simpler. It's basically just a multimap from bucket pos to entity. In that sense, it's similar to what we have already for mapblocks. The data structure here is a dynamic forest of static k-d-trees, which are merged, and sometimes is shrinked. In terms of code size, exe_virus' spatial map (impl. & header) is roughly around 150 loc, whereas my (header-only) implementation is around 450 loc. (I am reasonably confident in the correctness of my data structure nevertheless.)
  • I expect updates to be faster in exe_virus' solution (but I don't think that this will be very relevant in the bigger picture, since I expect there to only be linearly many updates per step, so an amortized logarithmic-ish runtime with good constant factors should be fine here).
  • There is currently some remaining potential for optimization in both solutions.

@ExeVirus
Copy link
Contributor

appgurueu is correct on many fronts here.

To clarify on the two drawbacks mentioned, I have two pending solutions for the subjects:

Very sparse distributions of entities, large area queries

I have implemented exactly the recommended solution here: check the number of buckets we are about to iterate over and compare against the total number of entities. If there more mapblocks, I fallback to linear iteration over all entities for now, until the algorithm is more optimized.

Very dense distributions, small area queries

Except where all entities in the entire server are in a single mapblock, this solution will still help, just not tremendously. In this case, only one or two mapblocks will need to be checked (worst case 8 on a corner), ignoring all the other entities on the server.

I am have been debating allowing the data structure use a non-hardcoded bucket size, so that buckets can be of size 16x16x16 or 1x1x1, 64x64x64, etc. Then we could expose it as a setting for servers/games to tweak for their needs.

There is currently some remaining potential for optimization in both solutions.

And yes, both algorithms can be improved. I'm looking to give mine another 2 or 3x improvement for large range queries by using the regularity of the boxes to check for whole rows/columns of mapblocks, rather than each individual mapblock or entity inside a given mapblock. This will only apply for range queries that are at least 2 mapblocks in dimension or larger, however. I'm looking to improve this specifically to help with connected client object updates, which we currently rely on large getObjectsInRadius queries for.

My Recommendation:

Once we are done with optimizations over the next days/weeks, we can do side by side comparisons and let the numbers decide for us. If they're comparable, Core Devs can make that call. These structures are certainly exclusive: i.e. we should choose one not both.

@appgurueu appgurueu force-pushed the obj-dyn-kdtrees branch 2 times, most recently from 946a55b to 953f2d1 Compare May 26, 2024 20:37
@lhofhansl
Copy link
Contributor

What's needed to make progress on this one (or #14643)?
I tried both out on a busy world and it all seems to work. (In my scenario I did not see any performance improvements - many object in a single blocks - but all continued to work fine)

@ExeVirus
Copy link
Contributor

ExeVirus commented Jun 3, 2024 via email

@Zughy Zughy marked this pull request as draft June 7, 2024 13:27
@ExeVirus
Copy link
Contributor

Okay, I have finished writing the pseudorandom test, found two mistakes in my code. Ran it on this kdtrees branch: had an exception that appgurueu will have to work out. I'll work with him to get those tests completed.

In the meantime, will update my branch from WIP as soon as I'm done integrating these updated cache lookups elsewhere in the codebase besides SAO inRadius and inArea lookups.

At this point: I'm extremely confident in the implementation correctness, and if appgurueu's branch here were to pass the same test(s), I would have full confidence in his solution, and we can just do performance comparisons at that point.

@red-001 red-001 mentioned this pull request Aug 16, 2024
9 tasks
@Zughy Zughy removed the WIP label Aug 22, 2024
@appgurueu appgurueu force-pushed the obj-dyn-kdtrees branch 3 times, most recently from 09c62b5 to 6f8836e Compare September 4, 2024 16:39
void ServerActiveObject::setBasePosition(v3f pos) {
bool changed = m_base_position != pos;
m_base_position = pos;
if (changed && getEnv()) // HACK *when* is getEnv() null?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially cleaner alternatives:

  • Couple active objects to their manager by giving them a direct reference; only allow object creation through the manager.
  • Get rid of setBasePosition, force position updates to go through the manager that way. The manager can befriend the m_base_position member variable.

@@ -128,39 +132,40 @@ void ActiveObjectMgr::invalidateActiveObjectObserverCaches()
}
}

void ActiveObjectMgr::updatePos(u16 id, const v3f &pos) {
// HACK only update if we already know the object
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably isn't that bad of a hack (if the object has not been registered yet, it will be registered later, with the correct position), though ideally positions should only be updated once the object has been registered. Fixing the above hack probably helps with this.

@appgurueu appgurueu marked this pull request as ready for review September 4, 2024 17:46
@sfan5 sfan5 self-requested a review November 19, 2024 20:26
@SmallJoker
Copy link
Member

Is there a technical reason why we couldn't use the existing (although optional) SpatialIndex dependency? It does seem to offer a few different trees.

@ExeVirus
Copy link
Contributor

In my case, it was my attempt at optimizing for speed of updates. SpatialIndex needed to be updated every time an entity moves, so it would update all entities' indexes every frame, essentially. In my implementation, I only need to update when you change the 16x16x16 block you are already in, i.e. extremely seldom.

SpatialIndex was faster than unordered_multimap for the case of lookups, but at the cost of a linear growth in update times based on number of entities changing locations.

Don't have the numbers, could try to rebuild my tests if that is desired.

@appgurueu
Copy link
Contributor Author

appgurueu commented Dec 31, 2024

Yes, it boils down to these libraries (I also explored another one) not being optimized for our use case. I tried leveraging them a while ago and it ultimately failed 1. They offer mostly static data structures and sometimes they lack specific things (such as 3d raycast queries, though this PR in its current state doesn't have that either).

Applying the logarithmic method (generic dynamization) to these data structures could be explored and would likely yield similar expected asymptotics, but peeking at the implementation, I legitimately doubt that they can beat my optimized implementation of static k-d-trees (for example their trees seem to heap-allocate each node, whereas my trees live in arrays).

Footnotes

  1. In all fairness a contributing factor was that I was less experienced back then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The data structure problem with active objects
6 participants