Add spatial index for objects #14631

appgurueu · 2024-05-09T17:40:23Z

Optimizes range queries for objects in order to resolve #14613.

The spatial index is a dynamic forest of static k-d-trees, as outlined in my comment.

To do

This PR is ready for review; there are still a couple nice-to-have TODOs but none of them are necessary.

How to test

There is a randomized unit test which tests this against a naive implementation. I also recommend playing e.g. Shadow Forest or other games with entities to give this some "field testing".

The benchmarks added by sfan show the following range query speedups on my setup:

200 objects: ca. 6x
1450 objects: ca. 22x
10000 objects: ca. 153x (this demonstrates that the new index indeed scales way better)

Data

old

benchmark name                                                                                                       samples       iterations    estimated
                                                                                                                     mean          low mean      high mean
                                                                                                                     std dev       low std dev   high std dev
---------------------------------------------------------------------------------------------------------------------------------------------------------------
inside_radius_200                                                                                                              100          1261    139.593 ms 
                                                                                                                         1.1102 us    1.10106 us    1.12325 us 
                                                                                                                        55.2183 ns    41.7996 ns    73.7618 ns 
                                                                                                                                                               
inside_radius_1450                                                                                                             100           142    140.367 ms 
                                                                                                                        10.0835 us     9.9536 us    10.2648 us 
                                                                                                                        774.521 ns    597.986 ns    1.04772 us 
                                                                                                                                                               
inside_radius_10000                                                                                                            100            11    151.907 ms 
                                                                                                                        174.836 us    168.695 us    181.905 us 
                                                                                                                        33.4375 us    29.0096 us    38.6354 us 
                                                                                                                                                               
in_area_200                                                                                                                    100           850     139.57 ms 
                                                                                                                         1.5991 us    1.59156 us    1.60799 us 
                                                                                                                        41.7553 ns    35.4184 ns    50.8028 ns 
                                                                                                                                                               
in_area_1450                                                                                                                   100            97    140.446 ms 
                                                                                                                        14.5087 us    14.3894 us     14.627 us 
                                                                                                                        606.878 ns    520.745 ns     712.16 ns 
                                                                                                                                                               
in_area_10000                                                                                                                  100             9    144.056 ms 
                                                                                                                        216.703 us     211.43 us    224.314 us 
                                                                                                                        31.9183 us    23.6851 us    53.2383 us

new

benchmark name                                                                                                       samples       iterations    estimated
                                                                                                                     mean          low mean      high mean
                                                                                                                     std dev       low std dev   high std dev
---------------------------------------------------------------------------------------------------------------------------------------------------------------
inside_radius_200                                                                                                              100          6393    184.118 ms 
                                                                                                                        297.734 ns    295.987 ns    299.511 ns 
                                                                                                                        9.03226 ns    7.87456 ns    10.4505 ns 
                                                                                                                                                               
inside_radius_1450                                                                                                             100          2011     184.61 ms 
                                                                                                                        930.247 ns    925.143 ns    940.841 ns 
                                                                                                                        35.8154 ns    20.3918 ns    69.6263 ns 
                                                                                                                                                               
inside_radius_10000                                                                                                            100           803     184.61 ms 
                                                                                                                        2.53827 us     2.4752 us    2.69643 us 
                                                                                                                        476.573 ns      198.8 ns    917.299 ns 
                                                                                                                                                               
in_area_200                                                                                                                    100          7484    184.106 ms 
                                                                                                                        246.216 ns    244.321 ns    247.968 ns 
                                                                                                                        9.27178 ns    7.80405 ns    11.5285 ns 
                                                                                                                                                               
in_area_1450                                                                                                                   100          2907    184.595 ms 
                                                                                                                        647.267 ns    644.314 ns     649.87 ns 
                                                                                                                         14.137 ns    11.7168 ns    17.8412 ns 
                                                                                                                                                               
in_area_10000                                                                                                                  100          1403    184.635 ms 
                                                                                                                        1.41243 us    1.39912 us    1.44893 us 
                                                                                                                        103.922 ns    46.1786 ns    221.902 ns

src/benchmark/benchmark_activeobjectmgr.cpp

sfan5 · 2024-05-09T18:58:01Z

I think this will need comprehensive tests for how the spatial index and object lifecycle interact. (basically the check list I had in that issue)

lhofhansl · 2024-05-14T21:32:37Z

How does this relate to #14643? Do they solve the same problem? Are they complementary?

appgurueu · 2024-05-14T22:17:50Z

How does this relate to #14643? Do they solve the same problem? Are they complementary?

They solve the same problem in different ways. I am not yet sure which way is better. Some thoughts:

Performance on the relatively evenly distributed entities in sfan's benchmark is comparable. Both solutions seem capable of producing speedups large enough to basically solve the problem for now.
exe_virus' solution potentially performs poorly in two theoretical cases:
- Very sparse distributions of entities, large area queries: You would need to check many (empty) buckets for such a query; in this case, performance would be even worse than the current linear search. This is probably relatively irrelevant in practice however due to being dwarfed by the associated costs related to mapblock management. For big query areas, the linear search could also be used as a fallback.
- Very dense distributions, small area queries: In this case there would be too many entities per bucket, so we would be back to near-linear performance in "miniature" settings (think something like rubenwardy's conquer or a chess game where all pieces are entities). Again I don't know how relevant this is in practice, though I could imagine "swarms" or "clumps" of objects to appear, but I don't know whether (1) these are large enough (probably not); (2) large queries are done anyways, meaning there is hardly a way to optimize these.
- TL;DR: exe_virus' solution is best suited towards a distribution of entities where there's a low, roughly constant number of entities per bucket. This seems to be what real-world distributions are likely to look like.
This data structure can provide better worst-case runtime guarantees for range queries, but they aren't great either, due to "the curse of dimensionality". They aren't of much relevance anyways, since there is a large difference between the worst and average case here.
exe_virus' data structure is much simpler. It's basically just a multimap from bucket pos to entity. In that sense, it's similar to what we have already for mapblocks. The data structure here is a dynamic forest of static k-d-trees, which are merged, and sometimes is shrinked. In terms of code size, exe_virus' spatial map (impl. & header) is roughly around 150 loc, whereas my (header-only) implementation is around 450 loc. (I am reasonably confident in the correctness of my data structure nevertheless.)
I expect updates to be faster in exe_virus' solution (but I don't think that this will be very relevant in the bigger picture, since I expect there to only be linearly many updates per step, so an amortized logarithmic-ish runtime with good constant factors should be fine here).
There is currently some remaining potential for optimization in both solutions.

ExeVirus · 2024-05-15T01:49:41Z

appgurueu is correct on many fronts here.

To clarify on the two drawbacks mentioned, I have two pending solutions for the subjects:

Very sparse distributions of entities, large area queries

I have implemented exactly the recommended solution here: check the number of buckets we are about to iterate over and compare against the total number of entities. If there more mapblocks, I fallback to linear iteration over all entities for now, until the algorithm is more optimized.

Very dense distributions, small area queries

Except where all entities in the entire server are in a single mapblock, this solution will still help, just not tremendously. In this case, only one or two mapblocks will need to be checked (worst case 8 on a corner), ignoring all the other entities on the server.

I am have been debating allowing the data structure use a non-hardcoded bucket size, so that buckets can be of size 16x16x16 or 1x1x1, 64x64x64, etc. Then we could expose it as a setting for servers/games to tweak for their needs.

There is currently some remaining potential for optimization in both solutions.

And yes, both algorithms can be improved. I'm looking to give mine another 2 or 3x improvement for large range queries by using the regularity of the boxes to check for whole rows/columns of mapblocks, rather than each individual mapblock or entity inside a given mapblock. This will only apply for range queries that are at least 2 mapblocks in dimension or larger, however. I'm looking to improve this specifically to help with connected client object updates, which we currently rely on large getObjectsInRadius queries for.

My Recommendation:

Once we are done with optimizations over the next days/weeks, we can do side by side comparisons and let the numbers decide for us. If they're comparable, Core Devs can make that call. These structures are certainly exclusive: i.e. we should choose one not both.

lhofhansl · 2024-06-03T16:36:11Z

What's needed to make progress on this one (or #14643)?
I tried both out on a busy world and it all seems to work. (In my scenario I did not see any performance improvements - many object in a single blocks - but all continued to work fine)

ExeVirus · 2024-06-03T16:45:49Z

Mostly we need solid, repeatable, verifiable, and somewhat diverse test methods, to prove to ourselves these implementations are flawless, and to have a very good benchmark to say which we choose. There are small scale tests we can set up easily, but the most important is one that is pseudorandom, that adds, removes, and moves entities, starting with 100, up to about 2-3K, and back down. We would do both in radius and in box checks during this time (say every 20 frames or something), and I'd want to see identical results between master and these branches before saying merge. We haven't really tested all the cases of entities being added and removed during iteration in real time and need to be confident in the solution. I think it would take maybe 3-5 hours to get those tests written, and I would have them done probably within a week or two. After we verify they are safe implementations, it should be very easy to compare performance and consider code maintenance at that point to decide the better solution. Really don't want to over engineer, or bike shed, but hey as long as a core dev doesn't have to do it... Shrug

…

On Mon, Jun 3, 2024, 12:36 PM lhofhansl ***@***.***> wrote: What's needed to make progress on this one (or #14643 <#14643>)? I tried both out on a busy world and it all seems to work. (In my scenario I did not see any performance improvements - many object in a single blocks - but all continued to work fine) — Reply to this email directly, view it on GitHub <#14631 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AKT7N2WUEVIM7BTOEKYFALDZFSLRFAVCNFSM6AAAAABHPG5SFWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNBVGY3DANRRGY> . You are receiving this because you commented.Message ID: ***@***.***>

ExeVirus · 2024-06-11T18:58:45Z

Okay, I have finished writing the pseudorandom test, found two mistakes in my code. Ran it on this kdtrees branch: had an exception that appgurueu will have to work out. I'll work with him to get those tests completed.

In the meantime, will update my branch from WIP as soon as I'm done integrating these updated cache lookups elsewhere in the codebase besides SAO inRadius and inArea lookups.

At this point: I'm extremely confident in the implementation correctness, and if appgurueu's branch here were to pass the same test(s), I would have full confidence in his solution, and we can just do performance comparisons at that point.

appgurueu · 2024-09-04T17:43:38Z

src/server/serveractiveobject.cpp

+void ServerActiveObject::setBasePosition(v3f pos) {
+	bool changed = m_base_position != pos;
+	m_base_position = pos;
+	if (changed && getEnv()) // HACK *when* is getEnv() null?


Potentially cleaner alternatives:

Couple active objects to their manager by giving them a direct reference; only allow object creation through the manager.

Get rid of setBasePosition, force position updates to go through the manager that way. The manager can befriend the m_base_position member variable.

appgurueu · 2024-09-04T17:45:54Z

src/server/activeobjectmgr.cpp

@@ -128,39 +132,40 @@ void ActiveObjectMgr::invalidateActiveObjectObserverCaches()
 	}
 }

+void ActiveObjectMgr::updatePos(u16 id, const v3f &pos) {
+	// HACK only update if we already know the object


This probably isn't that bad of a hack (if the object has not been registered yet, it will be registered later, with the correct position), though ideally positions should only be updated once the object has been registered. Fixing the above hack probably helps with this.

SmallJoker · 2024-12-30T18:18:00Z

Is there a technical reason why we couldn't use the existing (although optional) SpatialIndex dependency? It does seem to offer a few different trees.

ExeVirus · 2024-12-30T18:52:49Z

In my case, it was my attempt at optimizing for speed of updates. SpatialIndex needed to be updated every time an entity moves, so it would update all entities' indexes every frame, essentially. In my implementation, I only need to update when you change the 16x16x16 block you are already in, i.e. extremely seldom.

SpatialIndex was faster than unordered_multimap for the case of lookups, but at the cost of a linear growth in update times based on number of entities changing locations.

Don't have the numbers, could try to rebuild my tests if that is desired.

appgurueu · 2024-12-31T17:41:54Z

Yes, it boils down to these libraries (I also explored another one) not being optimized for our use case. I tried leveraging them a while ago and it ultimately failed ¹. They offer mostly static data structures and sometimes they lack specific things (such as 3d raycast queries, though this PR in its current state doesn't have that either).

Applying the logarithmic method (generic dynamization) to these data structures could be explored and would likely yield similar expected asymptotics, but peeking at the implementation, I legitimately doubt that they can beat my optimized implementation of static k-d-trees (for example their trees seem to heap-allocate each node, whereas my trees live in arrays).

In all fairness a contributing factor was that I was less experienced back then. ↩

The size of the modify safe map is not what you think it is and hence unsuitable for these.

appgurueu added WIP Performance labels May 9, 2024

sfan5 reviewed May 9, 2024

View reviewed changes

src/benchmark/benchmark_activeobjectmgr.cpp Show resolved Hide resolved

appgurueu force-pushed the obj-dyn-kdtrees branch from 84bcdd9 to fb5d3bd Compare May 18, 2024 19:06

appgurueu force-pushed the obj-dyn-kdtrees branch 2 times, most recently from 946a55b to 953f2d1 Compare May 26, 2024 20:37

appgurueu force-pushed the obj-dyn-kdtrees branch from 953f2d1 to 743199f Compare June 7, 2024 11:16

Zughy marked this pull request as draft June 7, 2024 13:27

sfan5 added the Testing needed label Aug 11, 2024

red-001 mentioned this pull request Aug 16, 2024

Add Active Object Spatial Cache #14643

Open

9 tasks

Zughy removed the WIP label Aug 22, 2024

appgurueu force-pushed the obj-dyn-kdtrees branch 3 times, most recently from 09c62b5 to 6f8836e Compare September 4, 2024 16:39

appgurueu commented Sep 4, 2024

View reviewed changes

appgurueu marked this pull request as ready for review September 4, 2024 17:46

appgurueu force-pushed the obj-dyn-kdtrees branch from ec584fe to 3ddc97b Compare October 15, 2024 23:46

appgurueu force-pushed the obj-dyn-kdtrees branch from 3ddc97b to 080be0c Compare November 18, 2024 01:45

sfan5 self-requested a review November 19, 2024 20:26

appgurueu added 21 commits January 7, 2025 00:07

Dynamic k-d-tree firstest draft

8a7b87d

Deletion

4071acd

Add failing test

ef2c13f

let there be fix

ed72d89

hook the thing up

cc9f6ce

Update benchmark to include very large case

cbb3773

stuff

5e6e5ad

Some cleanup, add brief explanation

b1074fd

Eliminate risk of unwanted wraparound

9b970d5

Fix potential wraparound issue

58111fe

Fix yet another wraparound issue

26cdd37

Fix bugs

2b107d9

Trivial optimization for static entities

3f25ef9

Extend unit test

c760ce8

Use separate type for sizes

03f14fc

Use range queries for raycasts and collisions

58f02b3

Refactor server active object manager tests to use Catch

69faee7

Add randomized spatial index unit testing

94d0fa6

Update comments

c9f9d6d

Clean up around updatePos

874fa76

Remove m_active_objects.size() assertions

02035a2

The size of the modify safe map is not what you think it is and hence unsuitable for these.

appgurueu force-pushed the obj-dyn-kdtrees branch from 080be0c to 02035a2 Compare January 6, 2025 23:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add spatial index for objects #14631

Add spatial index for objects #14631

appgurueu commented May 9, 2024 •

edited

Loading

sfan5 commented May 9, 2024 •

edited

Loading

lhofhansl commented May 14, 2024 •

edited

Loading

appgurueu commented May 14, 2024

ExeVirus commented May 15, 2024

lhofhansl commented Jun 3, 2024

ExeVirus commented Jun 3, 2024 via email

ExeVirus commented Jun 11, 2024

appgurueu Sep 4, 2024

appgurueu Sep 4, 2024

SmallJoker commented Dec 30, 2024

ExeVirus commented Dec 30, 2024

appgurueu commented Dec 31, 2024 •

edited

Loading

Add spatial index for objects #14631

Are you sure you want to change the base?

Add spatial index for objects #14631

Conversation

appgurueu commented May 9, 2024 • edited Loading

To do

How to test

old

new

sfan5 commented May 9, 2024 • edited Loading

lhofhansl commented May 14, 2024 • edited Loading

appgurueu commented May 14, 2024

ExeVirus commented May 15, 2024

My Recommendation:

lhofhansl commented Jun 3, 2024

ExeVirus commented Jun 3, 2024 via email

ExeVirus commented Jun 11, 2024

appgurueu Sep 4, 2024

Choose a reason for hiding this comment

appgurueu Sep 4, 2024

Choose a reason for hiding this comment

SmallJoker commented Dec 30, 2024

ExeVirus commented Dec 30, 2024

appgurueu commented Dec 31, 2024 • edited Loading

Footnotes

appgurueu commented May 9, 2024 •

edited

Loading

sfan5 commented May 9, 2024 •

edited

Loading

lhofhansl commented May 14, 2024 •

edited

Loading

appgurueu commented Dec 31, 2024 •

edited

Loading