The data structure problem with active objects #14613
I am currently working on a solution to this. The data structure I'm implementing is a dynamization of k-d-trees, meaning it would store O(log n) many k-d-trees with sizes of powers of two. This means:
Josiah and I tried optimizing this by using an off-the-shelf R-tree library a while ago. That failed due to bad constant factors for our medium-size workloads. It's also a bit of overkill: it is centered around boxes, which would have made supporting higher variance in box dimensions easier, but we currently don't need that; we can approximate the relatively small objects as points for now. I'm optimistic that with this new approach I have a good chance of getting the constant factors low enough, since there are plenty of knobs to tweak.
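To make the shape of that structure concrete, here is a minimal sketch (placeholder types only, not the actual branch): a forest where slot i is either empty or holds a static tree over exactly 2^i points, so the occupied slots follow the binary representation of the total point count.

```cpp
// Sketch only: layout of a dynamized forest of static k-d-trees.
// StaticKdTree is a placeholder for a bulk-built, never-modified tree.
#include <cstddef>
#include <optional>
#include <vector>

struct Point3 { float x, y, z; };

struct StaticKdTree {
	// A real implementation would bulk-build tree nodes here in O(n log n).
	explicit StaticKdTree(std::vector<Point3> pts) : points(std::move(pts)) {}
	std::vector<Point3> points;
};

// Slot i is either empty or holds exactly 2^i points, so n points occupy
// O(log n) trees (one per set bit of n).
using KdForest = std::vector<std::optional<StaticKdTree>>;

std::size_t total_points(const KdForest &forest)
{
	std::size_t n = 0;
	for (std::size_t i = 0; i < forest.size(); ++i)
		if (forest[i])
			n += std::size_t(1) << i;
	return n;
}
```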
I've had an idea tossing around for a while now on the subject. It would need optimization, but generally:

- An object ID vector of pointers (std::vector)
- A std::vector of unused entries
- A std::unordered_map with three levels: each level represents an octree subdivision, with the bottom layer corresponding to the 16,16,16 mapblock boundary (to align with when entities need to be put to sleep or awoken). This bottom layer is two std::vectors to hold unlimited entities, with the same on-erase mechanics as the main std::vector list of objects.

So you have two lookup methods: by octree bin (basically x,y,z) and by ID in a vector. When insertions occur in the vector, they are mirrored in the unordered maps. When position updates occur, we just need to verify we haven't changed mapblock; if we have, erase our entry in that mapblock's vector and insert into the new mapblock's vector. So the tradeoff is that position updates are more expensive. Finally, the main unordered map stores pointers, at least at the highest level, so as to keep RAM usage down. When I last looked at it, I think it was about 15 MB for the whole optimization stack at startup, with a couple of kilobyte-sized allocations when the first entities are inserted into mapblocks. (A rough sketch of this layout follows the list of query entry points below.)

Note: I will be updating this comment as I do an actual design on the task - might be able to actually do this one.

Update 1: Nice, we already have an object manager class using a std::map structure with garbage collection, so that takes care of my first vector manager. Wonder why we need to use maps? Like, couldn't the vector index be the object ID?

Update 2: The locations that loop over all objects a ton during runtime are in activeobjectmgr.cpp:
// Get vector sorted by distance around origin point within max distance
ActiveObjectMgr::getActiveObjects(const v3f &origin, f32 max_d, std::vector<DistanceSortedActiveObject> &dest)
// Raycasts
ActiveObjectMgr::getActiveSelectableObjects(const core::line3d<f32> &shootline)
// objects inside radius
ActiveObjectMgr::getObjectsInsideRadius(const v3f &pos, float radius, std::vector<ServerActiveObject *> &result, std::function<bool(ServerActiveObject *obj)> include_obj_cb)
// objects inside area
ActiveObjectMgr::getObjectsInArea(const aabb3f &box, std::vector<ServerActiveObject *> &result, std::function<bool(ServerActiveObject *obj)> include_obj_cb)
// Maintain Inactive/Active Objects
ActiveObjectMgr::getAddedActiveObjectsAroundPos(v3f player_pos, f32 radius, f32 player_radius, const std::set<u16> current_objects, std::vector<u16> &added_objects)
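For illustration, here is a rough sketch of the two-index layout described above (all names are illustrative placeholders, not the actual engine API): a dense id-indexed vector plus a per-mapblock bucket map.

```cpp
// Sketch of the proposed two-index layout: lookup by object id via a vector,
// lookup by position via mapblock-sized buckets in an unordered map.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct ActiveObjectStub { float x, y, z; /* the real object would live here */ };

struct TwoIndexStore {
	// index == object id; nullptr marks a free (reusable) slot
	std::vector<ActiveObjectStub *> by_id;
	// recycled ids, so erasing doesn't shift the vector
	std::vector<uint16_t> free_ids;
	// key: packed mapblock coordinate, value: ids of objects in that 16x16x16 block
	std::unordered_map<uint64_t, std::vector<uint16_t>> by_block;

	// On a position update we only touch by_block if the object crossed a
	// mapblock boundary: erase the id from the old block's vector and append
	// it to the new block's vector.
};
```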
So I took a walk and here's my idea: it bases around the "one-off cache" approach I said wouldn't work, but uses some tricks to make it workable.
Keep the
The gist of it is that objects exist in either
When an object is inserted it is put into the
The cache has to be (re-)created at some point. This should happen when the
When an AABB query happens the
spatial lookup performance: let's go through the checklist:
bonus:
caveats:
If you read closely you will notice that the requirements for the
It can be transparently replaced with any other — better — data structure.
Thanks for addressing this problem. If we can help any further, please let me know - whatever readouts, logs, temporary code compiled into the engine ... For reference: YL bugtracker 3723, 6325, 6326, 5289, administration 188
Sfan: Why is just Z-position enough to cache? Does that speed up any of our lookups by position by enough to matter? Also, why are position updates no cost?
I think the most likely next step when someone has working code is for you to test the performance in a real situation.
It may not be, but see the last section.
After updating an object the cost is paid during every following spatial query, that's the
Oh, that makes perfect sense. Okay, yeah, I can see a way to get my idea to fit in your world relatively painlessly. Just pack int16(x) >> 4, int16(y) >> 4 and int16(z) >> 4 into a single key: that puts all entities into mapblock-sized bins and easily fits in 64 bits. It should still yield nice speedups for objects-in-radius, raycasts, and whatnot; we just have to write the algorithms for those lookups. It could be as simple as running the same algorithms at 16x16x16 granularity first and iterating over the relevant bins instead of the whole active object list like we currently do. Bin size should be played with, obviously, because 1x1x1 bins wouldn't yield much value, I suspect.
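A minimal sketch of one way such a packed 64-bit mapblock key could look (illustrative only; the helper name and shift layout are assumptions, not engine code):

```cpp
#include <cstdint>

// Map a node position to a 16x16x16 mapblock bin key. The comment above
// assumes node coordinates fit in int16, so each block coordinate fits in
// 16 bits and three of them pack comfortably into one 64-bit key.
static inline uint64_t mapblock_key(int16_t x, int16_t y, int16_t z)
{
	const uint16_t bx = static_cast<uint16_t>(x >> 4); // node coord -> block coord
	const uint16_t by = static_cast<uint16_t>(y >> 4);
	const uint16_t bz = static_cast<uint16_t>(z >> 4);
	return (static_cast<uint64_t>(bx) << 32) |
	       (static_cast<uint64_t>(by) << 16) |
	        static_cast<uint64_t>(bz);
}
```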
i tried to take a stab at a better data structure for this. the result, in lua, is here https://github.com/fluxionary/minetest-futil/blob/main/data_structures/point_search_tree.lua. the biggest issue with it is that it can't be re-balanced when new points are added or removed. i found a paper about that but never got around to implementing it https://arxiv.org/abs/1410.5420
I also have a Lua implementation of k-d-trees. I think trying to rebalance k-d-trees is a dead end; I tried for a while, and it doesn't seem feasible to me. But it is possible to build an efficient dynamic data structure out of a forest of static k-d-trees (see "transforming static data structures to dynamic structures"), which is what I'm suggesting here and currently implementing. I can't tell whether this will hold up in real-world usage, but implementing and testing it is the only way to find out. I'm optimistic that we can get decent constant factors because there are plenty of knobs to tweak.
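A rough sketch of the standard insertion trick for such a forest (again placeholder types, not the actual branch): merge equal-sized trees upward until an empty slot is hit, then bulk-rebuild once; the occasional large rebuild is what makes insertion cheap only in the amortized sense.

```cpp
// Sketch: amortized insertion into a forest of static k-d-trees
// ("logarithmic method"). Slot i holds either nothing or exactly 2^i points.
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

struct Point3 { float x, y, z; };

struct StaticKdTree {
	// Placeholder: a real tree would be bulk-built here in O(n log n).
	explicit StaticKdTree(std::vector<Point3> pts) : points(std::move(pts)) {}
	const std::vector<Point3> &extract_points() const { return points; }
	std::vector<Point3> points;
};

void insert(std::vector<std::optional<StaticKdTree>> &forest, Point3 p)
{
	std::vector<Point3> carry{p}; // 2^0 points so far
	for (std::size_t i = 0;; ++i) {
		if (i == forest.size())
			forest.emplace_back(); // grow the forest by one (empty) slot
		if (!forest[i]) {
			// Empty slot: the carry holds exactly 2^i points, rebuild here.
			forest[i].emplace(std::move(carry));
			return;
		}
		// Occupied slot: absorb its 2^i points into the carry (now 2^(i+1))
		// and continue to the next slot.
		const auto &pts = forest[i]->extract_points();
		carry.insert(carry.end(), pts.begin(), pts.end());
		forest[i].reset();
	}
}
```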
Thought through sfan's idea more (worked on it). There's also a remove (when object is no longer active) that is also
Alright, I've got a seemingly working (according to a quick randomized test against a naive implementation) dynamic forest of k-d-trees data structure whipped up; feel free to take a peek at my branch. The big O for this should be good (though not quite as good as what I advertised initially; for example, deletion is amortized O(log(n)²) rather than O(log n) with this implementation). What remains to be seen is how it holds up in practice (potentially after some optimization), plus integration into the active object manager, which I may get to on the weekend.
Will take a look. sfan's idea was short enough that I just jumped right into server::ActiveObjectMgr integration; the only not-straightforward thing is how to signal object position updates. In mine I just check if the last position is different from the new position after step(). I will probably not write tests for your structure directly, as MT is a much better stress test of complexity than I can imagine haha, but I can at least look for obvious things. Not quite done with my version, but perhaps ready for integration tests against servers/mods by end of week.
Some feedback / thoughts:
DS:
You can't see here, but I've made a lot of progress on the hash map solution for us. I'm only focusing on the server side right now, so selection boxes aren't a consideration. That said, everything in Minetest must have a position within 2^16 of 0,0,0, which helps tremendously, allowing me to pack everything into buckets of 16x16x16 nodes. We can revisit later if we want to make this changeable as a setting.

As for big objects, they just have a plain old single origin position; that's all that's needed right now.

My single unordered multimap is a single-layer octree, and with 10,000 mostly evenly spaced out entities (the current benchmark), it's quite performant, haha.

I agree this is a super dead simple solution and gets us good performance compared to what we have (by far). Will post a PR hopefully by end of day. Performance of the map can absolutely be optimized later; it's just a very flexible solution.
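As a rough illustration of how queries against such buckets could work (not the actual PR code; the names, bucket container and packing are assumptions): visit only the buckets overlapping the bounding box of the query sphere, then distance-filter the candidates.

```cpp
// Sketch: objects-inside-radius on top of a mapblock-bucket index.
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Obj { float x, y, z; };

// key: packed block coordinate, value: an object whose origin lies in that block
using BlockIndex = std::unordered_multimap<uint64_t, Obj *>;

static uint64_t block_key(int bx, int by, int bz)
{
	return (static_cast<uint64_t>(static_cast<uint16_t>(bx)) << 32) |
	       (static_cast<uint64_t>(static_cast<uint16_t>(by)) << 16) |
	        static_cast<uint64_t>(static_cast<uint16_t>(bz));
}

void objects_inside_radius(const BlockIndex &index, float px, float py, float pz,
		float radius, std::vector<Obj *> &out)
{
	// Visit only the blocks overlapping the bounding box of the sphere
	// (block size = 16 nodes).
	const int bx0 = (int)std::floor((px - radius) / 16.0f), bx1 = (int)std::floor((px + radius) / 16.0f);
	const int by0 = (int)std::floor((py - radius) / 16.0f), by1 = (int)std::floor((py + radius) / 16.0f);
	const int bz0 = (int)std::floor((pz - radius) / 16.0f), bz1 = (int)std::floor((pz + radius) / 16.0f);
	for (int bx = bx0; bx <= bx1; ++bx)
		for (int by = by0; by <= by1; ++by)
			for (int bz = bz0; bz <= bz1; ++bz) {
				auto range = index.equal_range(block_key(bx, by, bz));
				for (auto it = range.first; it != range.second; ++it) {
					Obj *o = it->second;
					const float dx = o->x - px, dy = o->y - py, dz = o->z - pz;
					// Keep only candidates actually within the sphere.
					if (dx * dx + dy * dy + dz * dz <= radius * radius)
						out.push_back(o);
				}
			}
}
```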
Alrighty, there's my current merge ^^^ Here are the performance results. Summary for the lazy:
Depending on your data distribution, either a k-d-tree, gridfile or an octree can be a good choice.
Agree those are the options. I implemented a single-layer octree using std::multimap, aligned at the mapblock (16x16x16) boundaries. That PR has gone through multiple reviewers and test rounds; at this point it is technically ready for merge. However, one of the newer Core Devs, appgurueu, has been working on a dynamic k-d-tree branch, and was still working on it as of last week. We could absolutely just merge mine as-is and probably get roughly the same performance, but we are waiting for appgurueu's to be (a) done, (b) reviewed and (c) tested. That might take a long time. I still support his efforts, which is why I have not made a stink about merging my implementation. It's simple and does get performance benefits, but this is a case where I'm willing to let great get in the way of good, since I'm not a Core Dev.
Speaking of a k-d-tree, one reason why I came to this old thread is biomes. Biome computations are currently linear in the number of biomes, but because of limitations of the current biome API, we see games try to trick the system by registering sub-biomes and variants (in particular, performing y slicing of biomes). At some point, the number of biomes may affect the performance of the biome lookup. |
Okay, the data is in (more can roll in, but it's unlikely to change the picture too much).

Background

Your Land is one of the many Luanti servers. They typically have 20-30 players on, with daily peaks as high as 50 players. They used to have server core latencies in the 2.5-2.6 second range, all due to linear search over some 6000 entities, as the server has a whole battle-royale-style gameplay between players and hundreds of entities. Each entity, through either collisions or its behaviors, ends up searching for nearby entities within a radius/rectangle a TON, so this issue is meant to address those problems. Performance metrics on the server were enabled and logged, and both Appgurueu and I have candidate branches that were tested on production for this server. Note how we went from 2.5-2.6 second latencies down to more like 0.12-0.16 second latencies under heavy load, on average, in the results section.

Data Details

We have 16 days of data using my std::multimap solution, and 7 days of data using Appgurueu's dynamic k-d-tree branch. The data collected is the Luanti-reported active object count, player count, and core latency over those time frames. Attached is the Excel sheet containing that data and the analysis. As an example of how the data was captured, this is a query for player count that I used for Appgurueu's branch over the January 9 -> today data pull.

Results

Active Object Count to Server Frames Per Second

Appgurueu's branch is 2.61% faster than mine when using active object count as the dependent datapoint.

Player Count to Server Frames Per Second

My branch is 5.13% faster than Appgurueu's when using player count as the dependent datapoint.

Which to use?

Theoretically, player count should have less effect than active object count, but both are dependent on what players and the server are doing. In this case, I ran the regression: player count accounts for 65% of the variance in server frames per second, while active object count only accounts for around 55%. With that in mind, I still say we should mostly rely on the active object data above, not the player count data. And so we have validated test results: a 2-3% improvement in performance of Appgurueu's branch over mine.

Conclusions

I leave the Core Dev team to make this call; for once we have two solutions, not zero ;)
I say we go with the simpler of the two.
Note that the collision detection code also has room for improvement (it currently materializes a lot of boxes and iterates over them repeatedly), and I'd hope it doesn't remain the bottleneck forever. Generally I would assume that entities will be somewhat spread out, and hence a grid+octree approach is likely to perform better. But then there are the Minecraft-style farms that cram a lot of mobs into a tiny space. I don't know if either of the indexes helps a lot in those cases; they collide "by design".
It's the very next thing I (and probably Appgurueu) will be trying to optimize.
They won't, unless the mod author for those entities has really optimized the entity "view distance". Right now, most entity mods just do a radius check around the entity center. I could see much better optimizations in the future where mod authors only look mostly in front of the entity: that would give them a smaller radius to check, giving better perf, but that's out of scope for the engine.
Thanks for evaluating the data @ExeVirus, and thanks @Bastrabun for collecting it. This confirms that in this real-world scenario, there is not much of a total difference in the impact the two approaches have. (Arguably it doesn't make a definitive general statement about the relative performance of the two implementations though, since other bottlenecks will by Amdahl's law have obstructed the comparison.)
Yes, my code is reusable. It includes an implementation of a static k-d-tree (since the dynamic data structure is just a forest of those). It is templated over dimension and datatype. So it should work out of the box for this use case (well I suppose I'd have to add a nearest neighbor query, but that's trivial), though it's hard to estimate what the performance impact would be - I think we'd have to just try it and see. (Note also that for this to be easily optimizable, the "weight" feature must not be used.) For very few biomes it definitely wouldn't be worth it. Do you have a ballpark number? (Optimizing biome logic in mapgen should probably go in a different issue. Do we have profiling data e.g. in Mineclon* showing that this is a problem yet?)
Indeed, that's a (bunch of) different issue(s) we're aware of and hope to resolve soon™.
It depends a lot on the queries. No index can optimize the case where there are simply many query results. exe_virus' index is grid-based. As an extreme example, if all objects are in a single grid cell (a 16³ block), there will be no speedup. My index is based on k-d-trees, meaning it should be able to handle this fine, if the queries use small radii.
Better than what? Just a grid? Or a forest of static k-d-trees? Especially for the latter the answer isn't obvious to me (but it isn't obvious for the former either).
If all other things were equal, I would agree with this. However, they aren't. I would like to reiterate the flexibility of my solution. Not only can it possibly be reused in completely different areas in the future (as e.g. kno10 is suggesting), but it also doesn't make as strong, slightly "arbitrary" assumptions about how entities are distributed. What exe has said is correct: One property that's easy to see is that my solution is scaling-invariant. If you scale everything (entity positions and queries) up or down by a factor of 1000, my solution retains its performance characteristics. A grid-based / bucketing solution with fixed buckets cannot provide this¹. Since we want to be a general game engine, I think a more flexible solution is the way to go. If you want a more real-world-ish example of what I have in mind here, think about something like Conquer. I would also like to point out that the complexity difference is, in my opinion, not that terrible, and both are well-tested solutions which I don't expect to require a lot of maintenance. We're looking at something like ~500 LOC for my implementation versus ~250 LOC for exe's implementation (.h + .cpp).
The only thing I would like to add is that whichever solution we choose, we'll want to make sure it's ready for further queries (raycast, etc.). Specifically, over in #15227, you can see where having a spatial index of collision boxes (map nodes and entities alike) will likely end up being a part of the collision performance solution. There I mention:
Probably more kinds of queries. So: @appgurueu, are k-d-trees a good, flexible solution for all these different types of queries? If so, then I would also put my recommendation behind k-d-trees, as my map will likely have suboptimal, but okay-ish, solutions to all those queries (similar to what is happening here). And if we add all those queries, there's no way that spatial index doesn't become complex. Just no way around it.
I would argue that adding queries to a k-d-tree-based solution is in principle not much harder than adding them to a grid-based solution, for the simple reason that a k-d-subtree corresponds to a cuboid region. Hence, in general, if you've got some query that you can implement for cuboids, you can relatively easily apply it to k-d-trees recursively. As for the specific query types mentioned:

- Boxes (aka "cuboids" or "ranges") are what we already implement.
- Spheres aren't very important: spheroid vs. cuboid is only a very modest constant-factor difference (which likely diminishes further with the more complex logic needed for the spheroid).
- Line is something we don't need: this is a point index, hence only volume queries are really useful. A raycast isn't a line query; it has to be a "moving box" query because we need to account for the selectionboxes of the objects. I agree that such queries are important, because a search volume that grows quadratically or even cubically is much worse than one which grows linearly (as pointed out in the issue). I think it should be possible to accommodate these queries with k-d-trees as well, though in all fairness the code for grids is already essentially there (raycast code, classes like
I don't think it matters much for the overall discussion, but I don't think so: Once we have cuboids and parallelepipeds, we should pretty much be all set on the essence of the spatial indexing front.
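To make the "a k-d-subtree corresponds to a cuboid region" point concrete, here is a generic recursive box query sketch over a plain pointer-based k-d-tree (the node layout is an assumption for illustration, not the actual implementation):

```cpp
// Sketch: axis-aligned box query on a k-d-tree by recursive pruning.
#include <functional>

struct KdNode {
	float pos[3];           // point stored at this node
	int axis;               // splitting dimension: 0, 1 or 2
	KdNode *left = nullptr; // points with pos[axis] <= split value
	KdNode *right = nullptr; // points with pos[axis] >= split value
};

struct Box { float min[3], max[3]; };

void box_query(const KdNode *node, const Box &box,
		const std::function<void(const KdNode &)> &visit)
{
	if (!node)
		return;
	bool inside = true;
	for (int d = 0; d < 3; ++d)
		inside = inside && box.min[d] <= node->pos[d] && node->pos[d] <= box.max[d];
	if (inside)
		visit(*node);
	// Each child covers a cuboid; descend only into children whose cuboid
	// can intersect the query box.
	if (box.min[node->axis] <= node->pos[node->axis])
		box_query(node->left, box, visit);
	if (box.max[node->axis] >= node->pos[node->axis])
		box_query(node->right, box, visit);
}
```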
There's a closely related older issue (#6453) but I felt like a fresh start here was better.
Problem
tl;dr: optimizing anything else server-side is basically a waste until this is solved, skip to the next section.
The nice people from the Your Land server sent me several profiling traces of their production server and a common theme is that about 40% of time (on the server thread) is spent just iterating the entity list (average, not peak!).
Now their mods use get_objects_inside_radius a bunch, but this is not the sole cause, and I expect any sizeable server will be plagued with this exact issue.

The issue is spatial queries on the entity list, which happen more often than one might think:

- get_objects_inside_radius, get_objects_in_area
- ServerEnvironment::getAddedActiveObjects

All of these are O(n), with n being the number of entities. When you consider that every entity does collision detection, this becomes O(n²), right in the main thread.

Solution requirements
What we need is a new data structure. It must:

- allow efficient AABB queries (getObjectsInArea) on the entity position

External circumstances:

- radius queries (getObjectsInsideRadius) can be emulated on top of AABB queries and are not an important point (sketched at the end of this section)

Finally we will also need to figure out how to reliably propagate position updates up to the container.
There are many ideas in the form of "just add a cache to <thing>" that come to mind, but keeping the cache in sync is generally trickier than designing the structure itself.
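As referenced in the requirements above, radius queries reduce to an AABB query plus a distance filter. A minimal sketch (the AABB query is passed in as a callback here, since the concrete index is exactly what this issue is about; all names are placeholders):

```cpp
// Sketch: getObjectsInsideRadius emulated on top of an AABB query.
#include <functional>
#include <vector>

struct v3f { float X, Y, Z; };
struct ObjStub { v3f pos; };

using AabbQuery = std::function<void(const v3f &min, const v3f &max,
		std::vector<ObjStub *> &out)>;

void objects_inside_radius(const AabbQuery &query_box, const v3f &pos, float radius,
		std::vector<ObjStub *> &out)
{
	// 1. Query the bounding box of the sphere ...
	std::vector<ObjStub *> candidates;
	query_box({pos.X - radius, pos.Y - radius, pos.Z - radius},
			{pos.X + radius, pos.Y + radius, pos.Z + radius}, candidates);
	// 2. ... then keep only candidates actually within the radius.
	for (ObjStub *obj : candidates) {
		const float dx = obj->pos.X - pos.X;
		const float dy = obj->pos.Y - pos.Y;
		const float dz = obj->pos.Z - pos.Z;
		if (dx * dx + dy * dy + dz * dz <= radius * radius)
			out.push_back(obj);
	}
}
```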
Real-world example data
(thanks to @Bastrabun)
There are 4500 entities. About 1400 move often, 3100 are static.