
The data structure problem with active objects #14613

Open
sfan5 opened this issue May 5, 2024 · 26 comments · May be fixed by #14631
Labels
Bug · Performance @ Server / Client / Env.
Milestone

Comments

@sfan5
Collaborator

sfan5 commented May 5, 2024

There's a closely related older issue (#6453) but I felt like a fresh start here was better.

Problem

tl;dr: optimizing anything else server-side is basically a waste until this is solved, skip to the next section.

The nice people from the Your Land server sent me several profiling traces of their production server and a common theme is that about 40% of time (on the server thread) is spent just iterating the entity list (average, not peak!).
Now their mods use get_objects_inside_radius a bunch, but this is not a sole cause and I expect any sizeable server will be plagued with this exact issue.

The issue is spatial queries on the entity list, which happen more often than one might think:

  • Lua get_objects_inside_radius, get_objects_in_area
  • Entity collision detection (sometimes)
  • Raycasts (sometimes)
  • ServerEnvironment::getAddedActiveObjects

All of these are O(n) with n being the number of entities. When you consider that every entity does collision detection this becomes O(n²) right in the main thread.

Solution requirements

What we need is a new data structure.

It must:

  • support fast lookup by entity ID
  • be iterable in the order of entity ID
  • support deletion and insertion
  • support fast spatial AABB queries (= getObjectsInArea) on the entity position
  • efficiently handle entity position updates, as these happen often
  • position updates must reflect in lookups instantly

external circumstances:

  • there is almost always opportunity for positions to change between lookups
    • this means any kind of "one-off cache" (that needs to be thrown away once an update happens) is unlikely to produce any hits
  • spatial distance queries (= getObjectsInsideRadius) can be emulated on top of AABB queries and are not an important point
  • being safe from modification during iteration is also important (Use performant and safe data structure for active objects #13880) but I think we can tack that on if the rest is addressed

Finally we will also need to figure out how to reliably propagate position updates up to the container.
There are many ideas in the form of "just add a cache to <thing>" that come to mind, but keeping it in sync is generally the trickier part than designing the structure.

Real-world example data

(thanks to @Bastrabun)
There are 4500 entities. About 1400 move often, 3100 are static.

@appgurueu
Contributor

appgurueu commented May 5, 2024

I am currently working on a solution to this. The data structure I'm implementing is a dynamization of k-d-trees, meaning it would store O(log n) many k-d-trees with sizes of powers of two. This means:

  • Amortized O(log n) insertions and O((log n)²) deletions
  • O(log(n) * n^(2/3) + k) worst case range queries (assuming distinct object positions) where k is the number of results, average case should be better
    • To give an example of how good the performance can be, a near-point-query should have O((log n)²) time complexity.
  • Raycast queries can get special optimization. The basic "bounding area" optimization also helps a bit.

Josiah and I tried optimizing this by using an off-the-shelf R-tree library a while ago. This failed due to bad constant factors for our medium-size workloads. It's also a bit overkill: It is centered around boxes. It would have made supporting higher variance in box dimensions easier. But currently we don't need that; we can just approximate the relatively small objects as points for now.

I'm optimistic that with this new approach, I have a good chance of getting the constant factors low enough since there are plenty of knobs to tweak:

  • There are three possible ways to construct k-d-trees;
  • I can use a vector for up to, say, 32 or 64 entity ID - position pairs, and only use k-d-trees for more entries;
  • There are various ways of tweaking the "dynamization" data structure. For example we could use only a single k-d-tree, which would worsen updates to O(sqrt n log n), but improve queries to O(sqrt n + k) worst case (and much better in average case). This would require k-d-trees to be able to get "out of shape" though, so storing them in arrays wouldn't be an option anymore; we would need more allocations.

@ExeVirus
Contributor

ExeVirus commented May 5, 2024

I've had an idea tossing around for a while now on the subject; it would need to be optimized, but generally:

Object ID vector of pointers (std vector)

Std Vector of unused entries

Std unordered map with three levels:

Each level represents an Octree subdivision,

With the bottom layer corresponding with the 16,16,16 mapblock boundary (to align with when entities need to be put to sleep or awoken). This bottom layer is two std::vector()s to hold unlimited entities, same on-erase mechanics as the main std vector list of objects.

So, you have two lookup methods: by Octree bin (basically x,y,z) and by id in a vector.

When insertions occur in the vector, they are mirrored in the unordered maps. When position updates occur, we just need to verify we haven't changed mapblock; if we have, erase our entry in that mapblock vector and insert into the new mapblock vector. So the tradeoff is that position updates are more expensive.

Finally, the main unordered map stores pointers, at least at the highest level, so as to keep RAM usage down. When I last looked at it, I think it was about 15 MB for the whole optimization stack at startup, with a couple kilobyte allocations when the first entities are inserted to mapblocks.

Note: will be updating this comment as I do an actual design on the task - might be able to actually do this one.

Update 1: Nice we already have an object manager class using a std::map structure with garbage collecting, so that takes care of my first vector manager. Wonder why we need to use maps? Like, couldn't the vector index be the object id?

Update 2:

The locations that loop over all objects a ton during runtime are:

In activeobjectmgr.cpp
// Get vector sorted by distance around origin point within max distance
ActiveObjectMgr::getActiveObjects(const v3f &origin, f32 max_d, std::vector<DistanceSortedActiveObject> &dest)

// Raycasts 
ActiveObjectMgr::getActiveSelectableObjects(const core::line3d<f32> &shootline)

// objects inside radius
ActiveObjectMgr::getObjectsInsideRadius(const v3f &pos, float radius, std::vector<ServerActiveObject *> &result, std::function<bool(ServerActiveObject *obj)> include_obj_cb)

// objects inside area
ActiveObjectMgr::getObjectsInArea(const aabb3f &box, std::vector<ServerActiveObject *> &result, std::function<bool(ServerActiveObject *obj)> include_obj_cb)

// Maintain Inactive/Active Objects
ActiveObjectMgr::getAddedActiveObjectsAroundPos(v3f player_pos, f32 radius, f32 player_radius, const std::set<u16> current_objects, std::vector<u16> &added_objects)
 
 

@sfan5
Collaborator Author

sfan5 commented May 5, 2024

So I took a walk and here's my idea:

It is based on the "one-off cache" approach I said wouldn't work, but uses some tricks to make it workable.
It also presumes a reliable mechanism exists by which the container is notified when an object changes position.

Keep the ModifySafeMap<u16, SAO> as base data structure.
A cache map is created as a std::multimap<float = Z pos of object, u16 = object ID>.
A list/vector/set of object IDs is created called uncached.

The gist of it is that objects exist in either cached (fast lookup) or uncached (linear lookup as before). Since there is no requirement that the cache is accurate at all times we can cleverly avoid (defer) rebuild costs.

When an object is inserted it is put into the uncached list.
When an object is deleted it is removed from the uncached list.
When an object changes position it is added to the uncached list.

The cache has to be (re-)created at some point. This should happen when the uncached list gets too large, let's say if it's at least 64 entries and >=10% of all objects.

When an AABB query happens the cache is consulted (which speeds up the query, albeit only in one dimension). The uncached map is also consulted and takes priority over what the cache returned for a given object ID.

spatial lookup performance: O(log(n) + m) (n=cached.size(), m=uncached.size())
the cache rebuild that would need to happen once in a while is O(n * log n) (n=all objects)


let's go through the checklist:

  • lookup by ID: defer to base structure
  • iterable in ID-order: defer to base structure
  • deletion and insertion: no problem, described above
  • spatial queries: yep, this is what the cache is for
  • efficiently handle updates: virtually no cost
  • updates apply instantly: yes

bonus:

  • safe from modification during iteration: no iterators into the two new structures need to be held + the base structure is already safe
  • bookkeeping does not require knowing from where objects have moved, just that they have moved

caveats:

  • doesn't work if too many objects constantly move

If you read closely you will notice that the requirements for the cache structure are very minimal.
It literally only needs to:

  • be constructable
  • supports spatial queries

It can be transparently replaced with any other — better — data structure.

@Bastrabun

Thanks for addressing this problem. If we can help any further, please let me know - whatever readouts, logs, temporary code compiled into the engine ...

For reference YL bugtracker 3723, 6325, 6326, 5289, administration 188

@ExeVirus
Contributor

ExeVirus commented May 5, 2024

Sfan: Why is just Z-position enough to cache? Does that speed up any of our lookups by position by enough to matter?

Also, why are position updates no cost?

@sfan5
Collaborator Author

sfan5 commented May 5, 2024

Thanks for addressing this problem. If we can help any further, please let me know - whatever readouts, logs, temporary code compiled into the engine ...

I think the most likely next step when someone has working code is for you to test the performance in a real situation.
What would be useful to know right now however is: how many active objects do you have usually?

Sfan: Why is just Z-position enough to cache?

It may not be, but see the last section.

Also, why are position updates no cost?

After updating an object the cost is paid during every following spatial query, that's the m component in the big-O notation.

@ExeVirus
Contributor

ExeVirus commented May 5, 2024

Sfan: Why is just Z-position enough to cache?

It may not be, but see the last section.

Oh, that makes perfect sense. Okay yeah I can see a way to get my idea to fit in your world relatively painlessly.

Just take the int16(x) >> 4 | int16(y) >> 4 | int16(z) >> 4. That will put all entities into mapblock sized bins and easily fit in 64 bits.

Should still yield nice speedups for objects in radius, raycast, and whatnot, just gotta make the algorithms for those lookups. Could be as simple as running the same algorithms at 16x16x16 size first and iterate over those relevant bins instead of the whole active object list like we currently do.

Bin size should be played with, obviously. Because 1x1x1 bins wouldn't yield much value I suspect.

@fluxionary
Contributor

i tried to take a stab at a better data structure for this. the result, in lua, is here https://github.com/fluxionary/minetest-futil/blob/main/data_structures/point_search_tree.lua. the biggest issue with it is that it can't be re-balanced when new points are added or removed. i found a paper about that but never got around to implementing it https://arxiv.org/abs/1410.5420

@appgurueu
Contributor

I also have a Lua implementation of k-d-trees. I think trying to rebalance k-d-trees is a dead end. I tried for a while, it doesn't seem to really be feasible to me.

But it is possible to build an efficient dynamic data structure out of a forest of static k-d-trees (see "transforming static data structures to dynamic structures"), which is what I'm suggesting here and currently implementing. I can't tell whether this will hold up in real world usage, but implementing and testing it is the only way to find out. I'm optimistic that we can get decent constant factors because there are plenty of knobs to tweak.

@ExeVirus
Contributor

ExeVirus commented May 6, 2024

Thought through sfan's idea more (worked on it). There's also a removal (when an object is no longer active) that is also O(log(n) + m).

@appgurueu
Contributor

appgurueu commented May 6, 2024

Alright, I've got a seemingly working (according to a quick randomized test against a naive implementation) dynamic forest of k-d-trees data structure whipped up, feel free to take a peek at my branch. The big O for this should be good (though not quite as good as what I advertised initially; for example deletion is amortized O(log(n)²) rather than O(log n) with this implementation), what remains to be seen is how it holds up in practice (potentially after some optimization), and integration into the active object manager, which I may get to on the weekend.

@ExeVirus
Contributor

ExeVirus commented May 7, 2024

Will take a look. sfan's idea was short enough that I just jumped right into the server ActiveObjectMgr integration; the only not-straightforward thing is how to signal object position updates. In mine I just check if the last position is different from the new position after step().

I will probably not write tests for your structure directly, as MT is a much better stress test of complexity than I can imagine haha, but I can at least look for obvious things.

Not quite done with my version, but perhaps ready for integration tests against servers/mods by end of week.

@ExeVirus
Contributor

ExeVirus commented May 8, 2024

Okay, first draft version working correctly? I'm certain I have missed spots where an object position changes and I need to invalidate the cache, such as object:set_pos(), but I have data to back up that I have a working solution, albeit only 10% optimized.

Here is the branch:

I am testing with a custom mod with devtest, this is the init.lua:

minetest.register_chatcommand("foo", {
    privs = {
        interact = true,
    },
    func = function(name, param)
        for i = 1, 100 do
            local z = i-50
            local obj = minetest.add_entity({x = i-50, y = 20, z = z}, "entity_test:test_entity", nil)
            obj:set_hp(z+1)
        end
        return true
    end,
})

local function getObjectsInAreaTest()
    local before = minetest.get_us_time()
    local list = 0
    for i = 1, 10 do
        list = minetest.get_objects_in_area({x=0,y=0,z=-50},{x=50,y=50,z=0})
    end
    local after = minetest.get_us_time()
    minetest.chat_send_all(#list .. "   " .. after-before)
    minetest.log(#list .. "," .. after-before)
end

local function callEveryHalfSecond()
    getObjectsInAreaTest()
    minetest.after(0.5, callEveryHalfSecond) -- Schedule itself to run again in 0.5 seconds
end

minetest.register_chatcommand("getList", {
    privs = {
        interact = true,
    },
    func = callEveryHalfSecond,
})
  
-- Define the test_entity (optional, replace with your desired entity definition)
minetest.register_entity("entity_test:test_entity", {
    description = "Test Entity",
    initial_properties = {
        hp_max = 200,
        physical = true,
        collide_with_objects = false,
        collisionbox = {-0.3, -0.3, -0.3, 0.3, 0.3, 0.3},
        visual = "wielditem",
        visual_size = {x = 0.4, y = 0.4},
        textures = {""},
        spritediv = {x = 1, y = 1},
        initial_sprite_basepos = {x = 0, y = 0},
    },
    on_step = function(self)
        local pos = self.object:get_pos()
        pos.z = -50 + math.fmod(minetest.get_us_time()/30000+self.object:get_hp()-51, 100)
        self.object:set_pos(pos)
    end,
})
  1. I start new devtest world, flat mapgen
  2. Go to 0,10,0
  3. /clearobjects
  4. /foo (creates the 100 cycling entities)
  5. /getList (every half second, spits out the number of entities found and how long it took to find them all 10x in a row)

Here is the gif of what that test mod and integration test looks like in-game:

101 Entity Test

Then, I went ahead and plotted the results:

You'll notice that even though there are 101 Entities flying around, the actual time to getObjectsInArea 10 times is varying significantly based on where they are in relation to the getObjectsInArea's AABB.

Take a look at my implementation: I just always check the mapblocks up to the max AABB extents, so it's sub-optimal where we are requesting a spherical radius, because I still just check all mapblocks from -radius to +radius in all 3 axes centered around the origin point.

Still, I'm surprised it was only about ~80 LoC for the structure. (Also notice I set the mapblock size as my bin size. Perhaps smaller than 16, like 8, might be better. I wouldn't know without field testing, and it'll be game-specific, actually.)

@appgurueu appgurueu linked a pull request May 9, 2024 that will close this issue
@Desour
Member

Desour commented May 11, 2024

Some feedback / thoughts:

  • No need to be perfect in terms of performance. Any solution is acceptable for now, as it will outperform the current situation anyways.
  • luatic: You could use bounding volume hierarchies (BVHs) in place of k-d-trees.
    A big benefit would be that you can do refitting, instead of removing and inserting an object whenever it moves a tiny bit.
    For the BVH construction you can, for example, do the same splitting as for k-d-trees, you just have to also keep track of the bounding boxes.
    You don't have to be worried about the overlap: With k-d-trees you have to make your query AABB bigger by the upper bound of all object volumes. This makes the k-d-tree nodes effectively BVH nodes that always overlap by this upper bound. Actual BVH nodes can't have more overlap than that.
  • Do we actually currently have an upper bound for object collision/selection boxes? Because IIRC for selection boxes the answer is no. (Edit: Oh right, we're just worried about collision boxes.) Please make sure to properly support big objects, e.g. in the worst case by inserting them into a separate vector if they're too big.
  • You can do tree rebuilding in a separate thread, to get it out of the hot path.
    This would go especially well with sfan5's approach (which, if I didn't misunderstand, is basically a non-hierarchical version of what luatic does (aka what the linked paper describes)).
  • I do like ExeVirus's approach of using a hash map to mapblock-sized boxes. The good thing is that it makes the run time independent of total number of objects, instead it only depends on the number of objects nearby.
    However, I don't know yet what the best way would be to make it work well in arbitrary situations, like very big objects (need to update membership in so many mapblocks), very big query boxes (need to look into so many mapblocks), or one mapblock containing many objects (it would be good to have a data structure in there too then, but which?). I'm thinking of something like an octree with a hashmap at each depth. But I can't come up with anything hierarchical / working well in generic cases that is also simple.

@ExeVirus
Contributor

ExeVirus commented May 11, 2024 via email

@ExeVirus
Contributor

ExeVirus commented May 12, 2024

Alrighty, there's my current merge^^^

Here's the performance results:


Summary for the lazy:

  • Exactly the performance gains you'd expect from a spatial map: scales well when getting entities in radius/range (which happens for all collisions, I found out).
  • Has no effect on object creation - it is dwarfed by the cost to allocate memory for the new object

@kno10
Contributor

kno10 commented Oct 25, 2024

Depending on your data distribution, either a k-d-tree, gridfile or an octree can be a good choice.
Here I would assume that a grid based on the map chunks and then an octree within the map chunks can be very elegant and efficient (because the grid corresponds to loading/unloading of chunks, and this level is somewhat "proven coarse enough for many applications").
But for simplicity, a pure octree is also worth exploring (not that "one octree per chunk" is that much more complicated). I doubt that more fancy things such as R-trees, VP-trees, etc. are beneficial here.

@ExeVirus
Contributor

@kno10

Agree those are the options. I implemented a single-layer octree using std::multimap aligned at the mapblock (16x16x16) boundaries. That PR has gone through multiple reviewers and test rounds. At this point it is technically ready for merge.

However, one of the newer Core Devs, appgurueu, has been working on a dynamic k-d branch, working on it as of even last week.

We could absolutely just merge mine as is and probably get roughly the same performance, but we are waiting for appgurueu to be (a) done, (b) reviewed and (c) tested. That might take a long time.

I still support his efforts, which is why I have not made a stink about merging my implementation. It's simple, does get performance benefits, but this is a case where I'm willing to let great get in the way of good since I'm not a Core Dev.

@kno10
Contributor

kno10 commented Oct 26, 2024

Speaking of a k-d-tree, one reason why I came to this old thread is biomes. Biome computations are currently linear in the number of biomes, but because of limitations of the current biome API, we see games try to trick the system by registering sub-biomes and variants (in particular, performing y slicing of biomes). At some point, the number of biomes may affect the performance of the biome lookup.
A k-d-tree might be an interesting solution to improve the performance; it would likely be built only once at game start, over (currently) only the heat and humidity points of the biomes (i.e., 2d). Here, a k-d-tree is likely better because humidity and heat are more like normal distributed. Maybe @appgurueu can use some of his code for that, too?

@ExeVirus
Contributor

ExeVirus commented Jan 16, 2025

Okay, the data is in (more can roll in, but unlikely to change the picture too much).

Background

Yourland is one of the many Luanti servers. They typically have 20-30 players on, with daily peaks as high as 50 players. They used to see server core latencies in the 2.5-2.6 second range, all due to linear search over some 6000 entities, as the server has battle-royale-style gameplay between players and hundreds of entities. Each entity, through either collisions or its behaviors, ends up searching for nearby entities within a radius/rectangle a TON, so this issue is to address those problems.

Performance metrics on the server were enabled and logged, and both Appgurueu and I have candidate branches that were tested on production for this server. Note how we went from 2.5-2.6 second latencies down to more like 0.12-0.16 second latencies under heavy load, on average, in the results section below.

Data Details

We have 16 days of data using my std::multimap solution, and 7 days of data using Appgurueu's dynamic k-d tree branch.

The data collected is the luanti-reported Active Object Count, Player Count, and Core Latency over those time frames. Attached is the excel sheet containing that data, and analysis.

As an example of how the data was captured, this is the query for player count that I used for Appgurueu's branch over the January 9 -> today data pull:

http://94.130.64.154:9090/api/v1/query_range?query=minetest_core_player_number%7Bjob%3D%22yourland_main%22%7D&start=2025-01-09T08:29:34-05:00&end=2025-01-16T08:29:34-05:00&step=1m

Results

Active Object to Server Frames Per Second

| Number of Active Objects | Average Frames/sec, Virus Branch | Average Frames/sec, Guru Branch |
| --- | --- | --- |
| 0-1000 | 20.84 | 20.90 |
| 1000-2000 | 18.84 | 20.12 |
| 2000-3000 | 15.65 | 16.27 |
| 3000-4000 | 12.02 | 12.41 |
| 4000-5000 | 9.05 | 9.22 |
| 5000-6000 | 7.46 | 7.42 |

Appgurueu's Branch is 2.61% faster than mine when using Active Object Count as the dependent datapoint.


Player Count to Server Frames Per Second

| Player Count | Average Frames/sec, Virus Branch | Average Frames/sec, Guru Branch |
| --- | --- | --- |
| 0-5 | 21.06 | 20.94 |
| 6-10 | 20.86 | 20.59 |
| 11-15 | 19.93 | 18.81 |
| 16-20 | 17.10 | 15.16 |
| 21-25 | 13.05 | 11.16 |
| 26-30 | 10.12 | 9.34 |
| 31-35 | 8.20 | 7.76 |
| 36-40 | 6.93 | 7.24 |
| 41-45 | 6.28 | 6.35 |
| 46-50 | 7.32 (outlier, ignoring) | 5.85 |

My Branch is 5.13% faster than Appgurueu's when using Player Count as the dependent datapoint.


Which to use?

Theoretically, player count should have less effect than active object count, but both depend on what players and the server are doing. In this case, I ran the regression: player count accounts for 65% of the variance in server frames per second, while active object count accounts for only around 55%.

With that in mind, I still say we should mostly rely on the active object data above, not the player count data.

And so: we have validated test results: 2-3% improvement in performance of Appgurueu's branch over mine.

Conclusions

  • 2-3% isn't nothing, but of course both implementations could likely be further optimized, blowing this data out of the water.
  • Both branches have very similar results, especially under heavy loads, where collision math, not these spatial maps, starts to dominate the delays.
  • One could argue my implementation is simpler, while Appgurueu's is more flexible. If you do your entire game in a tiny area around the (0,0,0) node and have like 1000 entities in that one spot, Appgurueu's implementation has a large chance of being faster than mine, which treats all those as one bucket.

I leave Core Dev team to make this call, for once we have two solutions not zero ;)

Appgurueu's Branch (k-d-trees) · ExeVirus's Branch (multimap)

all_data_and_analysis.xlsx

@sfan5
Collaborator Author

sfan5 commented Jan 17, 2025

I say we go with the simpler of the two.

@kno10
Contributor

kno10 commented Jan 17, 2025

Note that the collision detection code also has room for improvements (it currently materializes a lot of boxes and iterates over them repeatedly), and I'd hope it doesn't remain the bottleneck forever.

Generally I would assume that entities will somewhat spread out, and hence a grid+octree approach is likely to perform better. But then there are the minecraft farms that cram a lot of mobs into a tiny space. But I don't know if either of the indexes helps a lot on these cases, they collide "by design".

@ExeVirus
Contributor

I'd hope it doesn't remain the bottleneck forever.

It's the very next thing I (and probably Appgurueu) will be trying to optimize.

I don't know if either of the indexes helps a lot on these cases, they collide "by design".

They won't unless the mod author for those entities has really optimized the entity 'view-distance'. Right now, most entity mods just do a radius check around the entity center. I could see a lot better optimizations in the future where mod authors actually only look mostly in-front of the entity:


That would give them a smaller radius to check, giving better perf, but that's out of scope for the engine.

@appgurueu
Contributor

Thanks for evaluating the data @ExeVirus, and thanks @Bastrabun for collecting it. This confirms that in this real-world scenario, there is not much of a total difference in the impact the two approaches have. (Arguably it doesn't make a definitive general statement about the relative performance of the two implementations though, since other bottlenecks will by Amdahl's law have obstructed the comparison.)

A k-d-tree might be an interesting solution to improve the performance; it would likely be built only once at game start, over (currently) only the heat and humidity points of the biomes (i.e., 2d). Here, a k-d-tree is likely better because humidity and heat are more like normal distributed. Maybe @appgurueu can use some of his code for that, too?

Yes, my code is reusable. It includes an implementation of a static k-d-tree (since the dynamic data structure is just a forest of those). It is templated over dimension and datatype. So it should work out of the box for this use case (well I suppose I'd have to add a nearest neighbor query, but that's trivial), though it's hard to estimate what the performance impact would be - I think we'd have to just try it and see. (Note also that for this to be easily optimizable, the "weight" feature must not be used.) For very few biomes it definitely wouldn't be worth it. Do you have a ballpark number? (Optimizing biome logic in mapgen should probably go in a different issue. Do we have profiling data e.g. in Mineclon* showing that this is a problem yet?)

Note that the collision detection code also has room for improvements (it currently materializes a lot of boxes and iterates over them repeatedly), and I'd hope it doesn't remain the bottleneck forever.

Indeed, that's a (bunch of) different issue(s) we're aware of and hope to resolve soon™.

But then there are the minecraft farms that cram a lot of mobs into a tiny space. But I don't know if either of the indexes helps a lot on these cases, they collide "by design".

It depends a lot on the queries. No index can optimize the case where there are simply many query results.

exe_virus' index is grid-based. As an extreme example, if all objects are in a single grid cell (a 16³ block), there will be no speedup. My index is based on k-d-trees, meaning it should be able to handle this fine, if the queries use small radii.

Generally I would assume that entities will somewhat spread out, and hence a grid+octree approach is likely to perform better

Better than what? Just a grid? Or a forest of static k-d-trees? Especially for the latter the answer isn't obvious to me (but it isn't obvious for the former either).

I say we go with the simpler of the two.

If all other things were equal, I would agree with this. However they aren't. I would like to reiterate the flexibility of my solution. Not only can it possibly be reused in completely different areas in the future (as e.g. kno10 is suggesting), but it also doesn't really make as strong, slightly "arbitrary" assumptions about how entities are distributed.

What exe has said is correct: One property that's easy to see is that my solution is scaling-invariant. If you scale everything (entity positions and queries) up or down by a factor of 1000, my solution retains its performance characteristics. A grid-based / bucketing solution with fixed buckets cannot provide this.¹ Since we want to be a general game engine, I think a more flexible solution is the way to go. If you want a more real-world-ish example of what I have in mind here, think about something like Conquer.

I would also like to point out that the complexity difference is, in my opinion, not that terrible, and both are well-tested solutions which I don't expect to require a lot of maintenance. We're looking at something like ~500 loc for my implementation versus ~250 loc for exe's implementation (.h + .cpp).

Footnotes

  1. Arguably other reasons mean that mods currently do well to sort of stick to this scale (e.g. mapblock max entity limit, some hardcoded constants), so practically it doesn't matter that much yet.

@ExeVirus
Contributor

ExeVirus commented Jan 17, 2025

The only thing I would like to add is: whichever solution we choose, we'll want to make sure it's ready for further queries (raycast, etc.).

Specifically, over in #15227, you can see where having a spatial index of collision boxes (map nodes and entities alike) will likely end up being a part of the collision performance solution.

There I mention:

We'd want line, parabola, moving box, static box, sphere, etc.
queries against it.

Probably more kinds of queries.

So: @appgurueu, are k-d trees a good, flexible solution for all these different types of queries?

If so, then I would also put my recommendation behind k-d trees, as my map will likely have suboptimal, but okay-ish solutions to all those queries (similar to what is happening here).

And, if we add all those queries, there's no way that spatial index doesn't become complex. Just no way around it.

@appgurueu
Contributor

line, parabola, moving box, static box, sphere, etc. queries

I would argue that adding queries to a k-d-tree-based solution is in principle not much harder than to add them to a grid-based solution, for the simple reason that a k-d-subtree corresponds to a cuboid region. Hence, in general, if you've got some query that you can implement for cuboids, you can relatively easily apply it to k-d-trees recursively.

As for the specific query types mentioned:

Boxes (aka "cuboids" or "ranges") are what we already implement. Spheres aren't very important: spheroid vs cuboid is only a very modest constant factor difference (which likely diminishes further with the more complex logic needed for the spheroid).

Line is something we don't need: This is a point index, hence only volume queries are really useful. A raycast isn't a line query, it has to be a "moving box" query because we need to account for the selectionboxes of the objects.

I agree that such queries are important, because a search volume that grows quadratically or even cubically is much worse than one which grows linearly (as pointed out in the issue). I think it should be possible to accommodate these queries with k-d-trees as well, though in all fairness the code for grids is already essentially there (raycast code, classes like VoxelLineIterator and friends).

And, if we add all those queries, there's no way that spatial index doesn't become complex. Just no way around it.

I don't think it matters much for the overall discussion, but I don't think so: Once we have cuboids and parallelepipeds, we should pretty much be all set on the essence of the spatial indexing front.

@sfan5 sfan5 modified the milestones: 5.11.0, 5.12.0 Jan 21, 2025
8 participants