The data structure problem with active objects #14613
I am currently working on a solution to this. The data structure I'm implementing is a dynamization of k-d-trees, meaning it would store O(log n) many k-d-trees with sizes of powers of two. This means:
Josiah and I tried optimizing this by using an off-the-shelf R-tree library a while ago. That failed due to bad constant factors for our medium-size workloads. It's also a bit of overkill: it is centered around boxes, which would have made supporting higher variance in box dimensions easier, but we currently don't need that; we can approximate the relatively small objects as points for now. I'm optimistic that with this new approach I have a good chance of getting the constant factors low enough, since there are plenty of knobs to tweak.
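To make the shape of that structure concrete, here is a minimal sketch (placeholder types only, not the actual branch): a forest where slot i is either empty or holds a static tree over exactly 2^i points, so the occupied slots follow the binary representation of the total point count.

```cpp
// Sketch only: layout of a dynamized forest of static k-d-trees.
// StaticKdTree is a placeholder for a bulk-built, never-modified tree.
#include <cstddef>
#include <optional>
#include <vector>

struct Point3 { float x, y, z; };

struct StaticKdTree {
	// A real implementation would bulk-build tree nodes here in O(n log n).
	explicit StaticKdTree(std::vector<Point3> pts) : points(std::move(pts)) {}
	std::vector<Point3> points;
};

// Slot i is either empty or holds exactly 2^i points, so n points occupy
// O(log n) trees (one per set bit of n).
using KdForest = std::vector<std::optional<StaticKdTree>>;

std::size_t total_points(const KdForest &forest)
{
	std::size_t n = 0;
	for (std::size_t i = 0; i < forest.size(); ++i)
		if (forest[i])
			n += std::size_t(1) << i;
	return n;
}
```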
I've had an idea tossing around for a while now on the subject. It would need optimization, but generally:

- An object ID vector of pointers (std::vector)
- A std::vector of unused entries
- A std::unordered_map with three levels: each level represents an octree subdivision, with the bottom layer corresponding to the 16,16,16 mapblock boundary (to align with when entities need to be put to sleep or awoken). This bottom layer is two std::vectors to hold unlimited entities, with the same on-erase mechanics as the main std::vector list of objects.

So you have two lookup methods: by octree bin (basically x,y,z) and by ID in a vector. When insertions occur in the vector, they are mirrored in the unordered maps. When position updates occur, we just need to verify we haven't changed mapblock; if we have, erase our entry in that mapblock's vector and insert into the new mapblock's vector. So the tradeoff is that position updates are more expensive. Finally, the main unordered map stores pointers, at least at the highest level, so as to keep RAM usage down. When I last looked at it, I think it was about 15 MB for the whole optimization stack at startup, with a couple of kilobyte-sized allocations when the first entities are inserted into mapblocks. (A rough sketch of this layout follows the list of query entry points below.)

Note: I will be updating this comment as I do an actual design on the task - might be able to actually do this one.

Update 1: Nice, we already have an object manager class using a std::map structure with garbage collection, so that takes care of my first vector manager. Wonder why we need to use maps? Like, couldn't the vector index be the object ID?

Update 2: The locations that loop over all objects a ton during runtime are in activeobjectmgr.cpp:
// Get vector sorted by distance around origin point within max distance
ActiveObjectMgr::getActiveObjects(const v3f &origin, f32 max_d, std::vector<DistanceSortedActiveObject> &dest)
// Raycasts
ActiveObjectMgr::getActiveSelectableObjects(const core::line3d<f32> &shootline)
// objects inside radius
ActiveObjectMgr::getObjectsInsideRadius(const v3f &pos, float radius, std::vector<ServerActiveObject *> &result, std::function<bool(ServerActiveObject *obj)> include_obj_cb)
// objects inside area
ActiveObjectMgr::getObjectsInArea(const aabb3f &box, std::vector<ServerActiveObject *> &result, std::function<bool(ServerActiveObject *obj)> include_obj_cb)
// Maintain Inactive/Active Objects
ActiveObjectMgr::getAddedActiveObjectsAroundPos(v3f player_pos, f32 radius, f32 player_radius, const std::set<u16> current_objects, std::vector<u16> &added_objects)
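For illustration, here is a rough sketch of the two-index layout described above (all names are illustrative placeholders, not the actual engine API): a dense id-indexed vector plus a per-mapblock bucket map.

```cpp
// Sketch of the proposed two-index layout: lookup by object id via a vector,
// lookup by position via mapblock-sized buckets in an unordered map.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct ActiveObjectStub { float x, y, z; /* the real object would live here */ };

struct TwoIndexStore {
	// index == object id; nullptr marks a free (reusable) slot
	std::vector<ActiveObjectStub *> by_id;
	// recycled ids, so erasing doesn't shift the vector
	std::vector<uint16_t> free_ids;
	// key: packed mapblock coordinate, value: ids of objects in that 16x16x16 block
	std::unordered_map<uint64_t, std::vector<uint16_t>> by_block;

	// On a position update we only touch by_block if the object crossed a
	// mapblock boundary: erase the id from the old block's vector and append
	// it to the new block's vector.
};
```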
So I took a walk and here's my idea: it bases around the "one-off cache" approach I said wouldn't work, but uses some tricks to make it workable.
Keep the
The gist of it is that objects exist in either
When an object is inserted it is put into the
The cache has to be (re-)created at some point. This should happen when the
When an AABB query happens the
spatial lookup performance: let's go through the checklist:
bonus:
caveats:
If you read closely you will notice that the requirements for the
It can be transparently replaced with any other — better — data structure.
Thanks for addressing this problem. If we can help any further, please let me know - whatever readouts, logs, temporary code compiled into the engine ... For reference: YL bugtracker 3723, 6325, 6326, 5289, administration 188
Sfan: Why is just Z-position enough to cache? Does that speed up any of our lookups by position by enough to matter? Also, why are position updates no cost?
I think the most likely next step when someone has working code is for you to test the performance in a real situation.
It may not be, but see the last section.
After updating an object the cost is paid during every following spatial query, that's the
Oh, that makes perfect sense. Okay, yeah, I can see a way to get my idea to fit in your world relatively painlessly. Just pack int16(x) >> 4, int16(y) >> 4 and int16(z) >> 4 into a single key: that puts all entities into mapblock-sized bins and easily fits in 64 bits. It should still yield nice speedups for objects-in-radius, raycasts, and whatnot; we just have to write the algorithms for those lookups. It could be as simple as running the same algorithms at 16x16x16 granularity first and iterating over the relevant bins instead of the whole active object list like we currently do. Bin size should be played with, obviously, because 1x1x1 bins wouldn't yield much value, I suspect.
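A minimal sketch of one way such a packed 64-bit mapblock key could look (illustrative only; the helper name and shift layout are assumptions, not engine code):

```cpp
#include <cstdint>

// Map a node position to a 16x16x16 mapblock bin key. The comment above
// assumes node coordinates fit in int16, so each block coordinate fits in
// 16 bits and three of them pack comfortably into one 64-bit key.
static inline uint64_t mapblock_key(int16_t x, int16_t y, int16_t z)
{
	const uint16_t bx = static_cast<uint16_t>(x >> 4); // node coord -> block coord
	const uint16_t by = static_cast<uint16_t>(y >> 4);
	const uint16_t bz = static_cast<uint16_t>(z >> 4);
	return (static_cast<uint64_t>(bx) << 32) |
	       (static_cast<uint64_t>(by) << 16) |
	        static_cast<uint64_t>(bz);
}
```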
i tried to take a stab at a better data structure for this. the result, in lua, is here https://github.com/fluxionary/minetest-futil/blob/main/data_structures/point_search_tree.lua. the biggest issue with it is that it can't be re-balanced when new points are added or removed. i found a paper about that but never got around to implementing it https://arxiv.org/abs/1410.5420
I also have a Lua implementation of k-d-trees. I think trying to rebalance k-d-trees is a dead end; I tried for a while, and it doesn't seem feasible to me. But it is possible to build an efficient dynamic data structure out of a forest of static k-d-trees (see "transforming static data structures to dynamic structures"), which is what I'm suggesting here and currently implementing. I can't tell whether this will hold up in real-world usage, but implementing and testing it is the only way to find out. I'm optimistic that we can get decent constant factors because there are plenty of knobs to tweak.
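A rough sketch of the standard insertion trick for such a forest (again placeholder types, not the actual branch): merge equal-sized trees upward until an empty slot is hit, then bulk-rebuild once; the occasional large rebuild is what makes insertion cheap only in the amortized sense.

```cpp
// Sketch: amortized insertion into a forest of static k-d-trees
// ("logarithmic method"). Slot i holds either nothing or exactly 2^i points.
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

struct Point3 { float x, y, z; };

struct StaticKdTree {
	// Placeholder: a real tree would be bulk-built here in O(n log n).
	explicit StaticKdTree(std::vector<Point3> pts) : points(std::move(pts)) {}
	const std::vector<Point3> &extract_points() const { return points; }
	std::vector<Point3> points;
};

void insert(std::vector<std::optional<StaticKdTree>> &forest, Point3 p)
{
	std::vector<Point3> carry{p}; // 2^0 points so far
	for (std::size_t i = 0;; ++i) {
		if (i == forest.size())
			forest.emplace_back(); // grow the forest by one (empty) slot
		if (!forest[i]) {
			// Empty slot: the carry holds exactly 2^i points, rebuild here.
			forest[i].emplace(std::move(carry));
			return;
		}
		// Occupied slot: absorb its 2^i points into the carry (now 2^(i+1))
		// and continue to the next slot.
		const auto &pts = forest[i]->extract_points();
		carry.insert(carry.end(), pts.begin(), pts.end());
		forest[i].reset();
	}
}
```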
Thought through sfan's idea more (worked on it). There's also a remove (when object is no longer active) that is also
Alright, I've got a seemingly working (according to a quick randomized test against a naive implementation) dynamic forest of k-d-trees data structure whipped up; feel free to take a peek at my branch. The big O for this should be good (though not quite as good as what I advertised initially; for example, deletion is amortized O(log(n)²) rather than O(log n) with this implementation). What remains to be seen is how it holds up in practice (potentially after some optimization), plus integration into the active object manager, which I may get to on the weekend.
Will take a look. sfan's idea was short enough that I just jumped right into server::ActiveObjectMgr integration; the only not-straightforward thing is how to signal object position updates. In mine I just check if the last position is different from the new position after step(). I will probably not write tests for your structure directly, as MT is a much better stress test of complexity than I can imagine haha, but I can at least look for obvious things. Not quite done with my version, but perhaps ready for integration tests against servers/mods by end of week.
Some feedback / thoughts:
DS:
You can't see here, but I've made a lot of progress on the hash map solution for us. I'm only focusing on the server side right now, so selection boxes aren't a consideration. That said, everything in Minetest must have a position within 2^16 of 0,0,0, which helps tremendously, allowing me to pack everything into buckets of 16x16x16 nodes. We can revisit later if we want to make this changeable as a setting.

As for big objects, they just have a plain old single origin position; that's all that's needed right now.

My single unordered multimap is a single-layer octree, and with 10,000 mostly evenly spaced out entities (the current benchmark), it's quite performant, haha.

I agree this is a super dead simple solution and gets us good performance compared to what we have (by far). Will post a PR hopefully by end of day. Performance of the map can absolutely be optimized later; it's just a very flexible solution.
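As a rough illustration of how queries against such buckets could work (not the actual PR code; the names, bucket container and packing are assumptions): visit only the buckets overlapping the bounding box of the query sphere, then distance-filter the candidates.

```cpp
// Sketch: objects-inside-radius on top of a mapblock-bucket index.
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Obj { float x, y, z; };

// key: packed block coordinate, value: an object whose origin lies in that block
using BlockIndex = std::unordered_multimap<uint64_t, Obj *>;

static uint64_t block_key(int bx, int by, int bz)
{
	return (static_cast<uint64_t>(static_cast<uint16_t>(bx)) << 32) |
	       (static_cast<uint64_t>(static_cast<uint16_t>(by)) << 16) |
	        static_cast<uint64_t>(static_cast<uint16_t>(bz));
}

void objects_inside_radius(const BlockIndex &index, float px, float py, float pz,
		float radius, std::vector<Obj *> &out)
{
	// Visit only the blocks overlapping the bounding box of the sphere
	// (block size = 16 nodes).
	const int bx0 = (int)std::floor((px - radius) / 16.0f), bx1 = (int)std::floor((px + radius) / 16.0f);
	const int by0 = (int)std::floor((py - radius) / 16.0f), by1 = (int)std::floor((py + radius) / 16.0f);
	const int bz0 = (int)std::floor((pz - radius) / 16.0f), bz1 = (int)std::floor((pz + radius) / 16.0f);
	for (int bx = bx0; bx <= bx1; ++bx)
		for (int by = by0; by <= by1; ++by)
			for (int bz = bz0; bz <= bz1; ++bz) {
				auto range = index.equal_range(block_key(bx, by, bz));
				for (auto it = range.first; it != range.second; ++it) {
					Obj *o = it->second;
					const float dx = o->x - px, dy = o->y - py, dz = o->z - pz;
					// Keep only candidates actually within the sphere.
					if (dx * dx + dy * dy + dz * dz <= radius * radius)
						out.push_back(o);
				}
			}
}
```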
Alrighty, there's my current merge ^^^ Here are the performance results. Summary for the lazy:
Depending on your data distribution, either a k-d-tree, gridfile or an octree can be a good choice.
Agree those are the options. I implemented a single-layer octree using std::multimap, aligned at the mapblock (16x16x16) boundaries. That PR has gone through multiple reviewers and test rounds; at this point it is technically ready for merge. However, one of the newer Core Devs, appgurueu, has been working on a dynamic k-d-tree branch, and was still working on it as of last week. We could absolutely just merge mine as-is and probably get roughly the same performance, but we are waiting for appgurueu's to be (a) done, (b) reviewed and (c) tested. That might take a long time. I still support his efforts, which is why I have not made a stink about merging my implementation. It's simple and does get performance benefits, but this is a case where I'm willing to let great get in the way of good, since I'm not a Core Dev.
Speaking of a k-d-tree, one reason why I came to this old thread is biomes. Biome computations are currently linear in the number of biomes, but because of limitations of the current biome API, we see games try to trick the system by registering sub-biomes and variants (in particular, performing y slicing of biomes). At some point, the number of biomes may affect the performance of the biome lookup. |
Okay, the data is in (more can roll in, but it's unlikely to change the picture too much).

Background

Your Land is one of the many Luanti servers. They typically have 20-30 players on, with daily peaks as high as 50 players. They used to have server core latencies in the 2.5-2.6 second range, all due to linear search over some 6000 entities, as the server has a whole battle-royale-style gameplay between players and hundreds of entities. Each entity, through either collisions or its behaviors, ends up searching for nearby entities within a radius/rectangle a TON, so this issue is meant to address those problems. Performance metrics on the server were enabled and logged, and both Appgurueu and I have candidate branches that were tested on production for this server. Note how we went from 2.5-2.6 second latencies down to more like 0.12-0.16 second latencies under heavy load, on average, in the results section.

Data Details

We have 16 days of data using my std::multimap solution, and 7 days of data using Appgurueu's dynamic k-d-tree branch. The data collected is the Luanti-reported active object count, player count, and core latency over those time frames. Attached is the Excel sheet containing that data and the analysis. As an example of how the data was captured, this is a query for player count that I used for Appgurueu's branch over the January 9 -> today data pull.

Results

Active Object Count to Server Frames Per Second

Appgurueu's branch is 2.61% faster than mine when using active object count as the dependent datapoint.

Player Count to Server Frames Per Second

My branch is 5.13% faster than Appgurueu's when using player count as the dependent datapoint.

Which to use?

Theoretically, player count should have less effect than active object count, but both are dependent on what players and the server are doing. In this case, I ran the regression: player count accounts for 65% of the variance in server frames per second, while active object count only accounts for around 55%. With that in mind, I still say we should mostly rely on the active object data above, not the player count data. And so we have validated test results: a 2-3% improvement in performance of Appgurueu's branch over mine.

Conclusions

I leave the Core Dev team to make this call; for once we have two solutions, not zero ;)
I say we go with the simpler of the two.
Note that the collision detection code also has room for improvement (it currently materializes a lot of boxes and iterates over them repeatedly), and I'd hope it doesn't remain the bottleneck forever. Generally I would assume that entities will be somewhat spread out, and hence a grid+octree approach is likely to perform better. But then there are the Minecraft-style farms that cram a lot of mobs into a tiny space. I don't know if either of the indexes helps a lot in those cases; they collide "by design".
It's the very next thing I (and probably Appgurueu) will be trying to optimize.
They won't, unless the mod author for those entities has really optimized the entity "view distance". Right now, most entity mods just do a radius check around the entity center. I could see much better optimizations in the future where mod authors only look mostly in front of the entity: that would give them a smaller radius to check, giving better perf, but that's out of scope for the engine.
Thanks for evaluating the data @ExeVirus, and thanks @Bastrabun for collecting it. This confirms that in this real-world scenario, there is not much of a total difference in the impact the two approaches have. (Arguably it doesn't make a definitive general statement about the relative performance of the two implementations though, since other bottlenecks will by Amdahl's law have obstructed the comparison.)
Yes, my code is reusable. It includes an implementation of a static k-d-tree (since the dynamic data structure is just a forest of those). It is templated over dimension and datatype. So it should work out of the box for this use case (well I suppose I'd have to add a nearest neighbor query, but that's trivial), though it's hard to estimate what the performance impact would be - I think we'd have to just try it and see. (Note also that for this to be easily optimizable, the "weight" feature must not be used.) For very few biomes it definitely wouldn't be worth it. Do you have a ballpark number? (Optimizing biome logic in mapgen should probably go in a different issue. Do we have profiling data e.g. in Mineclon* showing that this is a problem yet?)
Indeed, that's a (bunch of) different issue(s) we're aware of and hope to resolve soon™.
It depends a lot on the queries. No index can optimize the case where there are simply many query results. exe_virus' index is grid-based. As an extreme example, if all objects are in a single grid cell (a 16³ block), there will be no speedup. My index is based on k-d-trees, meaning it should be able to handle this fine, if the queries use small radii.
Better than what? Just a grid? Or a forest of static k-d-trees? Especially for the latter the answer isn't obvious to me (but it isn't obvious for the former either).
If all other things were equal, I would agree with this. However, they aren't. I would like to reiterate the flexibility of my solution. Not only can it possibly be reused in completely different areas in the future (as e.g. kno10 is suggesting), but it also doesn't make as strong, slightly "arbitrary" assumptions about how entities are distributed. What exe has said is correct: One property that's easy to see is that my solution is scaling-invariant. If you scale everything (entity positions and queries) up or down by a factor of 1000, my solution retains its performance characteristics. A grid-based / bucketing solution with fixed buckets cannot provide this¹. Since we want to be a general game engine, I think a more flexible solution is the way to go. If you want a more real-world-ish example of what I have in mind here, think about something like Conquer. I would also like to point out that the complexity difference is, in my opinion, not that terrible, and both are well-tested solutions which I don't expect to require a lot of maintenance. We're looking at something like ~500 LOC for my implementation versus ~250 LOC for exe's implementation (.h + .cpp).
The only thing I would like to add is that whichever solution we choose, we'll want to make sure it's ready for further queries (raycast, etc.). Specifically, over in #15227, you can see where having a spatial index of collision boxes (map nodes and entities alike) will likely end up being a part of the collision performance solution. There I mention:
Probably more kinds of queries. So: @appgurueu, are k-d-trees a good, flexible solution for all these different types of queries? If so, then I would also put my recommendation behind k-d-trees, as my map will likely have suboptimal, but okay-ish, solutions to all those queries (similar to what is happening here). And if we add all those queries, there's no way that spatial index doesn't become complex. Just no way around it.
I would argue that adding queries to a k-d-tree-based solution is in principle not much harder than adding them to a grid-based solution, for the simple reason that a k-d-subtree corresponds to a cuboid region. Hence, in general, if you've got some query that you can implement for cuboids, you can relatively easily apply it to k-d-trees recursively. As for the specific query types mentioned:

- Boxes (aka "cuboids" or "ranges") are what we already implement.
- Spheres aren't very important: spheroid vs. cuboid is only a very modest constant-factor difference (which likely diminishes further with the more complex logic needed for the spheroid).
- Line is something we don't need: this is a point index, hence only volume queries are really useful. A raycast isn't a line query; it has to be a "moving box" query because we need to account for the selectionboxes of the objects. I agree that such queries are important, because a search volume that grows quadratically or even cubically is much worse than one which grows linearly (as pointed out in the issue). I think it should be possible to accommodate these queries with k-d-trees as well, though in all fairness the code for grids is already essentially there (raycast code, classes like
I don't think it matters much for the overall discussion, but I don't think so: Once we have cuboids and parallelepipeds, we should pretty much be all set on the essence of the spatial indexing front.
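To make the "a k-d-subtree corresponds to a cuboid region" point concrete, here is a generic recursive box query sketch over a plain pointer-based k-d-tree (the node layout is an assumption for illustration, not the actual implementation):

```cpp
// Sketch: axis-aligned box query on a k-d-tree by recursive pruning.
#include <functional>

struct KdNode {
	float pos[3];           // point stored at this node
	int axis;               // splitting dimension: 0, 1 or 2
	KdNode *left = nullptr; // points with pos[axis] <= split value
	KdNode *right = nullptr; // points with pos[axis] >= split value
};

struct Box { float min[3], max[3]; };

void box_query(const KdNode *node, const Box &box,
		const std::function<void(const KdNode &)> &visit)
{
	if (!node)
		return;
	bool inside = true;
	for (int d = 0; d < 3; ++d)
		inside = inside && box.min[d] <= node->pos[d] && node->pos[d] <= box.max[d];
	if (inside)
		visit(*node);
	// Each child covers a cuboid; descend only into children whose cuboid
	// can intersect the query box.
	if (box.min[node->axis] <= node->pos[node->axis])
		box_query(node->left, box, visit);
	if (box.max[node->axis] >= node->pos[node->axis])
		box_query(node->right, box, visit);
}
```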
There's a closely related older issue (#6453) but I felt like a fresh start here was better.
Problem
tl;dr: optimizing anything else server-side is basically a waste until this is solved, skip to the next section.
The nice people from the Your Land server sent me several profiling traces of their production server and a common theme is that about 40% of time (on the server thread) is spent just iterating the entity list (average, not peak!).
Now their mods use get_objects_inside_radius a bunch, but this is not the sole cause, and I expect any sizeable server will be plagued with this exact issue.

The issue is spatial queries on the entity list, which happen more often than one might think:

- get_objects_inside_radius, get_objects_in_area
- ServerEnvironment::getAddedActiveObjects

All of these are O(n), with n being the number of entities. When you consider that every entity does collision detection, this becomes O(n²), right in the main thread.

Solution requirements
What we need is a new data structure. It must:

- allow efficient AABB queries (getObjectsInArea) on the entity position

External circumstances:

- radius queries (getObjectsInsideRadius) can be emulated on top of AABB queries and are not an important point (sketched at the end of this section)

Finally we will also need to figure out how to reliably propagate position updates up to the container.
There are many ideas in the form of "just add a cache to <thing>" that come to mind, but keeping the cache in sync is generally trickier than designing the structure itself.
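As referenced in the requirements above, radius queries reduce to an AABB query plus a distance filter. A minimal sketch (the AABB query is passed in as a callback here, since the concrete index is exactly what this issue is about; all names are placeholders):

```cpp
// Sketch: getObjectsInsideRadius emulated on top of an AABB query.
#include <functional>
#include <vector>

struct v3f { float X, Y, Z; };
struct ObjStub { v3f pos; };

using AabbQuery = std::function<void(const v3f &min, const v3f &max,
		std::vector<ObjStub *> &out)>;

void objects_inside_radius(const AabbQuery &query_box, const v3f &pos, float radius,
		std::vector<ObjStub *> &out)
{
	// 1. Query the bounding box of the sphere ...
	std::vector<ObjStub *> candidates;
	query_box({pos.X - radius, pos.Y - radius, pos.Z - radius},
			{pos.X + radius, pos.Y + radius, pos.Z + radius}, candidates);
	// 2. ... then keep only candidates actually within the radius.
	for (ObjStub *obj : candidates) {
		const float dx = obj->pos.X - pos.X;
		const float dy = obj->pos.Y - pos.Y;
		const float dz = obj->pos.Z - pos.Z;
		if (dx * dx + dy * dy + dz * dz <= radius * radius)
			out.push_back(obj);
	}
}
```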
Real-world example data
(thanks to @Bastrabun)
There are 4500 entities. About 1400 move often, 3100 are static.