-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Node - object collision performance is abysmal for very fast entities #15227
Comments
Beware that objects may also have an acceleration (e.g., bullet drop). In such cases, the ray isn't a simple line. It would be very nice to have some raycasting function for such curves, though: for example so that a mob can check for possible collisions before attempting a jump. |
Thinking more about #15236 (parabolic movements with acceleration) and ignoring the need to reduce or limit dtime because of water #15219 and similar gravity changes: Rough outline:
This should have only cost O(speed · object-size²), so for fast entities such as a bullet it will be linear in the speed. As it only accesses entered nodes, it should access the minimal amount of nodes possible. I am not sure how to handle nodes the object already is overlapping - and that may have changed to be walkable. Part of Bresenham's is usually an optimization for integers, I am not sure if this is easily applicable here because objects do not have integer sizes. But it may well be, as in line drawing it is common to support different line widths. See also: https://zingl.github.io/bresenham.html http://www.zoo.co.uk/murphy/thickline/ |
Looking at the code again, I don't think it's as horrible as you make it out to be: If I have pure X+ velocity ---->>> Then my search volume is my bounding box plus a really large x. I.e. a relatively skinny cuboid. If I have a even X=1000,y=1000,z=1000 velocity, then I search from my min.x,y,z to max.x,y,z in that direction. I think that would be cubic, then, in the worst case. So yes very bad, but not 100% evil. I would say a more optimal solution is:
There's obviously more low hanging fruit to be had here - For example, perhaps we should not use speed-based axisAlignedBounding box checks, and instead discretize movement, especially when our speeds are slow. That said, we're single core right now. Assuming we don't want to break out into parallel threading, we should consider how fast we could actually get. We're talking 3.5 Ghz processor, with Vector (SSE+) operations maximum. With good broadphase culling, we could expect to have to still check maybe 20-50 boxes for any given SAO. Assuming the broadphase checks take say 20 instructions, and each box takes another 10 instructions, we're talking about ~500 instructions per entity, per frame. If we target 20 frames per second, we are then limited to a true maximum of ~350,000 SAOs. Assuming absolutely nothing else happening on the main thread. I think realistically, we're probably going to hit more like 3/4 the time being available for collisions/physics, especially if we allow more complex physics interactions, we might only reasonably expect to hit maybe 50,000 SAOs. Of course, today it looks like we're about at 2,000 low-speed SAOs for single core based on YourLands Server data, so we have some room for improvement. Personally, I'd be happy if we get to a point of:
|
Yes, a single axis is the best case in which this is still linear. Two axes (think maybe a fast-moving object on a plane) is still quadratic and in many cases pretty okay.
Precisely, and that is awful. This example (more extreme than my example even) would require you to visit a billion nodes to do the collisions for a single entity; as I've described, we iterate unloaded mapblocks full of ignore nodes. And really, that's what happens, and it's unacceptable and kills performance. If you shot fast-moving bullet entities into the air, that would be a big problem. These may seem like somewhat extreme cases, but they do happen. (I initially looked into this because someone wanted to cap allocations for the collision box vector at something like 16 MiB: #15208. Currently we may collect enough boxes that we may OOM - for a single collision calculation!) The good news is that fixing just the "skip entire unloaded mapblocks" problem can fix a good bit of the issue, because the amount of loaded mapblocks is much more manageable. |
I'm thinking through the problem more, and think a spatially indexed map of
all collision boxes will end up providing a lot of the surface area for us
to optimize against. We know nodes are restricted to 3x3x3 collision boxes
in practice, and supporting a lot of varied queries against this map would
be ideal. We'd want line, parabola, moving box, static box, sphere, etc.
queries against it.
Another idea is we would want a bool for collides with map, just like we do
for entities. I can think of some cases where I want a bullet to only hit
entities, and not the map at all.
We should probably scale what collision heuristics are used based on speed.
Slow? Don't use speed, just the end position. Medium, use speed, and maybe
box based heuristics. High? Use parabola and line based hueristics
|
Got some time with an LLM to think through a possible spatial index map: I think a very similar solution to what we're doing with SAO locations can be done for collision boxes. Collision boxes would be stored in a unordered map with a unique key based on location (2^16,2^16,2^16) and number entry in the node collision box array (i.e. 2^16 max allowed collision boxes for a single definition). Then, we build a spatial multi-map, with a single bucket for each location, and get all boxes that overlap this grid location. When we query, we keep track of what boxes we have already processed (due to overlap) in an unordered_set. Then, queries are simply traversing this map efficiently. Which: we wouldn't collect all the boxes first. Instead we would walk them, and track the closest hit. For most cases, this will allow us to early out in the case of a collision, however there is overhead in grabbing the set of boxes at each grid location. Probable questions:
|
Thought: To build a collision system that works for both high-speed and high-precision you'd want to essentially put a LOD on top of collision boxes: You could call it a poor man's spatial indexing, but it should also be simpler to implement. |
Very bad asymptotics
Code in question:
https://github.com/minetest/minetest/blob/3797ca52c4559da21623a39615c4e4d84d845ea9/src/collision.cpp#L231-L233
This is a simple loop over the cuboid from min to max relevant position.
This is incredibly wasteful. It scales cubically with the speed of an object. For example, suppose an object has a speed of 1000 nodes per second on each axis. This is well below our current speed limit. A modder might want such extremely fast entities to e.g. simulate bullets. This means that this entity would typically travel 100 nodes per server step in each direction. So in each server step, we have to check a 100³ cuboid - a million nodes! - just to compute collisions for this single entity. It is no surprise that this hangs servers with extreme speeds e.g. as in #15027 (though there are more issues in that one).
We can achieve essentially linear time using a "raycast" approach (at least if we optimistically assume that acceleration is insignificant): Determine the principal component of the velocity. Move along this principal component in steps of 1 node. For each step, consider all nodes in proximity to the ray (by Manhattan distance) such that entity + node collision box may touch; for our collision box limitations this will be something like a 10² square "slice" at worst, usually probably more like 6² (off the top of my head).
(Of course, for objects that don't move that much we will probably want to keep the cuboid approach for constant factor reasons.)
Ignore nodes
Very fast objects are likely to hit ignore nodes soon. We handle that unnecessarily inefficiently. Ignore nodes are typically entire ignore mapblocks. We could be three orders of magnitude faster if we just did collisions directly against the bounding boxes of the mapblocks rather than the 16³ contained ignore nodes. Related: #15208, which tries to circumvent the underlying issue in a hacky manner instead.
Speed limit
We currently have a speed limit of +-5000 nodes per second on each component. I would consider reducing that to 1000 total. I also think we shouldn't be clamping components, which can majorly change direction, but rather scaling such that we arrive at the wanted maximum length. This makes much more sense physically if we want to think of this as some sort of "terminal velocity".
Additional context
I am considering working on this myself, but it isn't trivial.
The text was updated successfully, but these errors were encountered: