
Node - object collision performance is abysmal for very fast entities #15227

Open

appgurueu opened this issue Oct 2, 2024 · 7 comments

Labels: Bug (Issues that were confirmed to be a bug), Performance (@ Server / Client / Env.)

Comments

@appgurueu (Contributor)

Very bad asymptotics

Code in question:

https://github.com/minetest/minetest/blob/3797ca52c4559da21623a39615c4e4d84d845ea9/src/collision.cpp#L231-L233

This is a simple loop over the cuboid from min to max relevant position.

This is incredibly wasteful. It scales cubically with the speed of an object. For example, suppose an object has a speed of 1000 nodes per second on each axis. This is well below our current speed limit. A modder might want such extremely fast entities to e.g. simulate bullets. This means that this entity would typically travel 100 nodes per server step in each direction. So in each server step, we have to check a 100³ cuboid - a million nodes! - just to compute collisions for this single entity. It is no surprise that this hangs servers with extreme speeds e.g. as in #15027 (though there are more issues in that one).

We can achieve essentially linear time using a "raycast" approach (at least if we optimistically assume that acceleration is insignificant): Determine the principal component of the velocity. Move along this principal component in steps of 1 node. For each step, consider all nodes in proximity to the ray (by Manhattan distance) such that entity + node collision box may touch; for our collision box limitations this will be something like a 10² square "slice" at worst, usually probably more like 6² (off the top of my head).
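As a rough sketch of this idea (names and the slice construction are illustrative, not Luanti code), marching along the principal axis in 1-node steps and only collecting a small neighborhood per step keeps the visited node count linear in the speed, instead of cubic:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

struct v3s16 { int x, y, z; };

// Sketch: march along the dominant velocity axis in 1-node steps,
// emitting only a small "slice" of node positions around the ray
// instead of the full bounding cuboid. `radius` is the per-axis
// padding needed so that entity box + node box can still touch
// (a few nodes under the current collision box limitations).
std::vector<v3s16> nodesAlongRay(v3s16 start, v3s16 end, int radius)
{
    std::vector<v3s16> out;
    int dx = end.x - start.x, dy = end.y - start.y, dz = end.z - start.z;
    int steps = std::max({std::abs(dx), std::abs(dy), std::abs(dz)});
    for (int i = 0; i <= steps; i++) {
        // Point on the ray after i unit steps along the principal axis
        int cx = start.x + (steps ? dx * i / steps : 0);
        int cy = start.y + (steps ? dy * i / steps : 0);
        int cz = start.z + (steps ? dz * i / steps : 0);
        // Neighborhood of nodes around the ray point
        for (int ox = -radius; ox <= radius; ox++)
        for (int oy = -radius; oy <= radius; oy++)
        for (int oz = -radius; oz <= radius; oz++)
            out.push_back({cx + ox, cy + oy, cz + oz});
    }
    return out;
}
```

Note that this naive version emits duplicates between adjacent steps; a real implementation would deduplicate or only test newly entered positions. The key property is that doubling the travel distance only doubles the work.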

(Of course, for objects that don't move that much we will probably want to keep the cuboid approach for constant factor reasons.)

Ignore nodes

Very fast objects are likely to hit ignore nodes soon. We handle that unnecessarily inefficiently. Ignore nodes are typically entire ignore mapblocks. We could be three orders of magnitude faster if we just did collisions directly against the bounding boxes of the mapblocks rather than the 16³ contained ignore nodes. Related: #15208, which tries to circumvent the underlying issue in a hacky manner instead.
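Testing the movement segment against a whole mapblock's bounding box is a standard ray/AABB "slab" test; a self-contained sketch (illustrative names, not the engine's actual code) could look like this:

```cpp
#include <algorithm>
#include <cassert>

struct Vec3 { float x, y, z; };

// Sketch: segment vs. axis-aligned box ("slab") test. For an unloaded
// mapblock, the whole 16x16x16 block extent can be tested at once
// instead of its 4096 individual ignore nodes.
// Returns true and writes the entry time if the segment p + t*d
// (t in [0, 1]) hits the box [bmin, bmax].
bool segmentHitsBox(Vec3 p, Vec3 d, Vec3 bmin, Vec3 bmax, float *t_entry)
{
    float t0 = 0.0f, t1 = 1.0f;
    const float pp[3] = {p.x, p.y, p.z};
    const float dd[3] = {d.x, d.y, d.z};
    const float lo[3] = {bmin.x, bmin.y, bmin.z};
    const float hi[3] = {bmax.x, bmax.y, bmax.z};
    for (int a = 0; a < 3; a++) {
        if (dd[a] == 0.0f) {
            if (pp[a] < lo[a] || pp[a] > hi[a])
                return false; // parallel to this slab and outside it
            continue;
        }
        float ta = (lo[a] - pp[a]) / dd[a];
        float tb = (hi[a] - pp[a]) / dd[a];
        if (ta > tb) std::swap(ta, tb);
        t0 = std::max(t0, ta);
        t1 = std::min(t1, tb);
        if (t0 > t1)
            return false; // slab intervals don't overlap: no hit
    }
    *t_entry = t0;
    return true;
}
```

One such test per unloaded mapblock replaces 16³ per-node checks, which is where the "three orders of magnitude" estimate comes from.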

Speed limit

We currently have a speed limit of ±5000 nodes per second on each component. I would consider reducing that to 1000 total. I also think we shouldn't be clamping components, which can drastically change the direction of movement, but rather scaling the vector such that it arrives at the wanted maximum length. This makes much more sense physically if we want to think of this as some sort of "terminal velocity".
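A minimal sketch of the proposed behavior (illustrative names, not the engine's actual API): rescale the whole vector instead of clamping per component, so direction is preserved.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Sketch of the suggestion: if the velocity exceeds `max_len`,
// rescale the whole vector to length `max_len` instead of clamping
// each component independently (which rotates the direction).
Vec3 capSpeed(Vec3 v, float max_len)
{
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    if (len <= max_len || len == 0.0f)
        return v;
    float s = max_len / len;
    return {v.x * s, v.y * s, v.z * s};
}
```

For example, per-component clamping of (3000, 4000, 0) to ±1000 yields (1000, 1000, 0), a 45° direction; rescaling yields (600, 800, 0), the same direction as before.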

Additional context

I am considering working on this myself, but it isn't trivial.

@appgurueu appgurueu added Bug Issues that were confirmed to be a bug Performance labels Oct 2, 2024
@kno10 (Contributor) commented Oct 3, 2024

Beware that objects may also have an acceleration (e.g., bullet drop). In such cases, the ray isn't a simple line.
This matters more for slower entities, e.g., when jumping the curve of interest is roughly a parabola.

It would be very nice to have some raycasting function for such curves, though: for example so that a mob can check for possible collisions before attempting a jump.

@kno10 (Contributor) commented Oct 4, 2024

Thinking more about #15236 (parabolic movements with acceleration), and ignoring the need to reduce or limit dtime because of water (#15219) and similar gravity changes:
The current collision code can likely be improved: instead of (possibly repeatedly!) computing the next collision against every walkable node in a cube, perform essentially a 3D parabolic version of Bresenham's line algorithm. In essence, do a parabolic ray cast for the desired dtime; you may also want to think of this as an algorithm to rasterize the movement parabola (with the line width given by the object size).

Rough outline:

  1. compute for each axis the next dtime the integer node coordinate changes (using the parabolic equation!)
  2. compute possible collisions with (possibly moving) objects
  3. repeat until the next possible collision is after the dtime step length:
    1. test the newly entered nodes on the changed axis only
    2. update the next dtime for the axis
    3. if collided, also update possible collisions with other (possibly moving) objects
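Step 1 of the outline amounts to solving, per axis, the quadratic p0 + v·t + a·t²/2 = boundary for the smallest positive t. A hedged sketch (illustrative, not Luanti code; a robust implementation would handle floating-point cancellation more carefully):

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Sketch for step 1 of the outline: smallest t > 0 at which
// p0 + v*t + 0.5*a*t^2 == boundary, or a negative value if the
// parabola never reaches the boundary.
double nextCrossing(double p0, double v, double a, double boundary)
{
    double c = p0 - boundary;
    if (a == 0.0) {
        // Linear motion: p0 + v*t = boundary
        if (v == 0.0)
            return -1.0;
        double t = -c / v;
        return t > 0.0 ? t : -1.0;
    }
    // 0.5*a*t^2 + v*t + c = 0, discriminant of the scaled quadratic
    double disc = v * v - 2.0 * a * c;
    if (disc < 0.0)
        return -1.0; // never reaches the boundary
    double sq = std::sqrt(disc);
    double t1 = (-v - sq) / a;
    double t2 = (-v + sq) / a;
    if (t1 > t2)
        std::swap(t1, t2);
    if (t1 > 0.0) return t1;
    if (t2 > 0.0) return t2;
    return -1.0;
}
```

For free fall from rest with a = -10 toward a boundary 5 below, this gives t = 1, as expected from 0.5·10·1² = 5.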

This should cost only O(speed · object-size²), so for fast entities such as a bullet it will be linear in the speed. As it only accesses entered nodes, it should access the minimal number of nodes possible.

I am not sure how to handle nodes the object already is overlapping - and that may have changed to be walkable.

Part of Bresenham's algorithm is usually an integer-only optimization; I am not sure whether that is easily applicable here, because objects do not have integer sizes. But it may well be, since supporting different line widths is common in line drawing.

See also: https://zingl.github.io/bresenham.html http://www.zoo.co.uk/murphy/thickline/

@ExeVirus (Contributor) commented Dec 31, 2024

Looking at the code again, I don't think it's as horrible as you make it out to be:

If I have pure X+ velocity ---->>>

Then my search volume is my bounding box plus a really large x. I.e. a relatively skinny cuboid.

If I have an X=1000, Y=1000, Z=1000 velocity, then I search from my min x,y,z to max x,y,z in that direction.

I think that would be cubic, then, in the worst case.

So yes very bad, but not 100% evil.

I would say a more optimal solution is:

  1. Rather than building a big cinfo[] vector of all relevant boxes, we just remember the closest
  2. Rather than always looping Y,X,Z or whatever we do for cache coherency, we instead search along the movement vector, skipping redundant checks. On the first hit, stop and go looking through entities. For now, since we know node collision boxes can't be bigger than 3x3x3, we only need to look ±1.5 nodes past that nearest collision.
  3. Possibly consider a larger bounding box for nodeboxes and static mesh entities, so that we can do a broadphase check against just that one box, instead of all the children boxes.

There's obviously more low-hanging fruit to be had here - for example, perhaps we should not use speed-based axis-aligned bounding box checks, and instead discretize movement, especially when our speeds are slow.


That said, we're single-core right now. Assuming we don't want to break out into parallel threading, we should consider how fast we could actually get. We're talking a 3.5 GHz processor, with vector (SSE+) operations at most. With good broadphase culling, we could expect to still have to check maybe 20-50 boxes for any given SAO. Assuming the broadphase checks take say 20 instructions, and each box takes another 10 instructions, we're talking about ~500 instructions per entity, per frame. If we target 20 frames per second, we are then limited to a true maximum of ~350,000 SAOs - assuming absolutely nothing else is happening on the main thread.

Realistically, we're probably going to have more like 3/4 of that time available for collisions/physics, and if we allow more complex physics interactions, we might only reasonably expect to hit maybe 50,000 SAOs. Of course, today it looks like we're at about 2,000 low-speed SAOs on a single core based on YourLands server data, so we have some room for improvement.

Personally, I'd be happy if we get to a point of:

  1. Support for 10,000 higher-speed SAOs on a single core (for the current simplified C++ physics)
  2. Less restrictions on collision boxes, possibly axis-unaligned for SAOs
  3. More flexible (Lua-exposed) physics handling (specifically an optional collision callback, an optional anti-cheat callback, and an optional entityStep callback) - this would be expected to reduce the number of SAOs supported to more like 3-5K again.

@appgurueu (Contributor, Author) commented Dec 31, 2024

> Looking at the code again, I don't think it's as horrible as you make it out to be
>
> If I have pure X+ velocity ---->>>
>
> Then my search volume is my bounding box plus a really large x. I.e. a relatively skinny cuboid.

Yes, a single axis is the best case in which this is still linear. Two axes (think maybe a fast-moving object on a plane) is still quadratic and in many cases pretty okay.

> If I have an X=1000, Y=1000, Z=1000 velocity, then I search from my min x,y,z to max x,y,z in that direction.
>
> I think that would be cubic, then, in the worst case.

Precisely, and that is awful. This example (even more extreme than mine) would require you to visit a billion nodes to do the collisions for a single entity; and as I've described, we iterate over unloaded mapblocks full of ignore nodes. That's what really happens, it's unacceptable, and it kills performance. If you shot fast-moving bullet entities into the air, that would be a big problem. These may seem like somewhat extreme cases, but they do happen. (I initially looked into this because someone wanted to cap allocations for the collision box vector at something like 16 MiB: #15208. Currently we may collect enough boxes to OOM - for a single collision calculation!)

The good news is that fixing just the "skip entire unloaded mapblocks" problem can fix a good bit of the issue, because the amount of loaded mapblocks is much more manageable.

@ExeVirus (Contributor) commented Dec 31, 2024 via email

@ExeVirus (Contributor) commented Jan 1, 2025

Got some time with an LLM to think through a possible spatial index map:


I think a very similar solution to what we're doing with SAO locations can be done for collision boxes. Collision boxes would be stored in an unordered map with a unique key based on location (2^16, 2^16, 2^16) and the entry's number in the node collision box array (i.e. 2^16 max allowed collision boxes for a single definition).

Then, we build a spatial multi-map, with a single bucket for each location, and get all boxes that overlap this grid location.

When we query, we keep track of what boxes we have already processed (due to overlap) in an unordered_set.

Then, queries simply traverse this map efficiently - we wouldn't collect all the boxes first; instead we would walk them and track the closest hit.

For most cases, this will allow us to early out in the case of a collision, however there is overhead in grabbing the set of boxes at each grid location.
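The spatial multi-map idea could be sketched roughly as below. This is illustrative only: the key here packs 21 bits per axis into one 64-bit integer (the comment above proposes 2^16 per axis plus a box index; either packing works for a sketch), and box IDs stand in for real collision box records.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Pack a (signed) integer grid cell into a single 64-bit key,
// 21 bits per axis after offsetting into the unsigned range.
static uint64_t cellKey(int x, int y, int z)
{
    const uint64_t B = 1 << 20;
    return ((uint64_t)(x + B) << 42) | ((uint64_t)(y + B) << 21)
           | (uint64_t)(z + B);
}

// Sketch of a spatial multi-map: each grid cell maps to the IDs of
// the collision boxes overlapping it.
struct SpatialIndex {
    std::unordered_multimap<uint64_t, int> cells; // cell -> box id

    // Register a box (by id) in every cell its extent covers.
    void insertBox(int id, int x0, int y0, int z0, int x1, int y1, int z1) {
        for (int x = x0; x <= x1; x++)
        for (int y = y0; y <= y1; y++)
        for (int z = z0; z <= z1; z++)
            cells.emplace(cellKey(x, y, z), id);
    }

    // All box ids registered at one cell. A real query would walk
    // cells along the movement vector and dedupe already-seen ids
    // (the unordered_set mentioned above).
    std::vector<int> query(int x, int y, int z) const {
        std::vector<int> out;
        auto range = cells.equal_range(cellKey(x, y, z));
        for (auto it = range.first; it != range.second; ++it)
            out.push_back(it->second);
        return out;
    }
};
```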


Probable questions:

  1. Why not just use nodes, they are aligned at the locations already?

    Overlap. What if we want to increase node collision box limitations past 3x3x3?

  2. Seems like this might be slower in normal cases?

    Yes, having the bins at every node point might be overkill, we should consider larger 2x2 bins or similar.

  3. What about unloaded mapblocks?

    Treat them as collision boxes, possibly post query

  4. What about empty mapblocks?

    The bins will be empty, minor loss

  5. Could we be more efficient?

    Using slightly larger bins of 2x2x2 seems reasonable, also we could use our current mapblock simplification logic we use client side to change blocks into a single mesh, to change them into a single collision box. I.e. flat ground becomes one large box, rather than all the individual node boxes. Issue: when we collide with this big box, we need to go back and figure out exactly what node we hit (just ask the node manager again using the hit location).

@sfan5 (Collaborator) commented Jan 26, 2025

Thought: to build a collision system that works for both high speed and high precision, you'd want to essentially put a LOD on top of collision boxes:
For example, caching a bitmap of "which node is walkable" for each block is very reasonable.
When you need to perform a high-speed collision, you first check this bitmap; it tells you whether there is a collision and also roughly where it is.
If yes, you then go and perform the normal precise collision, but only on the relevant area.
This could be done with more coarse levels on top, but I think we'll never need that in Luanti.

You could call it a poor man's spatial indexing, but it should also be simpler to implement.
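A sketch of that per-block walkability bitmap (illustrative, not the engine's actual data structures): one bit per node in a 16³ mapblock is 4096 bits, i.e. 512 bytes per block.

```cpp
#include <bitset>
#include <cassert>

// Sketch of the "poor man's LOD": one bit per node in a 16^3 mapblock
// recording whether that node is walkable (512 bytes per block).
// A fast sweep consults this bitmap first; only if it reports a
// possible hit do we run the precise per-node collision on that area.
struct WalkableBitmap {
    std::bitset<16 * 16 * 16> bits;

    // Node coordinates relative to the block's origin, each in [0, 16).
    static int index(int x, int y, int z) {
        return (z * 16 + y) * 16 + x;
    }
    void setWalkable(int x, int y, int z, bool w) { bits[index(x, y, z)] = w; }
    bool isWalkable(int x, int y, int z) const { return bits[index(x, y, z)]; }
    // Whole-block early out: an all-air block can be skipped entirely.
    bool anyWalkable() const { return bits.any(); }
};
```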
