Skip to content

Opportunistically use dense iteration for archetypal iteration #14049

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Aug 2, 2024

Conversation

re0312
Copy link
Contributor

@re0312 re0312 commented Jun 27, 2024

Objective

Solution

  • use dense iteration when an archetype and its table have the same entity count.
  • to avoid introducing complicate unsafe noise, this pr only implement for for_each style iteration.
  • added a benchmark to test performance for hybrid iteration.

Performance

image

nearly 2x win in specific scenarios, and no performance degradation in other test cases.

@re0312 re0312 marked this pull request as ready for review June 27, 2024 13:26
@alice-i-cecile alice-i-cecile added C-Performance A change motivated by improving speed, memory usage or compile times M-Needs-Release-Note Work that should be called out in the blog due to impact A-ECS Entities, components, systems, and events labels Jun 27, 2024
@alice-i-cecile alice-i-cecile requested review from james7132 and cart June 27, 2024 13:48
@alice-i-cecile alice-i-cecile added the S-Needs-Review Needs reviewer attention (from anyone!) to move forward label Jun 27, 2024
@alice-i-cecile alice-i-cecile added this to the 0.15 milestone Jun 27, 2024
@alice-i-cecile alice-i-cecile added D-Complex Quite challenging from either a design or technical perspective. Ask for help! D-Unsafe Touches with unsafe code in some way X-Uncontroversial This work is generally agreed upon labels Jun 27, 2024
@re0312
Copy link
Contributor Author

re0312 commented Jun 27, 2024

this pr also unblocks some compiler auto SIMD ,consider following code

pub fn new() -> Self {
        let mut world = World::new();

        for _ in 0..10000 {
            world.spawn((TableData(0.0), SparseData(0.0))).id();
        }

        let query = world.query_filtered::<&mut TableData, With<SparseData>>();
        Self(world, query)
    }

    #[inline(never)]
    pub fn run(&mut self) {
        self.1
            .iter_mut(&mut self.0)
            .for_each(|mut v1| v1.0 += 1.)
}

image

accum =
// SAFETY: Matched table IDs are guaranteed to still exist.
let table = unsafe { self.tables.get(archetype.table_id()).debug_checked_unwrap() };
if table.entity_count() == archetype.len() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be worthwhile to add a comment explaining that this an optimization for hybrid queries where the archetype is actually dense

@alice-i-cecile
Copy link
Member

alice-i-cecile commented Jun 27, 2024

Does this fully fix #2144? I suppose not, since it only optimizes for_each.

@re0312
Copy link
Contributor Author

re0312 commented Jun 27, 2024

Does this fully fix #2144? I suppose not, since it only optimizes for_each.

yeah , Optimizing for-style iteration is much more complicated because it could significantly degrade performance due to introducing branch in the hot path.

@alice-i-cecile
Copy link
Member

Sounds good. Let's leave that to follow-up then.

v.push(world.spawn(TableData(0.)).id());
}

v.shuffle(&mut deterministic_rand());
Copy link
Contributor

@cBournhonesque cBournhonesque Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v.shuffle(&mut deterministic_rand());

I think this should have a comment to explain why you're doing it.
If I understand correctly you have archetypes
A: [TableData, SparseData]
B: [TableData]

Entities are added in this order:
e0: A
e1: B
e2: A
e3: B
...

So that the order of entities is
Table: [e0, e1, e2, ...]
A: [e0, e2, ..]
B: [e1, e3, ..]

Then if you just despawned all the entities of B in order, you would remove e1, then e3, etc.
Each despawn call would do a swap_remove in both the archetype and the table.
So the order of entities in the table would be somewhat jumbled compared with the order in A, which is still [e0, e2, ...]

By adding the shuffling, you want to guarantee that their orders are very different; which maximimes the benefits of your PR (which keeps cache-locality by iterating through the table order, instead of the archetype order

@cBournhonesque
Copy link
Contributor

cBournhonesque commented Jun 27, 2024

Could you please explain why this code is faster?

        for _ in 0..10000 {
            world.spawn((TableData(0.0), SparseData(0.0))).id();
        }

        let query = world.query_filtered::<&mut TableData, With<SparseData>>();
        Self(world, query)

In this case the entities should be iterated in the same order before/after your change, no?

@re0312
Copy link
Contributor Author

re0312 commented Jun 28, 2024

In this case the entities should be iterated in the same order before/after your change, no?

it would theoretically have the same cache locality. however, dense iteration could imply to the compiler that we are iterating over continuous memory which could enable automatic SIMD optimizations , This could potentially make the operation nearly 4 times faster.

@alice-i-cecile alice-i-cecile changed the title Opprotunistically use dense iteration for archetypal iteration. Opportunistically use dense iteration for archetypal iteration Jul 15, 2024
Copy link
Contributor

@ItsDoot ItsDoot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an ECS expert, but the code quality overall looks fine.

@re0312 re0312 mentioned this pull request Jul 25, 2024
Copy link
Member

@cart cart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me! Nice work.

@cart cart added this pull request to the merge queue Aug 2, 2024
Merged via the queue into bevyengine:main with commit 8235daa Aug 2, 2024
27 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Sep 3, 2024
…#14673)

# Objective

- follow of #14049 ,we could use it on our Parallel Iterator,this pr
also unified the used function in both regular iter and parallel
iterations.


## Performance 


![image](https://github.com/user-attachments/assets/cba700bc-169c-4b58-b504-823bdca8ec05)

no performance regression for regular itertaion

3.5X faster in hybrid parallel iteraion,this number is far greater than
the benefits obtained in regular iteration(~1.81) because mutable
iterations on continuous memory can effectively reduce the cost of
mataining core cache coherence
@alice-i-cecile
Copy link
Member

Thank you to everyone involved with the authoring or reviewing of this PR! This work is relatively important and needs release notes! Head over to bevyengine/bevy-website#1667 if you'd like to help out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Performance A change motivated by improving speed, memory usage or compile times D-Complex Quite challenging from either a design or technical perspective. Ask for help! D-Unsafe Touches with unsafe code in some way M-Needs-Release-Note Work that should be called out in the blog due to impact S-Needs-Review Needs reviewer attention (from anyone!) to move forward X-Uncontroversial This work is generally agreed upon
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Opprotunistically use dense iteration when an archetype and its table are 1:1
5 participants