Blasphemous: Increase logic performance #4585

Mysteryem · 2025-01-31T03:32:31Z

Increases the performance of Blasphemous's logic.

Reduces --skip_output --seed 1 generation with 10 template Blasphemous yamls, with progression balancing disabled, from 19.8s to 14.9s (-24.6% duration) while producing the same output as before. 5 trials were ran and averaged to produce these numbers.

A similar percentage reduction was observed when the template yamls were replaced with yamls with almost all random options (progression balancing still disabled).

Replace `any()`/`all()` in `load_rule` with `for` loops

Each use of any(generator)/all(generator) would have to

Create the new generator object
Call any()/all() on the generator
Iterate that generator from within the any()/all() call
Iterate the list from within the generator
Call the iterated function from within the generator and return its result to the any()/all()
Determine if the any()/all() can now return or needs to continue iterating

Using for loops removes a lot of the overhead, effectively reducing the code to only steps 4. and 5. and 6..

On its own, this change resulted in a surprisingly big reduction in generation time, of around 12.8%. Some of the other changes described later reduce the number of rules that would have needed to be combined together with any()/all(), so the overall effect of this change is probably slightly lower when accounting for those other changes.

Pre-calculate rules

It is possible to pre-calculate many rules based on options and set them on the BlasRules object.

This removes the time that would be spent calculating a result from many rules.

There are a few rules that will always return True/False for some options, but have to check the CollectionState for other options. Some of these rules are called by other rules, so it would also be possible to pre-calculate parts of the other rules, but it starts to get messier the more rules you try to unroll like this, so some of these other rules have not been modified. One example would be can_dawn_jump(), which always returns False with world.options.difficulty < 1, but is used in other rules, such as can_cross_gap_1().

Boss strengths have also been pre-calculated depending on the value of the difficulty option. This additionally avoids re-creating the boss_strengths dictionary on every call to has_boss_strength()

Removal of rules that are always True or always False

By using specific function instances to represent rules that are always True or always False for a player's chosen options, the load_rule function can check if a 'string rule' is either function and then adjust the requirements it is combining together.

A rule that is always True can be skipped in requirements.
A rule that is always False makes the entire list of requirements impossible, so the clause containing those requirements can be skipped.

This reduces the number of separate rule functions that get chained together, making the final rules for each location/entrance faster.

Don't add indirect conditions for rules that are always `True` or always `False`

Adding on to the previous point, if a rule ends up being always True or always False for a player's chosen options, adding the corresponding indirect conditions for that rule can be skipped.

This makes it faster to update reachable regions during generation.

Replace `count_unique_from_group()` with large groups with custom `collect()`/`remove()`

Each item in the group has to be checked individually. Initially I replaced this with has_unique_from_group because it could return earlier than counting all the unique items, but custom world.collect/world.remove to track the count of unique items from each group collected into/removed from the state was faster.

For simplicity, all uses of count_unique_from_group() were updated, but maybe the smaller groups might be slightly better using has_unique_from_group(count).

Return early in most 'doors' rules

Most 'doors' rules were always checking every group of regions, but when only a few groups of regions needed to be accessible for a rule to return True. These rules now return True as soon as the required count of accessible regions is met.

miriam_rooms was also only called requiring the maximum number of 'doors' (regions) to be accessible. It has been replaced with all_miriam_rooms that returns False as soon as it finds an inaccessible region.

Low hanging fruit

These changes have little to no noticeable effect on performance, but are, or should be slightly faster.

Replace lists in rules with tuples

Tuples are, in general, slightly faster to create and more memory efficient. Literal tuples (tuples containing only literals or other literal tuples) are much faster to create.

There was also one function that was creating a set that has also been replaced with a tuple.

Use state.has in `ranged()`

This function was returning state.count(...) and the rules using the function would compare ranged(state) > 0. ranged() now returns state.has(...) and the rules using ranged() have been updated.

Use `for` loops instead of `sum(generator)` in rules

Using a for loop instead was about twice as fast. It avoids the overhead of creating a generator and iterating that generator, which in turn iterates the tuple/list.

Calculate player strength only once in `amanecida_rooms()`

This function called 4 rule functions that all had to calculate player strength. Player strength is now calculated once and the result is passed to the 4 functions.

Things to think about

There are a lot of rules that check for access to regions. It might be beneficial to replace some of these with events, but this is not clear from a performance standpoint.

It sounds like events might be useful in cases where either:

There are multiple possible regions where any one being accessible allows a rule to return True.
By placing the same event item in each of these regions, checking for the state having at least one of that item means only a single check is required, compared to checking multiple regions until at least one is accessible.
There is a need to know how many of a number of regions are accessible.
By placing the same event item in each of these regions, the count can be retrieved from the state and compared against instead of counting accessible regions until it is determined that enough are accessible for a rule to return True, or there are no more regions to check.

Some of the 'doors' rules increase a count for access to either Region A OR Region B, but having both accessible won't increase the count twice, so this could make using events more complicated.

Other investigations

Plenty of rules access self.player or state.can_reach_region multiple times. I tried assigning these to a variable and using that variable to reduce attribute lookups, but this made no noticeable difference.
If the rule for an entrance, returned by load_rule, is impossible (load_rule returns Rules._never) it would be beneficial to not add that entrance in the first place, but it appears that this never happens currently.
Similarly to the previous point, it appears that no location rules returned by load_rule ever end up impossible. A location with an impossible rule should not happen anyway, because Archipelago expects all locations to be reachable with all items collected.

How was this tested?

I have the end of the generator modified locally to convert every location in the multiworld and the item placed at each location into a string. This string is written to a file if that string does not already exist on a line in that file. If no new line is written, then that means the generation was identical to before, and this result is printed to the console.

Using this, I created a Blasphemous yaml with almost entirely random options, duplicated it 10 times and then generated the same seed before and after this PR. I observed the same results produced before and after.

Reduced --no_output generation duration of 10 template yamls with no progression balancing by around 3.6% (19.5s -> 18.8s)

Every use of any/all was creating a new generator every time that lambda was called, and then had to iterate that generator, which had to iterate a list of functions. Reduced --no_output generation duration of 10 template yamls with no progression balancing by around 12.4% (18.8s -> 16.5s)

- Earlier return once a single door is accessible - Faster iteration: tuple vs set, and for loop vs generator - Faster container creation: literal tuple vs set

Tuples are slightly faster to create than lists, but literal tuples, tuples containing only literals, are much faster to create. Tuples are also the tiniest bit faster to iterate than lists, but this difference is almost always negligible.

The required player strengths for bosses to logically be beatable are now pre-calculated for the player's chosen difficulty option and stored on the BlasRules instance.

Barely any performance to gain from this. The for loops seem to be about 2x as fast as `sum(generator)` when there is nothing to add, and about 1.2x as fast when every iterated element results in addition. This was profiled using `guilt_rooms` and its 7-length `doors` tuple. `miriam_rooms` with its 5-length tuple is expected to favour the for loops even more.

The 'has' rules, in the case of counting unique from groups stop iterating once the desired count is reached, whereas a 'count' rule always has to iterate the full group. The CollectionState code for 'count_group_unique' also has the overhead of creating and iterating a generator currently, whereas, `has_group_unique` is using a for loop.

Earlier return and no need to count.

This saves another 2% or so.

These did not need changing and should not affect performance, so reverting these changes simplifies the PR

Mysteryem · 2025-01-31T03:43:09Z

@TRPG0

I don't know the game, so maybe some of these changes are not desirable if there are some future plans for the world that would work better without some of these changes.

Blasphemous has a lot of regions and indirect conditions, so it is heavily affected by both #4582 and #4583, both of which would make Blasphemous generate even faster. Blasphemous was the reason I looked into those issues in Core.

With both of those Core PRs and this PR together, the same 10x template Blasphemous generation reduces to about 10.8s for me (so a further 4.1s than just this PR), almost halving the generation compared to generating without any of the PRs.

Mysteryem added 20 commits January 29, 2025 03:47

Use .value on options in rules

1066505

Reduced --no_output generation duration of 10 template yamls with no progression balancing by around 3.6% (19.5s -> 18.8s)

Pre-calculate more rules

ff5f1ae

Skip _never and _always rules when possible

f357d3a

Use for loop and tuple in flasks()

27a3ddc

- Earlier return once a single door is accessible - Faster iteration: tuple vs set, and for loop vs generator - Faster container creation: literal tuple vs set

Use literal tuples in rules instead of lists

5f99f11

Tuples are slightly faster to create than lists, but literal tuples, tuples containing only literals, are much faster to create. Tuples are also the tiniest bit faster to iterate than lists, but this difference is almost always negligible.

Move bosses dict and option checks outside has_boss_strength

5bb6c55

The required player strengths for bosses to logically be beatable are now pre-calculated for the player's chosen difficulty option and stored on the BlasRules instance.

Use state.has in ranged() to avoid an extra function call

7433aa8

Return earlier in most doors functions

490ee30

Replace miriam_rooms with all_miriam_rooms

2e1af64

Earlier return and no need to count.

Calculate player strength only once in amanecida_rooms()

3df3193

Replace set with literal tuple

e18a9e2

Replace has_group_unique with custom collect()/remove() and has

182af90

This saves another 2% or so.

Fix missed count->has and group->has conversion for limestones()

f75a6b5

Try to make redento_rooms more readable

b583b02

Don't register indirect conditions for impossible rules

7854f4e

Add some more comments

8ba9db5

Revert state.count -> state.has changes for non-groups

2a5aae7

These did not need changing and should not affect performance, so reverting these changes simplifies the PR

github-actions bot added the waiting-on: peer-review Issue/PR has not been reviewed by enough people yet. label Jan 31, 2025

Exempt-Medic requested a review from TRPG0 January 31, 2025 04:56

Exempt-Medic added is: enhancement Issues requesting new features or pull requests implementing new features. waiting-on: world-maintainer Issue/PR is waiting for feedback or approval by the maintainer of a world. labels Jan 31, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Blasphemous: Increase logic performance #4585

Blasphemous: Increase logic performance #4585

Mysteryem commented Jan 31, 2025 •

edited

Loading

Mysteryem commented Jan 31, 2025

Blasphemous: Increase logic performance #4585

Are you sure you want to change the base?

Blasphemous: Increase logic performance #4585

Conversation

Mysteryem commented Jan 31, 2025 • edited Loading

Replace any()/all() in load_rule with for loops

Pre-calculate rules

Removal of rules that are always True or always False

Don't add indirect conditions for rules that are always True or always False

Replace count_unique_from_group() with large groups with custom collect()/remove()

Return early in most 'doors' rules

Low hanging fruit

Replace lists in rules with tuples

Use state.has in ranged()

Use for loops instead of sum(generator) in rules

Calculate player strength only once in amanecida_rooms()

Things to think about

Other investigations

How was this tested?

Mysteryem commented Jan 31, 2025

Mysteryem commented Jan 31, 2025 •

edited

Loading

Replace `any()`/`all()` in `load_rule` with `for` loops

Don't add indirect conditions for rules that are always `True` or always `False`

Replace `count_unique_from_group()` with large groups with custom `collect()`/`remove()`

Use state.has in `ranged()`

Use `for` loops instead of `sum(generator)` in rules

Calculate player strength only once in `amanecida_rooms()`