Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blasphemous: Increase logic performance #4585

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

Mysteryem
Copy link
Contributor

@Mysteryem Mysteryem commented Jan 31, 2025

Increases the performance of Blasphemous's logic.

Reduces --skip_output --seed 1 generation with 10 template Blasphemous yamls, with progression balancing disabled, from 19.8s to 14.9s (-24.6% duration) while producing the same output as before. 5 trials were ran and averaged to produce these numbers.

A similar percentage reduction was observed when the template yamls were replaced with yamls with almost all random options (progression balancing still disabled).

Replace any()/all() in load_rule with for loops

Each use of any(generator)/all(generator) would have to

  1. Create the new generator object
  2. Call any()/all() on the generator
  3. Iterate that generator from within the any()/all() call
  4. Iterate the list from within the generator
  5. Call the iterated function from within the generator and return its result to the any()/all()
  6. Determine if the any()/all() can now return or needs to continue iterating

Using for loops removes a lot of the overhead, effectively reducing the code to only steps 4. and 5. and 6..

On its own, this change resulted in a surprisingly big reduction in generation time, of around 12.8%. Some of the other changes described later reduce the number of rules that would have needed to be combined together with any()/all(), so the overall effect of this change is probably slightly lower when accounting for those other changes.

Pre-calculate rules

It is possible to pre-calculate many rules based on options and set them on the BlasRules object.

This removes the time that would be spent calculating a result from many rules.

There are a few rules that will always return True/False for some options, but have to check the CollectionState for other options. Some of these rules are called by other rules, so it would also be possible to pre-calculate parts of the other rules, but it starts to get messier the more rules you try to unroll like this, so some of these other rules have not been modified. One example would be can_dawn_jump(), which always returns False with world.options.difficulty < 1, but is used in other rules, such as can_cross_gap_1().

Boss strengths have also been pre-calculated depending on the value of the difficulty option. This additionally avoids re-creating the boss_strengths dictionary on every call to has_boss_strength()

Removal of rules that are always True or always False

By using specific function instances to represent rules that are always True or always False for a player's chosen options, the load_rule function can check if a 'string rule' is either function and then adjust the requirements it is combining together.

A rule that is always True can be skipped in requirements.
A rule that is always False makes the entire list of requirements impossible, so the clause containing those requirements can be skipped.

This reduces the number of separate rule functions that get chained together, making the final rules for each location/entrance faster.

Don't add indirect conditions for rules that are always True or always False

Adding on to the previous point, if a rule ends up being always True or always False for a player's chosen options, adding the corresponding indirect conditions for that rule can be skipped.

This makes it faster to update reachable regions during generation.

Replace count_unique_from_group() with large groups with custom collect()/remove()

Each item in the group has to be checked individually. Initially I replaced this with has_unique_from_group because it could return earlier than counting all the unique items, but custom world.collect/world.remove to track the count of unique items from each group collected into/removed from the state was faster.

For simplicity, all uses of count_unique_from_group() were updated, but maybe the smaller groups might be slightly better using has_unique_from_group(count).

Return early in most 'doors' rules

Most 'doors' rules were always checking every group of regions, but when only a few groups of regions needed to be accessible for a rule to return True. These rules now return True as soon as the required count of accessible regions is met.

miriam_rooms was also only called requiring the maximum number of 'doors' (regions) to be accessible. It has been replaced with all_miriam_rooms that returns False as soon as it finds an inaccessible region.

Low hanging fruit

These changes have little to no noticeable effect on performance, but are, or should be slightly faster.

Replace lists in rules with tuples

Tuples are, in general, slightly faster to create and more memory efficient. Literal tuples (tuples containing only literals or other literal tuples) are much faster to create.

There was also one function that was creating a set that has also been replaced with a tuple.

Use state.has in ranged()

This function was returning state.count(...) and the rules using the function would compare ranged(state) > 0. ranged() now returns state.has(...) and the rules using ranged() have been updated.

Use for loops instead of sum(generator) in rules

Using a for loop instead was about twice as fast. It avoids the overhead of creating a generator and iterating that generator, which in turn iterates the tuple/list.

Calculate player strength only once in amanecida_rooms()

This function called 4 rule functions that all had to calculate player strength. Player strength is now calculated once and the result is passed to the 4 functions.

Things to think about

There are a lot of rules that check for access to regions. It might be beneficial to replace some of these with events, but this is not clear from a performance standpoint.

It sounds like events might be useful in cases where either:

  1. There are multiple possible regions where any one being accessible allows a rule to return True.
    By placing the same event item in each of these regions, checking for the state having at least one of that item means only a single check is required, compared to checking multiple regions until at least one is accessible.
  2. There is a need to know how many of a number of regions are accessible.
    By placing the same event item in each of these regions, the count can be retrieved from the state and compared against instead of counting accessible regions until it is determined that enough are accessible for a rule to return True, or there are no more regions to check.

Some of the 'doors' rules increase a count for access to either Region A OR Region B, but having both accessible won't increase the count twice, so this could make using events more complicated.

Other investigations

  • Plenty of rules access self.player or state.can_reach_region multiple times. I tried assigning these to a variable and using that variable to reduce attribute lookups, but this made no noticeable difference.
  • If the rule for an entrance, returned by load_rule, is impossible (load_rule returns Rules._never) it would be beneficial to not add that entrance in the first place, but it appears that this never happens currently.
  • Similarly to the previous point, it appears that no location rules returned by load_rule ever end up impossible. A location with an impossible rule should not happen anyway, because Archipelago expects all locations to be reachable with all items collected.

How was this tested?

I have the end of the generator modified locally to convert every location in the multiworld and the item placed at each location into a string. This string is written to a file if that string does not already exist on a line in that file. If no new line is written, then that means the generation was identical to before, and this result is printed to the console.

Using this, I created a Blasphemous yaml with almost entirely random options, duplicated it 10 times and then generated the same seed before and after this PR. I observed the same results produced before and after.

Reduced --no_output generation duration of 10 template yamls with no
progression balancing by around 3.6% (19.5s -> 18.8s)
Every use of any/all was creating a new generator every time that lambda
was called, and then had to iterate that generator, which had to iterate
a list of functions.

Reduced --no_output generation duration of 10 template yamls with no
progression balancing by around 12.4% (18.8s -> 16.5s)
- Earlier return once a single door is accessible
- Faster iteration: tuple vs set, and for loop vs generator
- Faster container creation: literal tuple vs set
Tuples are slightly faster to create than lists, but literal tuples,
tuples containing only literals, are much faster to create.

Tuples are also the tiniest bit faster to iterate than lists, but this
difference is almost always negligible.
The required player strengths for bosses to logically be beatable are
now pre-calculated for the player's chosen difficulty option and stored
on the BlasRules instance.
Barely any performance to gain from this. The for loops seem to be about
2x as fast as `sum(generator)` when there is nothing to add, and about
1.2x as fast when every iterated element results in addition.

This was profiled using `guilt_rooms` and its 7-length `doors` tuple.

`miriam_rooms` with its 5-length tuple is expected to favour the for
loops even more.
The 'has' rules, in the case of counting unique from groups stop
iterating once the desired count is reached, whereas a 'count' rule
always has to iterate the full group.

The CollectionState code for 'count_group_unique' also has the overhead
of creating and iterating a generator currently, whereas,
`has_group_unique` is using a for loop.
Earlier return and no need to count.
These did not need changing and should not affect performance, so
reverting these changes simplifies the PR
@github-actions github-actions bot added the waiting-on: peer-review Issue/PR has not been reviewed by enough people yet. label Jan 31, 2025
@Mysteryem
Copy link
Contributor Author

@TRPG0

I don't know the game, so maybe some of these changes are not desirable if there are some future plans for the world that would work better without some of these changes.


Blasphemous has a lot of regions and indirect conditions, so it is heavily affected by both #4582 and #4583, both of which would make Blasphemous generate even faster. Blasphemous was the reason I looked into those issues in Core.

With both of those Core PRs and this PR together, the same 10x template Blasphemous generation reduces to about 10.8s for me (so a further 4.1s than just this PR), almost halving the generation compared to generating without any of the PRs.

@Exempt-Medic Exempt-Medic requested a review from TRPG0 January 31, 2025 04:56
@Exempt-Medic Exempt-Medic added is: enhancement Issues requesting new features or pull requests implementing new features. waiting-on: world-maintainer Issue/PR is waiting for feedback or approval by the maintainer of a world. labels Jan 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is: enhancement Issues requesting new features or pull requests implementing new features. waiting-on: peer-review Issue/PR has not been reviewed by enough people yet. waiting-on: world-maintainer Issue/PR is waiting for feedback or approval by the maintainer of a world.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants