-
Notifications
You must be signed in to change notification settings - Fork 782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Blasphemous: Increase logic performance #4585
base: main
Are you sure you want to change the base?
Blasphemous: Increase logic performance #4585
Conversation
Reduced --no_output generation duration of 10 template yamls with no progression balancing by around 3.6% (19.5s -> 18.8s)
Every use of any/all was creating a new generator every time that lambda was called, and then had to iterate that generator, which had to iterate a list of functions. Reduced --no_output generation duration of 10 template yamls with no progression balancing by around 12.4% (18.8s -> 16.5s)
- Earlier return once a single door is accessible - Faster iteration: tuple vs set, and for loop vs generator - Faster container creation: literal tuple vs set
Tuples are slightly faster to create than lists, but literal tuples, tuples containing only literals, are much faster to create. Tuples are also the tiniest bit faster to iterate than lists, but this difference is almost always negligible.
The required player strengths for bosses to logically be beatable are now pre-calculated for the player's chosen difficulty option and stored on the BlasRules instance.
Barely any performance to gain from this. The for loops seem to be about 2x as fast as `sum(generator)` when there is nothing to add, and about 1.2x as fast when every iterated element results in addition. This was profiled using `guilt_rooms` and its 7-length `doors` tuple. `miriam_rooms` with its 5-length tuple is expected to favour the for loops even more.
The 'has' rules, in the case of counting unique from groups stop iterating once the desired count is reached, whereas a 'count' rule always has to iterate the full group. The CollectionState code for 'count_group_unique' also has the overhead of creating and iterating a generator currently, whereas, `has_group_unique` is using a for loop.
Earlier return and no need to count.
This saves another 2% or so.
These did not need changing and should not affect performance, so reverting these changes simplifies the PR
I don't know the game, so maybe some of these changes are not desirable if there are some future plans for the world that would work better without some of these changes. Blasphemous has a lot of regions and indirect conditions, so it is heavily affected by both #4582 and #4583, both of which would make Blasphemous generate even faster. Blasphemous was the reason I looked into those issues in Core. With both of those Core PRs and this PR together, the same 10x template Blasphemous generation reduces to about 10.8s for me (so a further 4.1s than just this PR), almost halving the generation compared to generating without any of the PRs. |
Increases the performance of Blasphemous's logic.
Reduces
--skip_output --seed 1
generation with 10 template Blasphemous yamls, with progression balancing disabled, from 19.8s to 14.9s (-24.6% duration) while producing the same output as before. 5 trials were ran and averaged to produce these numbers.A similar percentage reduction was observed when the template yamls were replaced with yamls with almost all random options (progression balancing still disabled).
Replace
any()
/all()
inload_rule
withfor
loopsEach use of
any(generator)
/all(generator)
would have toany()
/all()
on the generatorany()
/all()
callany()
/all()
any()
/all()
can now return or needs to continue iteratingUsing
for
loops removes a lot of the overhead, effectively reducing the code to only steps 4. and 5. and 6..On its own, this change resulted in a surprisingly big reduction in generation time, of around 12.8%. Some of the other changes described later reduce the number of rules that would have needed to be combined together with
any()
/all()
, so the overall effect of this change is probably slightly lower when accounting for those other changes.Pre-calculate rules
It is possible to pre-calculate many rules based on options and set them on the
BlasRules
object.This removes the time that would be spent calculating a result from many rules.
There are a few rules that will always return
True
/False
for some options, but have to check theCollectionState
for other options. Some of these rules are called by other rules, so it would also be possible to pre-calculate parts of the other rules, but it starts to get messier the more rules you try to unroll like this, so some of these other rules have not been modified. One example would becan_dawn_jump()
, which always returnsFalse
withworld.options.difficulty < 1
, but is used in other rules, such ascan_cross_gap_1()
.Boss strengths have also been pre-calculated depending on the value of the difficulty option. This additionally avoids re-creating the
boss_strengths
dictionary on every call tohas_boss_strength()
Removal of rules that are always True or always False
By using specific function instances to represent rules that are always
True
or alwaysFalse
for a player's chosen options, theload_rule
function can check if a 'string rule' is either function and then adjust the requirements it is combining together.A rule that is always
True
can be skipped in requirements.A rule that is always
False
makes the entire list of requirements impossible, so the clause containing those requirements can be skipped.This reduces the number of separate rule functions that get chained together, making the final rules for each location/entrance faster.
Don't add indirect conditions for rules that are always
True
or alwaysFalse
Adding on to the previous point, if a rule ends up being always
True
or alwaysFalse
for a player's chosen options, adding the corresponding indirect conditions for that rule can be skipped.This makes it faster to update reachable regions during generation.
Replace
count_unique_from_group()
with large groups with customcollect()
/remove()
Each item in the group has to be checked individually. Initially I replaced this with
has_unique_from_group
because it could return earlier than counting all the unique items, but customworld.collect
/world.remove
to track the count of unique items from each group collected into/removed from the state was faster.For simplicity, all uses of
count_unique_from_group()
were updated, but maybe the smaller groups might be slightly better usinghas_unique_from_group(count)
.Return early in most 'doors' rules
Most 'doors' rules were always checking every group of regions, but when only a few groups of regions needed to be accessible for a rule to return
True
. These rules now returnTrue
as soon as the required count of accessible regions is met.miriam_rooms
was also only called requiring the maximum number of 'doors' (regions) to be accessible. It has been replaced withall_miriam_rooms
that returnsFalse
as soon as it finds an inaccessible region.Low hanging fruit
These changes have little to no noticeable effect on performance, but are, or should be slightly faster.
Replace lists in rules with tuples
Tuples are, in general, slightly faster to create and more memory efficient. Literal tuples (tuples containing only literals or other literal tuples) are much faster to create.
There was also one function that was creating a
set
that has also been replaced with atuple
.Use state.has in
ranged()
This function was returning
state.count(...)
and the rules using the function would compareranged(state) > 0
.ranged()
now returnsstate.has(...)
and the rules usingranged()
have been updated.Use
for
loops instead ofsum(generator)
in rulesUsing a for loop instead was about twice as fast. It avoids the overhead of creating a generator and iterating that generator, which in turn iterates the tuple/list.
Calculate player strength only once in
amanecida_rooms()
This function called 4 rule functions that all had to calculate player strength. Player strength is now calculated once and the result is passed to the 4 functions.
Things to think about
There are a lot of rules that check for access to regions. It might be beneficial to replace some of these with events, but this is not clear from a performance standpoint.
It sounds like events might be useful in cases where either:
True
.By placing the same event item in each of these regions, checking for the state having at least one of that item means only a single check is required, compared to checking multiple regions until at least one is accessible.
By placing the same event item in each of these regions, the count can be retrieved from the state and compared against instead of counting accessible regions until it is determined that enough are accessible for a rule to return
True
, or there are no more regions to check.Some of the 'doors' rules increase a count for access to either Region A OR Region B, but having both accessible won't increase the count twice, so this could make using events more complicated.
Other investigations
self.player
orstate.can_reach_region
multiple times. I tried assigning these to a variable and using that variable to reduce attribute lookups, but this made no noticeable difference.load_rule
, is impossible (load_rule
returnsRules._never
) it would be beneficial to not add that entrance in the first place, but it appears that this never happens currently.load_rule
ever end up impossible. A location with an impossible rule should not happen anyway, because Archipelago expects all locations to be reachable with all items collected.How was this tested?
I have the end of the generator modified locally to convert every location in the multiworld and the item placed at each location into a string. This string is written to a file if that string does not already exist on a line in that file. If no new line is written, then that means the generation was identical to before, and this result is printed to the console.
Using this, I created a Blasphemous yaml with almost entirely random options, duplicated it 10 times and then generated the same seed before and after this PR. I observed the same results produced before and after.