Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix summary reading being too slow in case of many summary keys to match against #7410

Merged

Conversation

eivindjahren
Copy link
Contributor

@eivindjahren eivindjahren commented Mar 8, 2024

Resolves #7262

looping over a fnmatch is too slow, especially after converting the fnmatch from c to python. Compiling a regex once is much faster, see the included test.

I ran the following test and the result of the new function seems to be identical to the old function.

from hypothesis import given, settings
from fnmatch import fnmatch, translate
import re
import hypothesis.strategies as st


def _fetch_keys_to_matcher(fetch_keys):
    if not fetch_keys:
        return lambda _: False
    regex = re.compile("|".join(translate(key) for key in fetch_keys))
    return lambda s: regex.fullmatch(s) is not None


def _should_load_summary_key(data_key, user_set_keys):
    return any(fnmatch(data_key, key) for key in user_set_keys)


@settings(max_examples=10000)
@given(st.lists(st.text()), st.text())
def test_are_the_same(fetch_keys, key):
    assert _fetch_keys_to_matcher(fetch_keys)(key) == _should_load_summary_key(
        key, fetch_keys
    )

I took some benchmarks using random but realistic fetch_keys and keys.

image

image

The new algorithm can sometimes perform worse when the size of fetch_keys is close to or smaller than the number of keys, but the old algorithm is regularly worse as the number of keys grow.

@eivindjahren eivindjahren changed the title Use a compiled regex for matching in summary Fix summary reading being too slow in case of many summary keys to match against Mar 8, 2024
@eivindjahren eivindjahren force-pushed the replace_fnmatch_with_compiled_regex branch 4 times, most recently from af8ca49 to afcd909 Compare March 8, 2024 13:54
@codecov-commenter
Copy link

codecov-commenter commented Mar 8, 2024

Codecov Report

Attention: Patch coverage is 88.88889% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 84.45%. Comparing base (2f02032) to head (e0c2692).

Files Patch % Lines
src/ert/config/_read_summary.py 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7410      +/-   ##
==========================================
- Coverage   85.19%   84.45%   -0.75%     
==========================================
  Files         384      384              
  Lines       22860    22864       +4     
  Branches      882      883       +1     
==========================================
- Hits        19476    19309     -167     
- Misses       3276     3442     +166     
- Partials      108      113       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@eivindjahren eivindjahren force-pushed the replace_fnmatch_with_compiled_regex branch 2 times, most recently from 9fae82e to be5f8df Compare March 8, 2024 20:30
@eivindjahren eivindjahren added the release-notes:bug-fix Automatically categorise as bug fix in release notes label Mar 8, 2024
@eivindjahren eivindjahren marked this pull request as ready for review March 8, 2024 20:31
@eivindjahren eivindjahren force-pushed the replace_fnmatch_with_compiled_regex branch from be5f8df to 3681521 Compare March 9, 2024 12:17
looping over a fnmatch was simply too slow
@eivindjahren eivindjahren force-pushed the replace_fnmatch_with_compiled_regex branch from 3681521 to e0c2692 Compare March 9, 2024 12:29
@eivindjahren eivindjahren enabled auto-merge (rebase) March 11, 2024 13:46
@eivindjahren eivindjahren merged commit 6bac0ba into equinor:main Mar 11, 2024
53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes:bug-fix Automatically categorise as bug fix in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Realizations have finished but still reported as running in the GUI
3 participants