Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

matching.py: Fix incorrect fuzzy amount loop bounds #210

Merged
merged 2 commits into from
Nov 19, 2023

Conversation

ankurdave
Copy link
Contributor

@ankurdave ankurdave commented Nov 1, 2023

The indexing logic in matching.py has an off-by-one bug that manifests as #202.

We currently modify the lower bound of the loop to include an element less than the lower bound:

for sp in cur_matches[lower_bound-1 if lower_bound > 0 else 0:upper_bound]:
    # Verify that number is in fuzzy range.
    # Currently fails (issue #202).
    assert abs(sp.number - amount.number) <= self.fuzzy_match_amount

Removing that fixes the problem:

for sp in cur_matches[lower_bound:upper_bound]:
    # Verify that number is in fuzzy range.
    # No longer fails.
    assert abs(sp.number - amount.number) <= self.fuzzy_match_amount

This approach is an alternative to #206.

In addition to the loop bounds fix, this PR adds the following:

  • The above loop assertion.
  • test_nonmatch_fuzzy_amount_with_dates: a repro for transactions matched that shouldn't be #202. It previously failed and now passes.
  • test_match_fuzzy_amount_upper_bound: exercises the loop's upper bound handling. Injecting an off-by-one error in the upper bound by changing upper_bound to upper_bound-1 causes this test to fail.

Fixes #202.

@ankurdave
Copy link
Contributor Author

ankurdave commented Nov 2, 2023

It seems like the tests are failing due to unstable ordering of the posting meta keys. I saw the same issue when running locally on master with Python 3.11.

@ankurdave
Copy link
Contributor Author

I think the recent release of Beancount 2.3.6 broke the tests. Here's the diff between releases and the relevant PR: beancount/beancount#726

I'll submit an empty PR to confirm, then a PR to unbreak.

@Zburatorul
Copy link
Collaborator

Looks like it's still failing after the metadata sorting PR got merged.

@@ -443,7 +443,8 @@ def _get_matches(
if cur_matches is not None:
lower_bound = bisect.bisect_left(cur_matches, (lower, tuple(), None, None))
upper_bound = bisect.bisect_right(cur_matches, (upper, (sys.maxsize,), None, None), lo=lower_bound)
for sp in cur_matches[lower_bound-1 if lower_bound > 0 else 0:upper_bound]:
for sp in cur_matches[lower_bound:upper_bound]:
assert abs(sp.number - amount.number) <= self.fuzzy_match_amount
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this assert have an easy informative error message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the following message:

                assert abs(sp.number - amount.number) <= self.fuzzy_match_amount, (
                    f'Bug in matching algorithm: {sp} is not within '
                    + f'{self.fuzzy_match_amount} of {amount}')

@Zburatorul
Copy link
Collaborator

I forgot to ask, but do you understand why this off-by-one error slipped through in previous PRs? Is it easy to construct a test that exercises just that logic?

@ankurdave
Copy link
Contributor Author

ankurdave commented Nov 19, 2023

@Zburatorul:

I forgot to ask, but do you understand why this off-by-one error slipped through in previous PRs? Is it easy to construct a test that exercises just that logic?

Good question, I was unclear about that as well. It turns out this was because the existing nonmatch test only exercises the case where the target posting is at the beginning of the list of candidates (i.e. lower_bound == 0), in which case the existing code happened to be correct.

In turn that's because the Expenses:FIXME posting has a large positive amount (100 USD), which is negated during candidate search to become a large negative number (-100).

The new test instead has an Expenses:FIXME posting with a large negative amount (-20.00 USD), so it exercises the opposite case.

You can see this by adding the following print statements:

modified   beancount_import/matching.py
@@ -444,7 +444,22 @@ class PostingDatabase(object):
         if cur_matches is not None:
             lower_bound = bisect.bisect_left(cur_matches, (lower, tuple(), None, None))
             upper_bound = bisect.bisect_right(cur_matches, (upper, (sys.maxsize,), None, None), lo=lower_bound)
-            for sp in cur_matches[lower_bound:upper_bound]:
+
+            print()
+            print(f'__get_matches({account}, {date}, {amount}): '
+                  + f'considering {len(cur_matches)} date/currency matches')
+            printer = beancount.parser.printer.EntryPrinter()
+            for i in range(0, lower_bound-1 if lower_bound > 0 else 0):
+                print(f'  Not considering match below lower bound, at {i}: '
+                      + f'{printer(cur_matches[i].mp.posting)}')
+            for i in range(lower_bound-1 if lower_bound > 0 else 0, upper_bound):
+                print(f'  Considering match within bounds, at {i}: '
+                      + f'{printer(cur_matches[i].mp.posting)}')
+            for i in range(upper_bound, len(cur_matches)):
+                print(f'  Not considering match above upper bound, at {i}: '
+                      + f'{printer(cur_matches[i].mp.posting)}')
+
+            for sp in cur_matches[lower_bound-1 if lower_bound > 0 else 0:upper_bound]:
                 assert abs(sp.number - amount.number) <= self.fuzzy_match_amount, (
                     f'Bug in matching algorithm: {sp} is not within '
                     + f'{self.fuzzy_match_amount} of {amount}')

I simplified the new test and documented this in a comment.

@Zburatorul Zburatorul merged commit c831323 into jbms:master Nov 19, 2023
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

transactions matched that shouldn't be
2 participants