training.py: two tweaks to feature selection (#226)
1. Include posting amounts as a feature. This allows us to distinguish
different classes of payments to the same payee (e.g. recurring membership
fees, which often have a constant amount, from individual purchases).

2. For each example's key/value pairs, include the key by itself (with no
substring of the value) as a feature. This is useful because different account
types often have non-overlapping sets of keys, and including the bare key as a
feature lets the decision tree split by account type fairly close to the
root.
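
The two tweaks above can be sketched as a small standalone function. This is only an illustration, not the project's code: `Example` is a hypothetical stand-in for `PredictionInput`, `sketch_features` is a made-up name, and the value handling is simplified to one feature per whole value rather than per substring.

```python
import collections
from decimal import Decimal
from typing import Dict, Mapping, NamedTuple, Sequence, Union


class Example(NamedTuple):
    """Hypothetical stand-in for the project's PredictionInput."""
    source_account: str
    currency: str
    number: Decimal
    key_value_pairs: Mapping[str, Union[str, Sequence[str]]]


def sketch_features(example: Example) -> Dict[str, object]:
    """Build a feature dict that includes both tweaks (illustrative only)."""
    features = collections.defaultdict(lambda: False)
    features['account:%s' % example.source_account] = True
    # Tweak 1: the posting amount is a numeric feature keyed by currency.
    features['amount:%s' % example.currency] = example.number
    for key, values in example.key_value_pairs.items():
        # Tweak 2: the bare key is a feature on its own.
        features[key] = True
        if isinstance(values, str):
            values = (values,)
        for value in values:
            # Simplification: one feature per whole value, no substrings.
            features['%s:%s' % (key, value)] = True
    return dict(features)


# Same payee and account; only the amount feature differs.
membership = sketch_features(
    Example('Assets:Checking', 'USD', Decimal('9.99'), {'desc': 'GYM'}))
purchase = sketch_features(
    Example('Assets:Checking', 'USD', Decimal('42.17'), {'desc': 'GYM'}))
```

In this sketch the two examples share every feature except `amount:USD`, which is the only signal a classifier could use to tell a fixed fee from a one-off purchase at the same payee.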

These two very small changes significantly improve training accuracy on my
journal, from 94.81% to 99.32% (the error rate drops from 5.19% to 0.68%, a
reduction of about 87%).
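
The relative error-rate reduction follows directly from the two accuracy figures. A minimal check, where `error_reduction` is a hypothetical helper name introduced only for this illustration:

```python
def error_reduction(acc_before: float, acc_after: float) -> float:
    """Relative reduction in error rate, given accuracies in percent."""
    err_before = 100.0 - acc_before  # 5.19 for the figures above
    err_after = 100.0 - acc_after    # 0.68 for the figures above
    return (err_before - err_after) / err_before


reduction = error_reduction(94.81, 99.32)
print('%.1f%%' % (reduction * 100))  # prints 86.9%
```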
jktomer authored Sep 21, 2024
1 parent f8fcb72 commit 30dc718
Showing 2 changed files with 8 additions and 3 deletions.
5 changes: 3 additions & 2 deletions beancount_import/training.py
@@ -30,10 +30,11 @@
 def get_features(example: PredictionInput) -> Dict[str, bool]:
     features = collections.defaultdict(lambda: False)  # type: Dict[str, bool]
     features['account:%s' % example.source_account] = True
-
-    # For now, skip amount and date.
+    features['amount:%s' % example.amount.currency] = example.amount.number
+    # For now, skip date.

     for key, values in example.key_value_pairs.items():
+        features[key] = True
         if isinstance(values, str):
             values = (values, )
         for value in values:
6 changes: 5 additions & 1 deletion beancount_import/training_test.py
@@ -1,6 +1,7 @@
 import datetime

 from beancount.core.data import Amount
+from beancount.core.number import D
 from . import test_util
 from . import training

@@ -21,7 +22,10 @@ def test_get_features():
         'a:hello': True,
         'b:foo': True,
         'b:bar': True,
-        'b:foo bar': True
+        'b:foo bar': True,
+        'a': True,
+        'b': True,
+        'amount:USD': D(3)
     }

