Skip to content

A categorizer for beancount, taking payee and/or description and returning account names associated looking up in a spreadsheet.

License

Notifications You must be signed in to change notification settings

erpreciso/beancount_spreadsheet_categorizer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

29 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Explanation

Motivation

Long time user of beancount and the Beancount Categorizer, a need materialized to delegate categorization of entries to an unskilled other person.

Therefore, constraints:

  • Data entry in spreadsheet-like format.
  • Irrelevance of ordering of clauses, i.e. any sorting or ordering manipulation should not affect matching logic.
  • Partial matching.
  • Case insensitive.
  • Punctuation allowed.

./test/test-spreadsheet-screenshot.png

Tutorial

Install

  • Place the file spreadsheet_categorizer.py in your PYTHONPATH. Alternatively, use this line to add the folder where you saved the module to PYTHONPATH, integrating <full path> with the full absolute path:
    sys.path.append("<full path>/projects/beancount_spreadsheet_categorizer")
        
  • Create the categorizer spreadsheet starting from the template categorizer.ods. Add at least one entry.

Setup

Import

from spreadsheet_categorizer import SpreadsheetCategorizer as SC

Initialize

Initialize the categorizer object in the __init__ of your importer, so it can be called by the extraction methods.

class MyImporter(importer.ImporterProtocol):
    def __init__(self):
        spreadsheet_path = 'test/categorizer.ods'
        sheet_name = 'valid'
        self.SC = SC(spreadsheet_path, sheet_name)

Apply

Your imported should have some code that create transactions using information extracted from the source. Call the method match from the categorizer object initialized in init, and pass the two arguments payee and description (sometimes called ‘narration’ in beancount code). One or both arguments can be None and will be matched as per matching logic.

As example, see below an importer class method that calls match to get the two transaction’s posting accounts.

def create_posting(self, line, cedente):
    desc = line['Desc']
    acc_source, _ = self.SC.match(cedente, desc)
    if not acc_source:
        acc_source = 'Expenses:FIXME'
    price = float(line['Prezzo'])
    tax = float(line['Aliquota IVA'])
    price_withtax = round(price + price * tax / 100, 2)
    amount = Amount(Decimal('{:.2f}'.format(price_withtax)), 'EUR')
    meta = {'description': self.clean_string(desc)}
    p = data.Posting(acc_source, amount, None, None, None, meta)
    return p

In this case, the method is passed a dataframe line with the columns ['Desc', 'Prezzo', 'Aliquota IVA']; in addition, the payee is passed as cedente. The match method is called to get only the source account, and the destination account is ignored. If the source account is not found, the posting is created with a Expenses:FIXME account.

The destination account of the same transaction is extracted somewhere else in the code to create the main transaction:

_, acc_dest = self.SC.match(cedente, None)

In this case, the description is not passed as parameter since, as part of this importer matching logic, there is one and only one destination account by payee, and ‘catch-all’ clauses are present for every payee in the spreadsheet (behaviour of catch-all cases to be applied when missing payee or description can be controlled by setting them up in the spreadsheet, see matching logic.)

How-to

Hook the categorizer to an importer

Below a simple example to use the spreadsheet categorizer in an importer.

It uses the ‘hook’ facility built with smart_importer.

file ‘categorizer.py’ in your “importers” folder to be used by bean-extract

Built on the `ImporterHook’ class in hooks.py

import pandas as pd
from beancount.core import data
from smart_importer.hooks import ImporterHook
from spreadsheet_categorizer import SpreadsheetCategorizer as SC


class AppliedSpreadsheetCategorizer(ImporterHook):
    def __init__(self, spreadsheet, sheet_name):
        self.asc = SC(spreadsheet, sheet_name)

    def __call__(self, importer, file, imported_entries, existing_entries):
        return [self._process(entry) for entry in imported_entries]

    def _process(self, entry):
        """Process the entry by adding/replacing the posting(s).

        If entry is not a transaction, pass it unchanged."""
        # exclude all non-transaction entries
        if type(entry) != data.Transaction or len(entry.postings) != 1:
            return entry
        # use the spreadsheet importer: match the payee and narration
        acc_source, acc_dest = self.asc.match(entry.payee, entry.narration)
        # create the main posting
        source_posting = data.Posting(acc_source, None, None, None, None, None)
        if acc_dest and not pd.isnull(acc_dest):
            # if both accounts are specified in the spreadsheet, replace also
            # the existing posting with the new one
            units = entry.postings[0].units
            meta = entry.postings[0].meta
            dest_posting = data.Posting(acc_dest, units, None, None, None,
                                        meta)
            entry.postings.clear()
            entry.postings.append(source_posting)
            entry.postings.append(dest_posting)
        else:
            # otherwise, simply append or insert the new posting
            if entry.postings[0].units.number > 0:
                entry.postings.append(source_posting)
            else:
                entry.postings.insert(0, source_posting)
        return entry

file ‘config.py’ used by bean-extract

from smart_importer import apply_hooks
from smart_importer import PredictPostings as predict

# import my custom importer
from my_custom_importer_module import MyCustomImporter

# import the categorizer class built on smart_importer
from categorizer import AppliedSpreadsheetCategorizer as ASC

SPREADSHEET_PATH = "~/projects/spreadsheet-importer.ods"

spreadsheet_categorizer_instance = ASC(SPREADSHEET_PATH, 'categorizer-sheet')

# the below applies both the spreadsheet categorizer, and the suggested
# PredictPostings hook
CONFIG = [apply_hooks(MyCustomImporter(), [spreadsheet_categorizer_instance,
                                          predict()])]

Matching logic

Matching is done with the method search_key that does

  1. a regex search of the payee, therefore allowing partial match
  2. if more matches are found, a string match, allowing only exact match.

The regex is escaped before the matching, to allow for punctuation in the search payee or description. Case is always insensitive.

If only the payee, or only the description, or neither of them are passed, the method search for a ‘nan’ in place of the missing argument; during categorizer initialization, the catch-all cases are compiled using this ‘nan’ string to identify cases that should match everything.

Example cases are more eloquent. Below, the match and the corresponding relevant entries. All these cases are in unittest test/test.py.

Match payee and description, case insensitive

payeedescriptionaccount-sourceaccount-destination
Gas StationTank car #1Exp:Car:VolvoAssets:Bank
match('gas station', 'tank car #1') == 'Exp:Car:Volvo', 'Assets:Bank'

Match payee only

payeedescriptionaccount-sourceaccount-destination
Gas StationExp:CarAssets:FIXME-NO-DESC’
match('gas station', '') == 'Exp:Car', 'Assets:FIXME-NO-DESC'

Match description only

payeedescriptionaccount-sourceaccount-destination
t-shirtExp:Clothes
match('', 't-shirt') == 'Exp:Clothes', None

Partial match

payeedescriptionaccount-sourceaccount-destination
t-shirtExp:Clothes
match('', 'shirt') == 'Exp:Clothes', None

Exact match

payeedescriptionaccount-sourceaccount-destination
Gas StationTank car #1Exp:Car:VolvoAssets:Bank
Gas statTank car #1Exp:Car:PorscheAssets:Bank

The search payee is partially matched with the first one, and exactly matched with the second. Only the second is returned.

match('gas stat', 'tank car #1') == 'Exp:Car:Porsche', 'Assets:Bank'

Ambiguous description

payeedescriptionaccount-sourceaccount-destination
Gas StationTank car #1Exp:Car:VolvoAssets:Bank
Gas StationTank car #2Exp:Car:SaabAssets:Bank
Gas StationExp:CarAssets:FIXME-NO-DESC

The search description could match two cases, therefore it is ignored and the payee catch-all clause is matched.

match('gas station', 'tank car') == 'Exp:Car', 'Assets:FIXME-NO-DESC'

Ambiguous payee and description

payeedescriptionaccount-sourceaccount-destination
Gas StationTank car #1Exp:Car:VolvoAssets:Bank
Gas StationTank car #2Exp:Car:SaabAssets:Bank
Gas StationExp:CarAssets:FIXME-NO-DESC
Gas statTank car #1Exp:Car:PorscheAssets:Bank
Exp:CATCH-ALLAssets:CATCH-ALL

Both search payee and description could match multiple cases, therefore they are both ignored and the general catch-all clause is matched.

match('gas', 'tank') == 'Exp:CATCH-ALL', 'Assets:CATCH-ALL'

Case not found

payeedescriptionaccount-sourceaccount-destination
a.b.c.serviceExp:SoftwareAssets:Cash
Exp:CATCH-ALLAssets:CATCH-ALL

Case is not found; while payee exists in the table, there isn’t a catch-all clause for that payee. Therefore, general catch-all clause applies.

match('a.b.c.', 'software') == 'Exp:CATCH-ALL', 'Assets:CATCH-ALL'

Reference

API guide (DRAFT)

class SpreadsheetCategorizer

method __init__

Take log_level as argument.

method search_key

Categorizer spreadsheet specifications

  • Must be an ODS spreadsheet (see Open Document Format for Office Applications (ODF).
  • Must have four columns containing:
    • the payee to be matched
    • the description to be matched
    • the account source to be returned
    • the account destination to be returned
  • If these four columns are not named with standard names, a dict with the preferred names must be passed to the constructor.
  • It could contain additional columns (e.g. comments, sorting keys).
  • It should not contain empty rows.
  • It must contain a catch-all clause when both payee and description are missing.
  • It should not contain duplicate rows.
  • Warnings will be issued if the import detect situations like missing catch-all payees that, while generating undesired results, do not compromize the logic.
  • The import stops if, instead, illegal cases are present, like duplicated cases.

External references

Future

  • Improve documentation.
    • Add docstrings.
  • Package the module.
  • Add full example, including an importer to show full workflow.
  • Support input of orgmode tables.
  • Support Microsoft Excel spreadsheet.

About

A categorizer for beancount, taking payee and/or description and returning account names associated looking up in a spreadsheet.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages