Prediction List Improvements for Result File Dataclasses #206

mawelborn · 2025-01-06T23:28:42Z

Expanding the PredictionList API to simplify some common patterns seen in auto review and custom output.

Group by Set of Linked Labels / Pages

Predictions can now be grouped by mutable collections, including a document extraction's set of linked label groups and an unbundling's list of pages. Internally, each mutable collection is converted to its immutable variant (a set to a frozenset, a list to a tuple) before being used as a dictionary key.

Before:

extractions.groupby(lambda extraction: frozenset(extraction.groups))
unbundlings.groupby(lambda unbundling: tuple(unbundling.pages))

After:

from operator import attrgetter
extractions.groupby(attrgetter("groups"))
unbundlings.groupby(attrgetter("pages"))
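
The conversion described above can be sketched roughly as follows. This is a standalone illustration, not the PredictionList implementation: `to_hashable`, the free function `groupby`, and the `SimpleNamespace` stand-ins for unbundlings are all hypothetical names introduced for the example.

```python
from collections import defaultdict
from operator import attrgetter
from types import SimpleNamespace
from typing import Any, Callable, Hashable, Iterable


def to_hashable(key: Any) -> Hashable:
    """Return an immutable stand-in for common mutable collections (assumed mapping)."""
    if isinstance(key, set):
        return frozenset(key)
    if isinstance(key, list):
        return tuple(key)
    return key  # Anything else is assumed to be hashable already.


def groupby(items: Iterable[Any], key: Callable[[Any], Any]) -> dict:
    """Group items by key(item), converting mutable keys before the dict lookup."""
    grouped = defaultdict(list)
    for item in items:
        grouped[to_hashable(key(item))].append(item)
    return dict(grouped)


# Hypothetical stand-ins for unbundling predictions with a list of pages.
cover = SimpleNamespace(label="Cover", pages=[0])
invoice_a = SimpleNamespace(label="Invoice", pages=[1, 2])
invoice_b = SimpleNamespace(label="Invoice Copy", pages=[1, 2])

by_pages = groupby([cover, invoice_a, invoice_b], attrgetter("pages"))
```

In this sketch the list `[1, 2]` becomes the tuple `(1, 2)`, so `invoice_a` and `invoice_b` land under the same key without the caller wrapping the key function in `tuple(...)`.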

Group by Individual Linked Labels / Pages

The new .groupbyiter() method groups each prediction with every key in an iterable individually. This is particularly useful for a document extraction's set of linked label groups. While it's sometimes desirable to group by the entire set as .groupby() does, it's more often desirable to group by each linked label group individually.

Before:

extractions_by_group: Mapping[Group, PredictionList[Extraction]] = defaultdict(PredictionList)

for extraction in extractions:
    for group in extraction.groups:
        extractions_by_group[group].append(extraction)

After:

extractions.groupbyiter(attrgetter("groups"))

The .groupby() and .groupbyiter() unit tests are good examples of the difference in behavior:

>>> predictions.extractions.groupby(attrgetter("groups"))
{
    frozenset({group_alpha}): [first_name],
    frozenset({group_alpha, group_bravo}): [last_name],
}
>>> predictions.extractions.groupbyiter(attrgetter("groups"))
{
    group_alpha: [first_name, last_name],
    group_bravo: [last_name],
}
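
For readers who want to reproduce the difference outside the library, here is a minimal sketch of the per-key grouping. It assumes nothing about PredictionList internals: `groupbyiter` is written here as a hypothetical free function, and the predictions and groups are plain stand-in objects and strings.

```python
from collections import defaultdict
from operator import attrgetter
from types import SimpleNamespace
from typing import Any, Callable, Iterable


def groupbyiter(items: Iterable[Any], keys: Callable[[Any], Iterable[Any]]) -> dict:
    """Add each item to the group of every key yielded by keys(item)."""
    grouped = defaultdict(list)
    for item in items:
        for key in keys(item):
            grouped[key].append(item)
    return dict(grouped)


# Hypothetical stand-ins mirroring the unit test above; groups are plain strings here.
first_name = SimpleNamespace(label="First Name", groups={"alpha"})
last_name = SimpleNamespace(label="Last Name", groups={"alpha", "bravo"})

by_group = groupbyiter([first_name, last_name], attrgetter("groups"))
```

Note that in this sketch an item whose iterable is empty appears in no group at all, which may or may not match the library's behavior.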

"Where Attr In" for Documents, Models, and Reviews

The .where() method has new document_in, model_in, and review_in keyword args to complement the existing single-value variants.

Before:

predictions.where(lambda prediction: prediction.model in MODELS)
predictions.where(lambda prediction: prediction.document in DOCUMENTS)
predictions.where(lambda prediction: prediction.review in REVIEWS)

After:

predictions.where(model_in=MODELS)
predictions.where(document_in=DOCUMENTS)
predictions.where(review_in=REVIEWS)
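
One way such keyword filters could be layered on top of a predicate is sketched below. This is an assumption-laden illustration rather than the actual `.where()` implementation: the free function `where`, the choice to combine multiple filters with AND, and the string values used for models, documents, and reviews are all hypothetical.

```python
from types import SimpleNamespace
from typing import Any, Callable, Container, Iterable, Optional


def where(
    predictions: Iterable[Any],
    predicate: Optional[Callable[[Any], bool]] = None,
    *,
    model_in: Optional[Container[Any]] = None,
    document_in: Optional[Container[Any]] = None,
    review_in: Optional[Container[Any]] = None,
) -> list:
    """Keep predictions that pass every filter supplied (AND semantics, assumed)."""
    selected = list(predictions)
    if predicate is not None:
        selected = [p for p in selected if predicate(p)]
    if model_in is not None:
        selected = [p for p in selected if p.model in model_in]
    if document_in is not None:
        selected = [p for p in selected if p.document in document_in]
    if review_in is not None:
        selected = [p for p in selected if p.review in review_in]
    return selected


# Hypothetical predictions; real ones reference model, document, and review objects.
p1 = SimpleNamespace(model="invoice", document="doc-1", review="auto")
p2 = SimpleNamespace(model="receipt", document="doc-2", review="manual")
p3 = SimpleNamespace(model="contract", document="doc-2", review="auto")
predictions = [p1, p2, p3]

invoices_and_receipts = where(predictions, model_in={"invoice", "receipt"})
receipts_in_doc_2 = where(predictions, model_in={"invoice", "receipt"}, document_in={"doc-2"})
```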

mawelborn added 5 commits January 6, 2025 16:42

Support grouping predictions by mutable collections

8a86e59

Add a variant of groupby that groups by each key in an iterable

0c4af11

Add *_in variants of document, model, and review .where() kwargs

357297c

Improve .where() docstring

4108d8f

Update unit tests

2d607b8

mawelborn requested review from nickesparza, Scott771, andrew8bit, annaliu-indico and prafulIndico January 6, 2025 23:28

Use dict literal to improve .groupby() unit tests

be7e3ae

mawelborn merged commit e7c84c6 into main Jan 8, 2025
9 checks passed

mawelborn deleted the mawelborn/prediction-list-improvements branch January 8, 2025 19:37
