How to validate an ordered categorical column? #71

Open
teddygroves opened this issue May 1, 2024 · 0 comments
@teddygroves

This topic probably belongs in a discussion forum but I couldn't find one for patito. Please let me know if there is a better place to ask this.

I would like to use patito to validate a dataframe that has a categorical column with known categories, where the order of the categories matters. Here is what I have done so far:

from typing import Literal, get_args

import patito as pt
import polars as pl


class MyModel(pt.Model):
    my_col: Literal["a", "b"]


my_dtype = pl.Enum([*get_args(MyModel.model_fields["my_col"].annotation)])

good_df = pl.DataFrame({"my_col": pl.Series(["b", "a"], dtype=my_dtype)})
bad_df = pl.DataFrame(
    {"my_col": pl.Series(["b", "a"], dtype=pl.Enum(["b", "a"]))}
)

MyModel.validate(good_df)
MyModel.validate(bad_df)

This passes for good_df and fails for bad_df, as expected. However, I'm not 100% sure that this is the intended use of Literal in a patito model, and it was a little awkward to get the correctly ordered categories into my custom dtype, so I thought I'd ask whether there's a better (or just different) way to do this.
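
To make the awkward part concrete, the category extraction could be wrapped in a small helper. This is only a sketch using the standard library: `literal_categories` is a hypothetical name, not part of patito or polars, and it assumes the field annotation is either a plain `Literal[...]` or an `Optional[Literal[...]]`. The returned list keeps the order in which the values were declared, so it could be passed to `pl.Enum` in the same way as in the snippet above.

```python
from typing import Literal, Optional, Union, get_args, get_origin


def literal_categories(annotation) -> list:
    """Return the values of a Literal annotation in declaration order.

    Hypothetical helper for illustration; not a patito or polars API.
    """
    if get_origin(annotation) is Literal:
        return list(get_args(annotation))
    # Optional[Literal[...]] is Union[Literal[...], None], so look inside it.
    if get_origin(annotation) is Union:
        for arg in get_args(annotation):
            if get_origin(arg) is Literal:
                return list(get_args(arg))
    raise TypeError(f"{annotation!r} is not a Literal annotation")


# Declaration order is preserved, which is what matters for an ordered dtype.
categories = literal_categories(Literal["a", "b"])
optional_categories = literal_categories(Optional[Literal["low", "mid", "high"]])
```

With the model above, the call would presumably look like `literal_categories(MyModel.model_fields["my_col"].annotation)`, but that just relocates the same lookup and does not answer whether patito has a built-in way to do it.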

@lmmx lmmx added this to Planner Sep 9, 2024
@lmmx lmmx moved this to 🧊 On hold in Planner Sep 9, 2024
@lmmx lmmx moved this from 🧊 On hold to 🔙 Backlog in Planner Sep 9, 2024
@lmmx lmmx removed this from Planner Sep 14, 2024