Validation bug with pl.Categorical ordering argument #83

Open
chainyo opened this issue Jul 22, 2024 · 3 comments

@chainyo
Contributor

chainyo commented Jul 22, 2024

Hi, I'm trying to define a patito.Model with a field that is a list of polars.Categorical, but validation fails with an error that mentions the ordering parameter, which defaults to ordering='physical'.

Here is the model example:

from enum import Enum
from typing import List

import patito
import polars


class Provenance(str, Enum):
    A = "A"
    B = "B"
    C = "C"


class Schema(patito.Model):
    index: int
    databases: List[Provenance] = patito.Field(dtype=polars.List(polars.Categorical))

The problem arises also when I use a typing.Literal instead of an Enum for the Provenance item.
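
For reference, a minimal sketch of what the typing.Literal variant of Provenance could look like. `ProvenanceLiteral` is a hypothetical name not used in the original report, and the sketch only shows that both spellings describe the same three categories; it does not touch patito or polars.

```python
from enum import Enum
from typing import Literal, get_args


class Provenance(str, Enum):
    A = "A"
    B = "B"
    C = "C"


# Hypothetical Literal-based alternative to the Enum above.
ProvenanceLiteral = Literal["A", "B", "C"]

# Both spellings carry the same set of allowed string values.
enum_categories = [member.value for member in Provenance]
literal_categories = list(get_args(ProvenanceLiteral))
```

Since the allowed values are identical, it is plausible (though not confirmed here) that both variants go through the same dtype check during validation.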

Here is the error raised when validating the model:

databases
  Polars dtype List(Categorical(ordering='physical')) does not match model field type. (type=type_error.columndtype)

I'm forced to use polars.Categorical for now, since polars.Enum isn't supported by patito yet.

Do you have suggestions to fix this error?

@EasterEggScrambler

@chainyo: any commonalities with #71 ?

@chainyo
Contributor Author

chainyo commented Jan 3, 2025

> @chainyo: any commonalities with #71 ?

Sorry for the late reply, but yes, it seems to be related as well.

Any news on this problem? @thomasaarholt

@thomasaarholt
Collaborator

I am getting a Polars Rust panic on the following, slightly different example. I was using it to debug your problem, but I've now spent over an hour banging my head against it and am not feeling great about it. I think it has to do with how we build a new DataFrame type (if I change cls.DataFrame to pl.DataFrame there, it works), but I don't understand why that leads to the error below.

import enum
import patito as pt

class Provenance(str, enum.Enum):
    A = "A"
    B = "B"
    C = "C"

class Schema(pt.Model):
    databases: list[Provenance]

Schema.examples()
---
thread '<unnamed>' panicked at crates/polars-arrow/src/array/struct_/mod.rs:120:56:
called `Result::unwrap()` on an `Err` value: ComputeError(ErrString("The children DataTypes of a StructArray must equal the children data types.
However, the field 0 has data type LargeList(Field { name: \"item\", dtype: Dictionary(UInt32, Utf8View, false), is_nullable: true, metadata: Some({\"_PL_ENUM_VALUES\": \"1;A1;B1;C\"}) }) but the value has data type LargeList(Field { name: \"item\", dtype: Dictionary(UInt32, Utf8View, false), is_nullable: true, metadata: None })"))
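
Reading the panic message, the two inner fields differ only in their metadata: one carries `_PL_ENUM_VALUES`, the other has `None`. Here is a rough, pure-Python sketch of that mismatch. The `Field` class is a hypothetical stand-in, not the Polars or Arrow type, and the `<length>;<value>` decoding of `1;A1;B1;C` is an assumption inferred from the string in the error.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    """Hypothetical stand-in for an Arrow field (not the real Polars type)."""
    name: str
    dtype: str
    is_nullable: bool
    metadata: Optional[dict] = None


def decode_enum_values(encoded: str) -> list[str]:
    """Decode concatenated '<length>;<value>' records (assumed encoding)."""
    values = []
    pos = 0
    while pos < len(encoded):
        sep = encoded.index(";", pos)      # end of the length prefix
        length = int(encoded[pos:sep])
        start = sep + 1
        values.append(encoded[start:start + length])
        pos = start + length
    return values


# The field as declared on the struct, per the panic message.
declared = Field("item", "Dictionary(UInt32, Utf8View, false)", True,
                 metadata={"_PL_ENUM_VALUES": "1;A1;B1;C"})
# The field found on the value: same dtype, but the metadata is missing.
actual = Field("item", "Dictionary(UInt32, Utf8View, false)", True, metadata=None)

fields_match = declared == actual          # False: only the metadata differs
enum_values = decode_enum_values(declared.metadata["_PL_ENUM_VALUES"])
```

If that reading is right, the enum categories A, B, C are present on the declared field but get dropped somewhere on the value side, which would fit the observation that switching cls.DataFrame to pl.DataFrame avoids the panic.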
