Conversion between Polars -> Patito DataFrames and back #10

Open
alexthoden opened this issue Aug 15, 2023 · 4 comments

Comments

@alexthoden

The functionality of this package is awesome, but for the use case my team and I have, it's rendered essentially useless because a patito.DataFrame can't be converted back to a polars.DataFrame. This feature would be a huge help!

@cbb330

cbb330 commented Aug 18, 2023

Patito offers patito.DataFrame, a class that extends polars.DataFrame in order to provide utility methods related to patito.Model. The schema of a data frame can be specified at runtime by invoking patito.DataFrame.set_model(model), after which a set of contextualized methods become available:

The two types seem to do different things: a Patito DataFrame exposes APIs for schema validation and for managing data in relation to a schema, while a Polars DataFrame is for transformations, selections, and input/output.

The real problem seems to be naming: the two classes should have different names that describe this distinction better.

In your case, though, I'm wondering what code you have that cannot work around this distinction?

@alexthoden
Author

I apologize for this. I'm relatively new to pyarrow and didn't realize that it preserves dtypes through to Polars. My team and I use Patito to validate data before transforming it. We often struggle with Polars joins and explodes on columns that have erroneous data types, so we were looking for a way to quickly validate data and then hand it off to the next process for transformation while keeping the type casts. I did not realize I could simply convert to Arrow and then back to Polars after validating with Patito. Thanks for your response, and I apologize again for not doing proper research before asking this question!

@ion-elgreco
Contributor

ion-elgreco commented Oct 9, 2023

I ran into an issue caused by the DataFrame subclassing. I guess the quickest zero-copy conversion is this: pl.DataFrame(Model.examples().to_arrow())

@GeorgePearse
Contributor

GeorgePearse commented Jul 21, 2024

I also think this would be useful (or at least cleaner code-wise), and it seems pretty straightforward to implement?

Then patito is polars but with pydantic validation, which I feel is a very clean thing to describe to users.

[image]

^ Bits like this are just a very nice functionality add-on on top of Polars, not so much for validation.

@lmmx lmmx added this to Planner Sep 9, 2024
@lmmx lmmx moved this to 💡 Idea in Planner Sep 9, 2024
@lmmx lmmx removed this from Planner Sep 14, 2024