Conversion between Polars -> Patito DataFrames and back #10
Comments
The two types seem to do different things: a patito dataframe exposes APIs for schema validation and for managing the data in relation to its schema, while a polars dataframe is for transformations, selections, and input/output. The real problem seems to be that they should have different names to describe this distinction better. But I'm wondering, what code do you have that cannot work around this distinction?
I apologize for this, I'm relatively new to pyarrow and didn't realize it maintains dtypes through to polars. My team and I use Patito as a data validation step prior to transforming our data. We often struggle with polars joining and exploding on columns that have erroneous data types, so we were looking for a way to quickly validate data, then hand it off to the next process for transformation while preserving the type casts. I did not realize I could simply convert to arrow, then back to polars, after validating with patito. Thanks for your response, and I apologize again for not doing proper research before asking this question!
I ran into an issue caused by the DataFrame subclassing. The quickest zero-copy approach, I guess, is to do this:
I also think this would be useful (or at least cleaner code-wise) and pretty straightforward to implement. Then patito is polars but with pydantic validation, which I feel is a very clean thing to describe to users. ^ Bits like this are just very nice functionality added on top of polars, not so much for validation.
The functionality of this package is awesome, but for the use case my team and I have, it's rendered essentially useless because patito.polars.DataFrames can't be converted back to polars.polars.DataFrames. This feature would be a huge help!