feat(python)!: Allow all DataType objects to be instantiated #12470

Merged
stinodego merged 6 commits into main from dtype-update
Dec 3, 2023

Conversation

stinodego
Contributor

@stinodego stinodego commented Nov 15, 2023

Closes #6163

This went a lot more smoothly than I expected. This update should cause almost no friction with users.

Changes

  • The following methods now return instantiated DataType objects, rather than a mix of classes/instances:
    • Series.dtype
    • DataFrame.dtypes
    • LazyFrame.dtypes
    • DataFrame.schema
    • LazyFrame.schema
    • Series.struct.schema
  • Instantiating a DataType will now actually return an instantiated object, e.g. Int8() is no longer magically converted into an Int8 class.
  • Update the DataType class with default __hash__, __eq__, and __repr__ methods. These are valid for 'simple' data types. 'Complex' data types like Datetime and List override these defaults as they did before.
  • Update conversion of DataTypes to/from Rust. Instantiated data types are now accepted and the returned data types are always instantiated.
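
The default methods described in the list above can be illustrated with a minimal pure-Python sketch. This is not the Polars implementation: the stand-in classes DataType, Int8, and Int64 are assumptions made for illustration only. It shows one way equality, hashing, and repr could be defined so that an instance still compares equal to its class.

```python
class DataType:
    """Stand-in base class (hypothetical, not the Polars source)."""

    def __eq__(self, other: object) -> bool:
        # A class on the other side matches if it is this type or a subclass.
        if isinstance(other, type):
            return issubclass(other, type(self))
        # An instance matches if it is of this type or a subclass.
        return isinstance(other, type(self))

    def __hash__(self) -> int:
        # Simple types carry no parameters, so the class alone identifies them.
        return hash(type(self))

    def __repr__(self) -> str:
        return type(self).__name__


class Int8(DataType): ...
class Int64(DataType): ...


dtype = Int8()
print(dtype is Int8)             # False: an instance is not the class object
print(dtype == Int8)             # True: equality accepts the class
print(Int8 == dtype)             # True: Python falls back to the reflected __eq__
print(dtype == Int64())          # False: different data type
print(repr(dtype))               # Int8
print({"a": Int64()} == {"a": Int64})  # True: schema-like dicts still compare equal
```

Defining __hash__ alongside __eq__ matters here because a class that defines only __eq__ becomes unhashable in Python, and data types are commonly used as dictionary keys or set members.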

Almost everything still works as before. The only behavior that really changes is identity comparison with the is operator:

Before

>>> s = pl.Series([1, 2, 3], dtype=pl.Int8)
>>> s.dtype is pl.Int8
True

After

>>> s.dtype is pl.Int8
False

Use the equality operator or isinstance instead:

>>> s.dtype == pl.Int8
True
>>> isinstance(s.dtype, pl.Int8)
True

Users may also have to update some of their type hints, since these attributes now return instances rather than classes.
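
One way the type-hint change might look in user code is sketched below. The names DataType, Int8, DTypeLike, and as_instance are hypothetical stand-ins rather than Polars API. The point is only that a hint written as Type[DataType] no longer describes a value that is now an instance.

```python
from typing import Type, Union


class DataType:
    """Hypothetical stand-in for a data type base class."""


class Int8(DataType):
    """Hypothetical stand-in for a simple data type."""


# Alias that accepts either the class or an instance of it.
DTypeLike = Union[DataType, Type[DataType]]


def as_instance(dtype: DTypeLike) -> DataType:
    """Normalize a class or an instance to an instance."""
    if isinstance(dtype, type):
        # A bare class is instantiated with its defaults.
        return dtype()
    return dtype


from_class = as_instance(Int8)
from_instance = as_instance(Int8())
print(type(from_class).__name__, type(from_instance).__name__)  # Int8 Int8
```

Code that previously annotated a dtype value as a class can either widen the hint to a union like this or normalize once at the boundary and use the instance type everywhere else.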

After this is merged, there can be a little more cleanup and refactoring to smooth things out further behind the scenes. But this should take care of the user-facing changes I intended to make.

@github-actions github-actions bot added the breaking, enhancement, and python labels Nov 15, 2023
Comment on lines +293 to +303
DataType::Int8 => {
    let class = pl.getattr(intern!(py, "Int8")).unwrap();
    class.call0().unwrap().into()
},
Contributor Author


We use call0() to instantiate the Int8 class.

@@ -413,6 +467,24 @@ impl FromPyObject<'_> for Wrap<DataType> {
},
}
},
"Int8" => DataType::Int8,
Contributor Author


Instances of Int8 can be converted into the corresponding Rust datatype.

@@ -130,7 +130,7 @@ def test_ipc_schema(compression: IpcCompression) -> None:
df.write_ipc(f, compression=compression)
f.seek(0)

- expected = {"a": pl.Int64, "b": pl.Utf8, "c": pl.Boolean}
+ expected = {"a": pl.Int64(), "b": pl.Utf8(), "c": pl.Boolean()}
Contributor Author


The previous equality still holds - this change is just to satisfy mypy.

@stinodego stinodego added this to the 1.0.0 milestone Nov 15, 2023
@stinodego stinodego marked this pull request as ready for review November 15, 2023 19:36
@stinodego stinodego modified the milestones: 1.0.0, 0.20.0 Nov 16, 2023
@stinodego stinodego merged commit 6491844 into main Dec 3, 2023
18 checks passed
@stinodego stinodego deleted the dtype-update branch December 3, 2023 10:10
@stinodego stinodego changed the title from "feat(python)!: Return instantiated DataType objects in schema/dtype methods" to "feat(python)!: Allow all DataType objects to be instantiated" Dec 10, 2023
Labels
breaking, enhancement, python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Polars datatypes should be instantiated
1 participant