Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use mypy pre-commit in local environment #1966

Merged
merged 56 commits into from
Feb 11, 2025

Conversation

EdAbati
Copy link
Collaborator

@EdAbati EdAbati commented Feb 8, 2025

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

I did some digging, looking for how other projects run typing and general best practices with mypy.
One thing people seem to agree with is that mirrors-mypy has its gotchas and may not be ideal for every project.
In our case we would need to add every backend dependencies as additional-dependencies. In this case we would have to maintain a script that keep those dependencies up to date.

One alternative would be to run mypy in the local environment where all the dependencies are already installed. This is what this PR is trying to do.

I have still some open questions, making this draft to test the CI too.
~I am expecting to see 300 errors from mypy πŸ‘€

Update: I propose to move to this solution where we:

  • keep using pre-commit for linting but move mypy to run in the local env -> this is to avoid having to add many additional-dependencies
  • Add Makefile to add 1 consistent command to run typing checks: make typing
  • Run mypy in a separate CI pipeline on the entire code

@MarcoGorelli
Copy link
Member

i like the look of this, thanks for doing it

it looks like this fails in pre-commit-ci, so we may need to find another way of running it in ci? (i'm not averse to removing pre-commit-ci)

@EdAbati
Copy link
Collaborator Author

EdAbati commented Feb 8, 2025

Thanks @MarcoGorelli ! ❀️

This could be a possible approach that doesn't change super much what we currently have:

  • we add mypy in the dev dependencies
  • mypy runs in pre-commit locally
  • we use pyproject.toml to configure mypy's behaviour
  • we only remove mypy from the pre-commit.ci (since the local env with dependencies will not be created there)
  • we add a new GH action for typing
  • the Makefile is to create a command that works the same way in Linux/macOS and Windows (make typing)

The errors are now showing up in the CI!
The dependencies we install in the Typing CI will influence which backends we will check for typing error. (just for this example I only added core)

Happy to hear your thoughts (also @FBruzzesi and @dangotbanned )

@MarcoGorelli
Copy link
Member

nice, happy to go with this - if we can get it green and then gradually address things, happy to get this in

@dangotbanned

This comment was marked as outdated.

@@ -159,7 +159,7 @@ def select(
):
return self._from_native_frame(
self._native_frame.aggregate(
[val.alias(col) for col, val in new_columns_map.items()]
[val.alias(col) for col, val in new_columns_map.items()] # type: ignore[arg-type]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on the duckdb stubs, only str are allowed in .aggregate. πŸ€”

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks wrong, it should be str | list[Expression], right? want to make a PR to duckdb?

Comment on lines +55 to +56
"narwhals[tests]",
"narwhals[typing]",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should separate the dev dependencies into tests and typing.

The latest pinned mypy is available for python >=3.9 and if we simply include it in dev our python 3.8 tests will fail.

@EdAbati EdAbati marked this pull request as ready for review February 9, 2025 21:36
@dangotbanned

This comment was marked as resolved.

@dangotbanned
Copy link
Member

dangotbanned commented Feb 10, 2025

The last warning seems legit to me https://github.com/narwhals-dev/narwhals/actions/runs/13240255809/job/36953842381?pr=1966

Both uses of bin_count pass None as arguments to functions that don't accept None:

image

elif self._native_series.count() < 1:
if bins is not None:
data = {
"breakpoint": bins[1:],
"count": zeros(shape=len(bins) - 1),
}
else:
data = {
"breakpoint": linspace(0, 1, bin_count),
"count": zeros(shape=bin_count),
}

I don't see any tests covering Series.hist(..., bin_count=None):
https://github.com/narwhals-dev/narwhals/blob/4b1d778beeab20c3b5565dba629e320eb23804c8/tests/series_only/hist_test.py

The one mypy caught looks like it would cause an error at runtime:
https://github.com/numpy/numpy/blob/06f987b7b77453aae7fa11de15f728567fccd407/numpy/_core/function_base.py#L121-L125

@camriddell just wanted to run this by you following #1859 - feel like I must be missing something

Update

Resolved in (fd192f2)

This code provides the narrowing at runtime.
I tried out adding @overload(s) in Series, v1.Series, PandasLikeSeries but they end up invalidating eachother

narwhals/narwhals/series.py

Lines 2591 to 2601 in 60e5705

if bins is not None and bin_count is not None:
msg = "can only provide one of `bin_count` or `bins`"
raise ComputeError(msg)
if bins is None and bin_count is None:
bin_count = 10 # polars (v1.20) sets bin=10 if neither are provided.
if bins is not None:
for i in range(1, len(bins)):
if bins[i - 1] >= bins[i]:
msg = "bins must increase monotonically"
raise ComputeError(msg)

- The real solution would be simplifying the control flow
- There's too much going on here, but `nw.Series` has mutually exclusive logic that prevents the bug

narwhals-dev#1966 (comment)
```
Argument of type "Hashable | None" cannot be assigned to parameter "name" of type "str" in function "alias" Β Β Type "Hashable | None" is not assignable to type "str" Β Β Β Β "Hashable" is not assignable to "str"
```
@dangotbanned
Copy link
Member

@MarcoGorelli I'm reasonably happy with where the typing is at after (#1966 (commits))

Of these modules, narwhals._arrow.series is the only one left with major issues:

pyright shouting at me

image

Will try to look into those after this and #1979 have merged

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks both! sure, happy to move forwards and address the rest incrementally, much appreciate your efforts here!

@dangotbanned dangotbanned changed the title RFC, fix: use mypy pre-commit in local environment fix: use mypy pre-commit in local environment Feb 10, 2025
@camriddell
Copy link
Contributor

@dangotbanned seems like you resolved everything there, thanks!

If it is useful to you in a post-hoc sense

narwhals/narwhals/series.py

Lines 2591 to 2601 in 60e5705

if bins is not None and bin_count is not None:
msg = "can only provide one of `bin_count` or `bins`"
raise ComputeError(msg)
if bins is None and bin_count is None:
bin_count = 10 # polars (v1.20) sets bin=10 if neither are provided.
if bins is not None:
for i in range(1, len(bins)):
if bins[i - 1] >= bins[i]:
msg = "bins must increase monotonically"
raise ComputeError(msg)

The high-level narwhals.Series entrypoint controls over uses runtime determination of bins or bin_count to determine the behavior of the function. The following rules should be respected:

  1. One cannot pass both bins and bin_count in as it leads to ambiguous behavior
  2. If one is passed in, then it should be treated as its non-None type; the other should be set to None.
  3. If neither are passed it, then it is treated as bin_count=10 and bins=None

I decided to split broad smoke-tests into 2 separate functions:


Thanks for resolving all of these typing issue. In the future, it could be worth refactoring these two behaviors into separate functions for each backend and dispatch the correct call from the narwhals.Series. I think this would make the internal code make a lot more sense as well.

@dangotbanned dangotbanned mentioned this pull request Feb 11, 2025
10 tasks
@dangotbanned dangotbanned merged commit 2d6df36 into narwhals-dev:main Feb 11, 2025
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI / typing: migrate away from pre-commit?
4 participants