Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typehints for injected functionality #253

Open
lewisjared opened this issue Sep 8, 2023 · 5 comments
Open

Typehints for injected functionality #253

lewisjared opened this issue Sep 8, 2023 · 5 comments
Labels
bug Something isn't working question Further information is requested

Comments

@lewisjared
Copy link
Collaborator

lewisjared commented Sep 8, 2023

Describe the bug

Type hints for the injected functions on a ScmRun are not properly handled. Pycharm has no idea that lineplot is an available function which is a pretty poor developer experience.

With large container classes we want to provide a range of behaviour without having to lump it all in the same module which is difficult to understand

Currently we inject functionality for:

  • netcdf io
  • xarray io
  • arithemetic operations
  • plotting

We also inherit an "OpsMixin" baseclass which correctly sets some magic methods

There are a few ways of achieving this:

  1. Move to a model similar to xarray which uses inheritance to bring together functionality
    https://github.com/pydata/xarray/blob/e2b6f3468ef829b8a83637965d34a164bf3bca78/xarray/core/arithmetic.py#L122

Pandas doesn't use inheritance in the same way, but simply has a huge frame.py file

  1. Use composition to perform the same action. That might work well for plotting as it seems to be the common pattern used across xarray and pandas. e.g. df.plot.line or ds.plot.scatter. This might cause a large change to the API without some relatively opaque redirection.
    See https://github.com/pandas-dev/pandas/blob/d04747c367d00ee03c5d008ce5670892d450e801/pandas/core/series.py#L5782 for pandas accessors

  2. Add the IO functions to BaseScmRun, removing the need for the injection. The lower level functionality can happen elsewhere in scmdata.netcdf, but the method knows what to call.

In reality, the soln could be a mixture of all three.
plotting = composition
arithmetic = inheritance
xarray and NC = methods

@mikapfl Any other suggestions or thoughts?

Failing Test

Pycharm autocompletes lineplot with the correct type hint information

@lewisjared lewisjared added bug Something isn't working question Further information is requested labels Sep 8, 2023
@znicholls
Copy link
Collaborator

znicholls commented Sep 8, 2023

In reality, the soln could be a mixture of all three

This seems pretty sensible. I assume that there's no good way to avoid inheritance for the arithmetic (I have always found mix-ins very opaque but having a huge definition file also seems painful so maybe it is the best tradeoff here)?

@mikapfl
Copy link
Member

mikapfl commented Sep 8, 2023

For nice developer experience with primap2, I did some horrible things with pyi stubs: https://github.com/pik-primap/primap2/blob/main/primap-stubs.patch

However, since we are controlling the classes here, we probably don't have to do anything like that. The problem in primap2 is that we are providing new methods on xarray classes directly, using the xarray mechanism to register that. Honestly, the amount of weird stuff you have to do to support the pythonic API way of "method on class" makes me question if it is really worth it. Is

a = run.filter(category=x)

really so much better than

import scmdata as sd

a = sd.filter(run, category=x)

to warrant the added complexity?

I know everyone likes the pipe-style of chaining multiple things, so run.a().b().c() instead of c(b(a(run))), but still, in the primap2 developer docs there is a section on how to add a function. You'd hope that you don't have to write 2000 characters on how to add a function.

But enough philosophical musings, back to the topic at hand: your proposal looks sensible. There is a nice hierarchy of "core" (inheritance) to less "core" (composition, methods) functionality. I'd maybe opt for a huge class with all the arithmetic over inheritance because MixIns are really super opaque, but either way is a solid compromise.

@znicholls
Copy link
Collaborator

I know everyone likes the pipe-style of chaining multiple things, so run.a().b().c() instead of c(b(a(run)))

I would pay a stupidly high price to get chaining. I don't know why, it is just so much easier to reason for me.

@mikapfl
Copy link
Member

mikapfl commented Sep 8, 2023

Fair enough. Maybe the reasonable compromise is: pay for chaining for functionality which chains well (e.g. filter), but maybe don't pay for chaining for functionality which is a dead end (e.g. plotting). So the norm for plotting would be:

data = run.a().b().c()
sd.lineplot(data, plotting_args)

I honestly think that is even better than

run.a().b().c().lineplot(plotting_args)

simply because you can set breakpoints on the data = line and inspect the data to plot, or make another plot with the same data etc.

@znicholls
Copy link
Collaborator

znicholls commented Sep 8, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants