Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Working towards better type annotation #373

Closed
enadeau opened this issue Jun 24, 2023 · 19 comments · Fixed by #423
Closed

Working towards better type annotation #373

enadeau opened this issue Jun 24, 2023 · 19 comments · Fixed by #423

Comments

@enadeau
Copy link
Contributor

enadeau commented Jun 24, 2023

Hi I've been using the package and I quite like it.

I'd be interested in contributing to the library by adding up type hints because it's always nice to have good more confidence in code correctness and more autocomplete for the users.

If that's something you think is a good idea, I'd try to work in small chunks. I think the best way to do would be to start with a first PR that sets up configure mypy and setup CI. The goal would be to get that in a working state with minimal changes. Probably needs to fix a few types and maybe add typing to one file or two. Then I could incrementally add types and check more files in subsequent PRs.

@jsamoocha
Copy link
Collaborator

Hi, this sounds like a great idea! Looking forward to PRs.

@lwasser
Copy link
Collaborator

lwasser commented Jun 24, 2023 via email

@enadeau
Copy link
Contributor Author

enadeau commented Jun 24, 2023

I've read that getting mypy to working pre-commit is problematic. The best approach is probably to have it as another workflow that is not part of pre-commit.

It could eventually be it's on github action/workflow but it doesn't need to happen right now.

The progressive approach seems right to me. I'd try to add type hints it one file at a time, activate mypy on the files that I've fixed.

@enadeau enadeau mentioned this issue Jun 24, 2023
10 tasks
@enadeau
Copy link
Contributor Author

enadeau commented Jun 27, 2023

I was thinking about all the type annotation/description that are present in the docstring. They are quite useful to write the type annotation, but they become kind of redundant once the code is annotated and can easily be out of sync. I see mostly three options.

  1. Remove them as the funcitons get annotated
  2. Remove them once the whole code base is annotated
  3. Just keep them

I think I'd go for 1 or 2. Removing them immediately has the advantage to not need to make another pass through the file while keeping while we complete the typing will make it easier to refer back as needed.

Any take on that?

@lwasser
Copy link
Collaborator

lwasser commented Jun 28, 2023

let me dig into that a bit more. this issue becomes we use autodoc that filters into sphinx. so i need to understand how that would impact our documentation.

i guess what i'm confused about is if you properly type out the docstrings (input / output etc) then in theory you should not have to fully type all of the code.

What have you implemented in other projects you've worked on?

i'll also check in with some of the python community to see what the precedent / best practices are here so we don't reinvent the wheel!!

@lwasser
Copy link
Collaborator

lwasser commented Jun 28, 2023

ok i still need to dig in but i had a look at a pr i had been working on AGES AGO (i've been super busy so haven't had time to play with this)

  1. the "typing" in the docstring is actually just numpy-style docstring format which is not necessarily formal typing but what appears in our sphinx docs when they are printed. this is important for API docs and user facing docs.
  2. Whereas with type hints we are typing each parameter input using formal typing annotation and then there may be some typing required in the core code base as well on top of that (for things like class attributes etc).

i still don't have an answer but that is why you see that thin layer of type hints in the docstrings - for user-facing documentation purposes. The question of the potential mismatch between types as the code base evolves does arise here (and i have an open question on this in our community) BUT they do serve two different purposes. in theory, if a method is updated it wouldn't be a massive overhaul and one could update both.

in practice i hear you about the potential for disconnect as the code base evolves over time.
i'd still like to use best practices here and see what others say before we make a call however - @jsamoocha curious what your thoughts are. i know i dropped the ball on some of that flake8 / docstring / black work so that doesn't help here! :)

@jsamoocha
Copy link
Collaborator

I know very little about Sphinx/autodoc. Maybe this autodoc-typehints package could help? That seems to resolve the documented types of e.g. function parameters into a single source of truth.

@lwasser
Copy link
Collaborator

lwasser commented Jun 28, 2023

ok no problem! i'm still digging around and talking with folks about this. there's no doubt in my mind that using mypy is a good idea. it's more that i'm worried about poor quality user facing API docs which i think also are what people see when they call help() if remove the type descriptions in the docstrings.

This shouldn't however stop us from continuing to work on adding annotations to stravalib of course! but maybe once i get a bit more info on the sphinx end we can make a call on what we want to do with docstrings.

in theory because our code base doesn't drastically change pr to pr it wouldn't be too terrible to maintain both in some way. I also found this discussion. . seems like autodoc-typehints may not play nicely with numpy style docstrings. 😞

@enadeau
Copy link
Contributor Author

enadeau commented Jun 28, 2023

I agree that it would definitely be possible to keep them in two places. It's only a little bit of extra work that is nice to avoid.

As you can see I've started looking a bit at the doc but got slightly distracted by minor bugs in it.

I'll come back to you with some screenshot of how the docs could look like with the type annotation. I didn't think about help because I don't work a lot in interactive environment but I'll try to have a look at how they look too.

@lwasser
Copy link
Collaborator

lwasser commented Jun 28, 2023

wonderful - thank you @enadeau One other thing to consider. it's much easier for us to review small pr's to chunks of the code than larger ones. so i suggest working on a single module to completion or a partial module, submit fully, let us know it's ready for review and we review. Then we can work on another chunk of the code base.

this will allow us to merge pr's much more quickly and keep the workload manageable. does that sound reasonable to you? we really really appreciate this thought that you're putting into stravalib. thank you!! looking forward to exploring this more. it seems like we are NOT the only ones with this issue and we may just need to make some decisions that work for both us and our users. :)

@enadeau
Copy link
Contributor Author

enadeau commented Jun 28, 2023

I agree it's best to keep it small. I'll stop working on #374 and set it ready for review.

I've included here a screenshot of how the annotation looks in the docs. At first try I could not get it to build with doc with autodoc-typehints. Ran into some version conflict issue with sphinx conflicted with the one required by pydata-sphinx-theme.

How it looks with the type annotation added

image

How it look with help

In [3]: ApiV3.exchange_code_for_token?
Signature:
ApiV3.exchange_code_for_token(
    self,
    client_id: 'int',
    client_secret: 'str',
    code: 'str',
) -> 'AccessInfo'
Docstring:
Exchange the temporary authorization code (returned with redirect from strava authorization URL)
for a short-lived access token and a refresh token (used to obtain the next access token later on).

:param client_id: The numeric developer client id.
:type client_id: int

:param client_secret: The developer client secret
:type client_secret: str

:param code: The temporary authorization code
:type code: str

:return: Dictionary containing the access_token, refresh_token
         and expires_at (number of seconds since Epoch when the provided access token will expire)
:rtype: dict
File:      ~/Documents/stravalib/stravalib/protocol.py
Type:      function

@enadeau enadeau mentioned this issue Jun 29, 2023
10 tasks
@lwasser
Copy link
Collaborator

lwasser commented Jun 30, 2023

ok so i did a bit of digging here as well. And there is currently NO good solution that i see for this redundancy between in-code annotations vs docstring "types" (using that word carefully as it's not a proper type in the sense of mypy/pyright checks, etc). We HAD begun to move all docstrings to numpy-style as they are easier to read / cleaner and that autodoc-typehints package looks awesome but there are some issues with it playing nicely with numpy style docs.

Finally - numpy has considered but not yet implemented - something that supports generating docstrings FROM the parameter types but that is just an ongoing several year discussion.

SO - i'm not sure where to land here. i feel like we want this information in our user-facing documentation and we also want the in-code type hints and checks that robust annotation provides.

is everyone ok (for now) with leaving our plan to migrate and enhance docstrings as is? i will continue to explore whether there is some other option.

My thoughts: generally it should be fairly easy to maintain both once this up front work is done given our API is relatively stable now. so i'd vote for keeping the docstring information as well and still moving forward with migration to numpy.

@enadeau
Copy link
Contributor Author

enadeau commented Jun 30, 2023

I do agree that this has become a bigger question than I expected at first when I asked it. Let's keep the types in the docs for now and focus on adding the type annotations.

@enadeau enadeau mentioned this issue Jul 2, 2023
10 tasks
@lwasser
Copy link
Collaborator

lwasser commented Jul 4, 2023

@enadeau in working on model.py docstrings / documentation issues i'm starting to type a bit.

Do you mind if i take a stab at typing it? you could then review if you are open to that?

@enadeau
Copy link
Contributor Author

enadeau commented Jul 5, 2023

Yes definitly give it a go if you want to 💯

@lwasser
Copy link
Collaborator

lwasser commented Jul 30, 2023

ok friends. i just want to let you know that i've been super swamped at work and haven't had time to work on this at all. So i think to avoid holding things up, i'll focus on addressing that documentation issue related to model.py that we discussed and will finish up docstrings. But, i may not be able to type the entire module. i'll try to get that PR that is open to a state where it can be reviewed in the next few weeks.

@enadeau
Copy link
Contributor Author

enadeau commented Aug 13, 2023

Now that the model bit is completed I'm back at the typing thing. I've open a PR on the swagger2pydantic repo.

It basically chages the script to use field constraint instead of conint. This allows the generated file to pass type checking. Here's an example:

-    kom_rank: Optional[conint(ge=1, le=10)] = None
+    kom_rank: Optional[int] = Field(None, ge=1, le=10)

I'm not quite sure how it works from here. Will an automatic pull request update strava_model.py here once it's merged?

@lwasser
Copy link
Collaborator

lwasser commented Aug 14, 2023

i haven't looked at the build in a while but it is running on a cron job that runs daily. i haven't seen too many pr's so i presume strava doesn't update the swagger.json file that often? so whenever that change is merged it will run the next day at midnight essentially is what i see in the action.

@jsamoocha
Copy link
Collaborator

I merged that model generator config change and ran that daily job manually. You can see the results in #403.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants