Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Auto-populate related software using mne-installers yaml #12731

Merged
merged 13 commits into from
Jul 19, 2024

Conversation

larsoner
Copy link
Member

@larsoner larsoner commented Jul 18, 2024

Related to mne-tools/mne-installers#104
Closes #11831

  1. Make the list of related packages be built / checked against mne-installers recipe.yml
  2. Check installed package metadata using importlib.metadata to get URLs and descriptions (DRY; we don't need to maintain them, if they are bad, packages should fix their packaging!). I considered using conda-forge info or scraping PyPI, but it's easy to get this info if packages are installed -- so install them on CircleCI using pip (or maybe eventually uv or whatever!) and use that info
  3. Add the package list as a Sphinx directive (caching the mne-installer YAML for ease of development, won't affect CircleCI)
  4. Add a pip install mne[doc_related] developer / CirlceCI option to facilitate CircleCI installing the necessary packages. This has the side benefit of making sure that all packages are installable. Can trivially be moved to tools/circleci_dependencies.sh if it's too visible in pyproject.toml.
  5. List packages which should be updated to be Python 3.12 and/or conda-forge compatible, making an explicit future TODO to improve or remove those packages in our code.
  6. Make CircleCI go through this pain, but local devs building the docs won't hit errors (will just have incomplete info in the list, which they probably won't ever notice and won't affect their dev process)

This way we make sure that our list stays up to date. Dependencies that can't be installed will be flagged implicitly by CIs. Updates to mne-installers will trigger errors, so ideally when we add stuff there we should add it here simultaneously. But if we forget, we'll get a red CI with a decent error message telling us to update stuff.

Not 100% sure I got the doc_related optional dep list correct since I derived it by running locally (where I might have already had stuff installed), but CircleCI should tell us quickly if it needs to be updated.

@larsoner larsoner added this to the 1.8 milestone Jul 18, 2024
@larsoner larsoner marked this pull request as ready for review July 18, 2024 05:26
Copy link
Member

@drammock drammock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm -0.5 on this change. Seems like replacing ~20 lines of maintained text with ~200 lines of maintained code. Also I don't love the fact that folks will now need to install a bunch more packages to build the docs, just so that we can print the name, URL, and description of those packages on a webpage (i.e., not because they're needed for our tutorials).

Final point in opposition: previously to add something folks would just make a PR to MNE-Python. Now, they must package their software for PyPI and conda-forge, and make the PR to some other repo which they may not understand well (recipes? constructors? etc).

Is there some middle ground here? Like maintaining a JSON or YAML file that gets read in by the sphinx directive?

doc/install/mne_tools_suite.rst Outdated Show resolved Hide resolved
doc/sphinxext/related_software.py Outdated Show resolved Hide resolved
@larsoner
Copy link
Member Author

Thanks for the quick review @drammock ! I think there are a few misconceptions that might help to clear up:

Seems like replacing ~20 lines of maintained text with ~200 lines of maintained code... Is there some middle ground here? Like maintaining a JSON or YAML file that gets read in by the sphinx directive?

Well a decent chunk of the code is doing such a parsing of a YAML file already from mne-installers, which was an idea +1'ed in #11831 (comment), and is also why this would close mne-tools/mne-installers#104. So in this sense yes it does already implement a YAML parsing. We could simplify it (e.g., get rid of the _ignore_re by making some simple changes in the installer YAML, like adding a comment

  # BEGIN RELATED SOFTWARE LIST
  - mne-ari
  - ...
  # END RELATED SOFTWARE LIST

this would be super easy to parse and would cut out probably 20-30 lines.

Another 40ish lines is the Sphinx directive part, which you'd have in a simple solution.

Then maybe 100 lines are the metadata parsing step, which I guess is the point of contention, then, since I think even your middle ground would probably end up close to 100 lines assuming you want to parse the MNE-Installers YAML. (And if you want to create a new JSON or YAML, it's less DRY and also doesn't solve mne-tools/mne-installers#104, so not an appealing solution). But the pip installation + metadata parsing has some worthwhile benefits I'll mention below.

Also I don't love the fact that folks will now need to install a bunch more packages to build the docs, just so that we can print the name, URL, and description of those packages on a webpage (i.e., not because they're needed for our tutorials).

They don't! Only CircleCI needs to do this. You can build docs right now on my branch and whatever packages you have will have the summary and URL populated, ones that you don't will just show up as - mne-ari or whatever in the list (no more info). I don't foresee any practical dev issues with this?

I think it's a big benefit actually that CircleCI must install these packages to get their info. Currently, we have no validation that the packages we suggest to people are even installable -- and some aren't at the moment, and this PR revealed it in local testing. So with this PR we have constant checks that whatever we're telling people to install can actually easily be installed.

Final point in opposition: previously to add something folks would just make a PR to MNE-Python. Now, they must package their software for PyPI and conda-forge, and make the PR to some other repo which they may not understand well (recipes? constructors? etc).

This part was an opinionated choice I made. My thinking was that a reasonable bare minimum for inclusion in the list is that your package should be on PyPI, as it shows that you've at least met a minimal requirement for thinking about maintenance etc. If you buy this, I figured conda-forge wasn't far behind but I agree could be too much -- I can reword it to say just PyPI. (It does need to be on conda-forge to land in our installers, though, so it would be nice to encourage people to conda-forge-ify their packages in text somewhere at least!)

But if you don't buy this argument about needing to be on PyPI, ~50 lines in this PR are a dict of manual entries, precisely because some packages we list in main don't meet the PyPI criterion. So we could relax it if you want so that:

  1. Packages on conda-forge and PyPI should add themselves to MNE-Installers and they'll automagically get included here
  2. Packages only on PyPI get their info auto-populated once their package name is added to a simple list here
  3. Packages neither on conda-forge nor PyPI can add a manual entry to related_packages.py

All of the mechanics for resolving these are already in the current code, it's just a matter of us choosing what should be required. And while this does make it a little bit harder to add a package to the list, it's not a ton harder, and pushes people hopefully toward including their packages in our installers for example.

@larsoner
Copy link
Member Author

@drammock I pushed a commit to simplify the parsing along the lines I suggested above, feel free to look and see if you're convinced it's less painful / more useful now

@drammock
Copy link
Member

TL;DR: right now I'm leaning toward decoupling this "related software" list from "extensions shipped with our installer", and using the code in this PR to address mne-tools/mne-installers#104 but putting the resulting list on a different page. LMKWYT

They don't! Only CircleCI needs to do this.

Indeed this was a misunderstanding

I think it's a big benefit actually that CircleCI must install these packages

agreed, I see your point

maybe 100 lines are the metadata parsing step, which I guess is the point of contention, then, since I think even your middle ground would probably end up close to 100 lines assuming you want to parse the MNE-Installers YAML. (And if you want to create a new JSON or YAML, it's less DRY and also doesn't solve mne-tools/mne-installers#104, so not an appealing solution).

IMO:

  1. mne-tools/mne-installers#104 should get closed when the full list of what gets installed is documented somewhere (probably on https://mne.tools/dev/install/installers.html). That full list (incl. IDEs, PyTorch, etc) doesn't belong on the "suite of related tools" page. Using code such as in this PR to generate that list seems fine to me.

  2. being less DRY doesn't bother me here; I don't mind the conceptual separation between "compatible packages we tell people about on our website" and "extension packages we ship with our installer". It's not clear to me that those should / need to be coextensive.

  3. IIRC there have been some issues with the installer artifact sizes being too big sometimes (maybe I'm misremembering the problem?), so having a blanket policy of "include anything folks say is related and is properly packaged" risks hitting that problem again as the list grows.

We could simplify it (e.g., get rid of the _ignore_re by making some simple changes in the installer YAML

if we do end up parsing this list out of the installer YAML, then I like this idea. But see point 2 above.

This part was an opinionated choice I made. My thinking was that a reasonable bare minimum for inclusion in the list is that your package should be on PyPI, as it shows that you've at least met a minimal requirement for thinking about maintenance etc. If you buy this, I figured conda-forge wasn't far behind but I agree could be too much

I'm OK with saying "must be on PyPI to make this list" and I do think that requiring conda-forge also is too high a bar; suggesting conda forge (and linking to grayskull or whatever) is fine. But again see point 2 above, still not 100% convinced that this list should come from the installer YAML.

@larsoner
Copy link
Member Author

larsoner commented Jul 18, 2024

TL;DR: right now I'm leaning toward decoupling this "related software" list from "extensions shipped with our installer", and using the code in this PR to address mne-tools/mne-installers#104 but putting the resulting list on a different page. LMKWYT

I think we need two lists, one for related neurosciencey software, and a different one for everything the installers install, as they answer different questions for people. The second list is trivial to make -- parsing the installer YAML is < 10 lines of code, so it's not difficult to pull things out. You get it almost for free if you make the first list partially based on the installer YAML. So I think we need to decide if parsing the installer YAML for the "related software" list makes sense first.

being less DRY doesn't bother me here; I don't mind the conceptual separation between "compatible packages we tell people about on our website" and "extension packages we ship with our installer". It's not clear to me that those should / need to be coextensive.

I don't think they need to be identical, but I think that the list in MNE-Python should be a superset of the list of neurosciencey stuff we put in the installers. After all, if a neurosciencey thing is in our installer, it should be related (somehow) to what MNE is used for. But going the other way, if a package is related to MNE, yes we should ideally include it our installer, but we can't always do it (not on conda-forge, artifact limitations, etc.)! And as far as "neurosciencey" or not it's easy enough to make this decision at the time of addition to MNE-Installers by which section we put it in if we merge mne-tools/mne-installers#285.

But even if you don't agree with this part or find that line of thinking constructive, reading the installer YAML in this PR should result in less work/effort and a more complete list -- this PR lists 44 packages and main lists 22. That means there are 22 packages from mne-installers that would have been gradually automatically added to MNE-Python if these mechanics were in place, rather than been missing from the list (until someone thought sometime to add them in a PR to MNE).

And importantly, in the end I don't see it any tangible drawback to inspecting the installer YAML for package names -- it only uses a few lines (it's < 10 now!) and has some tangible benefits.

IIRC there have been some issues with the installer artifact sizes being too big sometimes (maybe I'm misremembering the problem?), so having a blanket policy of "include anything folks say is related and is properly packaged" risks hitting that problem again as the list grows.

We haven't hit that in a while because we improved our artifact removal policy. But even if at some point we have to be more selective or have a higher bar in terms of what we include in the installers... they should still remain as consistent as possible. Neurosciencey things in the installer should be in our related software list here (the "superset" idea), which is what this PR helps ensure.

@drammock
Copy link
Member

if a neurosciencey thing is in our installer, it should be related (somehow) to what MNE is used for.

Yep agree

But going the other way, if a package is related to MNE, yes we should ideally include it our installer, but we can't always do it (not on conda-forge, artifact limitations, etc.)!

This is the part I'm struggling to fully agree with. To me the bar is much lower to say "FYI here's a related package" than to say "we're installing this on your system". I'm other words I want to leave open the possibility of cases where we can but choose not to include a package in our installer even if we are ok putting it on this list.

That said: I can see that your code supports that situation. So my main concern is how we describe this mechanism publicly to package authors. We shouldn't give the impression that all it takes is a PR and we'll blindly start shipping their code to our users.

@larsoner
Copy link
Member Author

To me the bar is much lower to say "FYI here's a related package" than to say "we're installing this on your system". I'm other words I want to leave open the possibility of cases where we can but choose not to include a package in our installer even if we are ok putting it on this list.

That's fine with me -- there are already some packages that we don't install because we can't. I think it's okay if we decide there are some that we shouldn't (for whatever reason). It's consistent with the superset idea in this PR anyway!

my main concern is how we describe this mechanism publicly to package authors. We shouldn't give the impression that all it takes is a PR and we'll blindly start shipping their code to our users.

I think this can be mostly orthogonal to this PR. To separate it, rather than make any suggestion of "you should add your package to PyPI, conda-forge, then our installers!" here, we can just say for now "modify related_packages.py to add your package!". Then we still get the benefits of auto-adding any neurosciencey stuff that we do ship in mne-installers, but we can punt on suggesting to people that if they try to add to mne-installers themselves.

Copy link
Member

@drammock drammock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I'm convinced / happy with where this has landed. See my inline comments for the last few nits.

doc/sphinxext/related_software.py Outdated Show resolved Hide resolved
doc/sphinxext/related_software.py Outdated Show resolved Hide resolved
doc/sphinxext/related_software.py Outdated Show resolved Hide resolved
doc/sphinxext/related_software.py Outdated Show resolved Hide resolved
doc/sphinxext/related_software.py Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
@larsoner
Copy link
Member Author

Okay thanks for working through it with me @drammock, I think it's much cleaner now than what I originally had 👍 Commited fixes and moved to circleci_dependencies.sh, I'll double check the CircleCI output one last time then mark for merge-when-green.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MAINT: Check list of other packages against mne-installers
2 participants