Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support the usage of Markdown-based notebooks with "Lite" directives #221

Merged

Conversation

agriyakhetarpal
Copy link
Member

@agriyakhetarpal agriyakhetarpal commented Dec 3, 2024

Description

Please refer to the proposal in #191 for additional context and the rationale behind this change. This PR adds preprocessing functionalities to the base "Lite" directive (in the _LiteIframe class), which will allow a user to pass in a plain Markdown/MyST-Markdown notebook file to a directive, which will then get converted on the fly when JupyterLite is triggered. The conversion is performed via jupytext, which has been included as a dependency.

There is also the option of leaving this as an experimental feature behind an experimental flag and/or an optional dependency on jupytext, as suggested in #191 (comment). However, I haven't implemented that change and would like to wait for further direction on this before I do so.

Changes made

  • Preprocessing added to the run() method of the _LiteIframe class to convert .md files to .ipynb files and pass them on to JupyterLite
  • Refactored the notebook conversion and cell stripping logic to two separate hidden functions that we subsequently call in _LiteIframe.run()
  • Handled potential notebook collisions (in case notebooks of the same name but with different formats are present anywhere in the docs) – here, we handle only those notebooks that are referenced in a directive, and we raise a ValueError for any conflicts, recommending that the user rename their notebooks to have unique names.
  • jupytext added as a dependency for facilitating the conversion
  • Documentation for all directives updated to showcase this functionality
  • A sample MyST Markdown notebook added

Additional context

This is supposed to make interacting with Markdown-based notebook workflows easier – a few notable examples are scipy/scipy#20303 and PyWavelets/pywt#741, which have their own ways of dealing with this intermediate step that might make sense to include in jupyterlite-sphinx itself through this PR, such as https://github.com/scipy/scipy/blob/82ee02556872b9f91576b835a8f40f1d9eb1fd8d/doc/convert_notebooks.py and https://github.com/agriyakhetarpal/pywt/blob/fd3e01c3003a18cca770fc6f33752f8c07084e21/doc/source/conf.py#L33-L49 respectively.

cc: @melissawm @steppi @gabalafou

@agriyakhetarpal agriyakhetarpal added the enhancement New feature or request label Dec 3, 2024
@agriyakhetarpal agriyakhetarpal added this to the 0.16.6 milestone Dec 3, 2024
@agriyakhetarpal
Copy link
Member Author

I am facing some unrelated troubles when testing it locally with the SciPy documentation from SciPy's main branch right now, and I intend to test this a bit more thoroughly with multiple projects, also to ensure that things like scipy/scipy#21433 do not break with this change.

@agriyakhetarpal
Copy link
Member Author

I am facing some unrelated troubles when testing it locally with the SciPy documentation from SciPy's main branch right now, and I intend to test this a bit more thoroughly with multiple projects, also to ensure that things like scipy/scipy#21433 do not break with this change.

Update: I've tested it multiple times by now and this is working perfectly well with the SciPy docs 🎉 though the use of a UUID to append to the filename would break the caching (obviously) of the notebook builds and also adds additional notebooks to the built docs. So, I've chosen to adapt the caching-related check from scipy/scipy#21433 in e3b6882, and I can confirm that there are no regressions as those that were noticed in scipy/scipy#21394. This does mean that I've had to revert the commit via d1c7f7d where I added a sample my_notebook.md file to accompany our existing my_notebook.ipynb file, because of the naming conflict that will be caused after conversion (which is something I'm looking for opinions on – please see #221 (comment) for context!)

When this PR becomes part of a release, we can remove the convert_notebooks.py script and the convert-notebooks Make target in SciPy, essentially eliminating the changes introduced through scipy/scipy#21433, and instead of having to write:

.. jupyterlite:: ../../_contents/hypothesis_chi2_contingency.ipynb
   :new_tab: True

we can just write:

.. jupyterlite:: hypothesis_chi2_contingency.md
   :new_tab: True

which would be much cleaner – and I can submit a follow-up PR to SciPy for this, or step back if you wish to do so, @melissawm. :) This PR should now be ready for review now!

@agriyakhetarpal agriyakhetarpal marked this pull request as ready for review December 3, 2024 23:42
@agriyakhetarpal
Copy link
Member Author

Note; we can also allow both notebook_X.ipynb and notebook_X.md to coexist in the same folder, and just add a suffix for the Markdown-based case, so that using .. notebooklite:: notebook_X.ipynb opens the original notebook, but using .. notebooklite:: notebook_X.md would convert notebook_X.md to notebook_X_converted.ipynb (or perhaps something like notebook_X.md.jupyterlite) and open it, which would perform both actions: resolving such an edge case, and keeping both notebooks intact in the documentation's source folder.

The docs build is failing because of this, but I would like to receive further direction before I proceed with this approach.

@melissawm
Copy link
Contributor

Go ahead - that would be great!

@agriyakhetarpal
Copy link
Member Author

Thanks – will do when this is reviewed and released!

@jtpio
Copy link
Member

jtpio commented Dec 4, 2024

Thanks @agriyakhetarpal, nice to see support for markdown-based notebooks being added here!

Looks like the docs failure may be relevant though: https://readthedocs.org/projects/jupyterlite-sphinx/builds/26477597/

FileNotFoundError: [Errno 2] No such file or directory: '/home/docs/checkouts/readthedocs.org/user_builds/jupyterlite-sphinx/checkouts/221/docs/directives/my_notebook.md'

@agriyakhetarpal
Copy link
Member Author

Thanks @agriyakhetarpal, nice to see support for markdown-based notebooks being added here!

Looks like the docs failure may be relevant though: readthedocs.org/projects/jupyterlite-sphinx/builds/26477597

FileNotFoundError: [Errno 2] No such file or directory: '/home/docs/checkouts/readthedocs.org/user_builds/jupyterlite-sphinx/checkouts/221/docs/directives/my_notebook.md'

Yes, this is related to #221 (comment) and #221 (comment). Let me push a few more changes to handle that; however, I'll fix it for now by reverting the previous commit!

@agriyakhetarpal
Copy link
Member Author

Hi @gabalafou, I've ensured that these Markdown/MyST notebooks can also be stripped via 6a6d7a2, and added docs on how to use the stripping functionality with Markdown notebooks in 5867428 – on the lines of what we discussed yesterday. I have tested the changes that I have also upstreamed to a SciPy PR here: scipy/scipy#22042. From a self-review, this looks alright to me to go ahead with.

@agriyakhetarpal
Copy link
Member Author

The case where my_notebook.ipynb and my_notebook.md both exist in the same folder has been handled, but I found yet another edge case based on these collisions, which is slightly trickier to resolve (but also much less likely to happen). If we have multiple files of the same name that exist across folders, we would have a problem. For example,

  • my_notebook.ipynb exists in folder1/, my_notebook.md exists in folder1/ and my_notebook.ipynb exists in folder2/

We would have a conflict again because all of the notebooks are going to be copied under the contents/ directory. We should preserve the folder structure when copying to the contents/ folder (i.e., create contents/folder1/ and contents/folder2/ and copy the notebooks in there) and indeed add UUID suffixes for the files to ensure that no collisions happen.

However, it is safe to say that this case existed prior to this PR as well (my example concerns two .ipynb files). I am not sure if anyone has encountered it in the wild – at least, there have been no bug reports about it so far. Would it be okay to proceed with this PR as is, mark this behaviour as a known case of failure, and open a separate issue for it?

@agriyakhetarpal agriyakhetarpal force-pushed the feat/support-markdown-notebooks branch from 5867428 to 6d2336b Compare December 13, 2024 17:13
@agriyakhetarpal
Copy link
Member Author

The case where my_notebook.ipynb and my_notebook.md both exist in the same folder has been handled, but I found yet another edge case based on these collisions, which is slightly trickier to resolve (but also much less likely to happen). If we have multiple files of the same name that exist across folders, we would have a problem. For example,

  • my_notebook.ipynb exists in folder1/, my_notebook.md exists in folder1/ and my_notebook.ipynb exists in folder2/

We would have a conflict again because all of the notebooks are going to be copied under the contents/ directory. We should preserve the folder structure when copying to the contents/ folder (i.e., create contents/folder1/ and contents/folder2/ and copy the notebooks in there) and indeed add UUID suffixes for the files to ensure that no collisions happen.

However, it is safe to say that this case existed prior to this PR as well (my example concerns two .ipynb files). I am not sure if anyone has encountered it in the wild – at least, there have been no bug reports about it so far. Would it be okay to proceed with this PR as is, mark this behaviour as a known case of failure, and open a separate issue for it?

This was discussed with @steppi today, and we agreed upon making a naming conflict a hard error, regardless of file extension. He suggested that I check if any projects have notebooks with duplicate names, so that we could correct them downstream as a safety measure before we release. I didn't find any for NumPy, SciPy, or scikit-learn; and neither has anyone reported an issue about this before – ever since jupyterlite-sphinx has had its inception. So, it is all the more the right time to conform to correct behaviour, and if someone raises this, we could just ask them to rename the duplicate notebook they have – at most, it would cost them a broken URL or few, which they can fix with redirects via RTD or through any other methods that we are not concerned about. Or, if we get a lot of bug reports about it, we could look into mangling the filenames with UUIDs and the like (which was my previous approach).

@agriyakhetarpal
Copy link
Member Author

I've gone one step ahead in the check and also allowed having notebooks of the same filename stem—such as my_notebook.ipynb and my_notebook.md—to co-exist anywhere in the folder structure, but only under the condition where both of them are not referenced in our directives. This way, we are not going to deal with notebooks that are not our business – if, say, if only my_notebook.ipynb has been included in a .. jupyterlite:: my_notebook.ipynb directive somewhere in the docs by a user, it is only then that we process it further.

Also, we perform this lookup with a set, so that operation happens in linear time anyway – I did think about caching it using a frozenset with paths serialised to strings and a simple LRU approach, but I didn't feel it was worth keeping because we are never going to expect notebooks in the order of thousands.

This is ready for review, now!

Copy link
Collaborator

@steppi steppi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good Agriya. A couple small suggestions. Only real blocker I think is that the error message for when duplicate notebooks are found could use a little work. Let's discuss it offline when you have time.

jupyterlite_sphinx/jupyterlite_sphinx.py Outdated Show resolved Hide resolved
jupyterlite_sphinx/jupyterlite_sphinx.py Outdated Show resolved Hide resolved
jupyterlite_sphinx/jupyterlite_sphinx.py Outdated Show resolved Hide resolved
jupyterlite_sphinx/jupyterlite_sphinx.py Outdated Show resolved Hide resolved
jupyterlite_sphinx/jupyterlite_sphinx.py Outdated Show resolved Hide resolved
@agriyakhetarpal
Copy link
Member Author

Albert and I discussed the error message offline and in consideration, reduced its size because of its verbosity; plus, it would be weird to see it as such from a user's perspective. It made more sense to make it more impersonal, highlighting the problem of conflicting notebook names at hand. Implemented in 81c0c09.

@agriyakhetarpal
Copy link
Member Author

With 29bb810, we don't keep going until we find all conflicts at once, and stop at the first one itself.

@steppi
Copy link
Collaborator

steppi commented Dec 20, 2024

Closing and reopening to investigate XPython kernel not being found in the readthedocs preview in #228 (comment)

@steppi steppi closed this Dec 20, 2024
@steppi steppi reopened this Dec 20, 2024
@agriyakhetarpal
Copy link
Member Author

🤔 I cannot reproduce it here either


# We only consider conflicts if notebooks are actually referenced in
# a directive, to prevent false posiitves from being raised.
if hasattr(self.env, "jupyterlite_notebooks"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be fine for now, but the $O(n^2)$ algorithm used here might be painful for projects with a very large number of notebooks. We could do $O(n)$ if we store the basenames in a set. Just making a note of it in case this ends up being a bottleneck for someone.

@steppi steppi merged commit 8061c30 into jupyterlite:main Dec 21, 2024
7 checks passed
@steppi
Copy link
Collaborator

steppi commented Dec 21, 2024

Thanks @agriyakhetarpal. The issue with the docs was transient. I'm still not sure what the problem was

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Jupytext (MyST-Markdown flavoured) notebooks in NotebookLite, JupyterLite, and Voici directives
4 participants