Support the usage of Markdown-based notebooks with "Lite" directives #221

agriyakhetarpal · 2024-12-03T02:26:39Z

Description

Please refer to the proposal in #191 for additional context and the rationale behind this change. This PR adds preprocessing functionalities to the base "Lite" directive (in the _LiteIframe class), which will allow a user to pass in a plain Markdown/MyST-Markdown notebook file to a directive, which will then get converted on the fly when JupyterLite is triggered. The conversion is performed via jupytext, which has been included as a dependency.

There is also the option of leaving this as an experimental feature behind an experimental flag and/or an optional dependency on jupytext, as suggested in #191 (comment). However, I haven't implemented that change and would like to wait for further direction on this before I do so.

Changes made

Preprocessing added to the run() method of the _LiteIframe class to convert .md files to .ipynb files and pass them on to JupyterLite
Refactored the notebook conversion and cell stripping logic to two separate hidden functions that we subsequently call in _LiteIframe.run()
Handled potential notebook collisions (in case notebooks of the same name but with different formats are present anywhere in the docs) – here, we handle only those notebooks that are referenced in a directive, and we raise a ValueError for any conflicts, recommending that the user rename their notebooks to have unique names.
jupytext added as a dependency for facilitating the conversion
Documentation for all directives updated to showcase this functionality
A sample MyST Markdown notebook added

Additional context

This is supposed to make interacting with Markdown-based notebook workflows easier – a few notable examples are scipy/scipy#20303 and PyWavelets/pywt#741, which have their own ways of dealing with this intermediate step that might make sense to include in jupyterlite-sphinx itself through this PR, such as https://github.com/scipy/scipy/blob/82ee02556872b9f91576b835a8f40f1d9eb1fd8d/doc/convert_notebooks.py and https://github.com/agriyakhetarpal/pywt/blob/fd3e01c3003a18cca770fc6f33752f8c07084e21/doc/source/conf.py#L33-L49 respectively.

cc: @melissawm @steppi @gabalafou

agriyakhetarpal · 2024-12-03T02:30:14Z

I am facing some unrelated troubles when testing it locally with the SciPy documentation from SciPy's main branch right now, and I intend to test this a bit more thoroughly with multiple projects, also to ensure that things like scipy/scipy#21433 do not break with this change.

jupyterlite_sphinx/jupyterlite_sphinx.py

agriyakhetarpal · 2024-12-03T23:42:49Z

I am facing some unrelated troubles when testing it locally with the SciPy documentation from SciPy's main branch right now, and I intend to test this a bit more thoroughly with multiple projects, also to ensure that things like scipy/scipy#21433 do not break with this change.

Update: I've tested it multiple times by now and this is working perfectly well with the SciPy docs 🎉 though the use of a UUID to append to the filename would break the caching (obviously) of the notebook builds and also adds additional notebooks to the built docs. So, I've chosen to adapt the caching-related check from scipy/scipy#21433 in e3b6882, and I can confirm that there are no regressions as those that were noticed in scipy/scipy#21394. This does mean that I've had to revert the commit via d1c7f7d where I added a sample my_notebook.md file to accompany our existing my_notebook.ipynb file, because of the naming conflict that will be caused after conversion (which is something I'm looking for opinions on – please see #221 (comment) for context!)

When this PR becomes part of a release, we can remove the convert_notebooks.py script and the convert-notebooks Make target in SciPy, essentially eliminating the changes introduced through scipy/scipy#21433, and instead of having to write:

.. jupyterlite:: ../../_contents/hypothesis_chi2_contingency.ipynb
   :new_tab: True

we can just write:

.. jupyterlite:: hypothesis_chi2_contingency.md
   :new_tab: True

which would be much cleaner – and I can submit a follow-up PR to SciPy for this, or step back if you wish to do so, @melissawm. :) This PR should now be ready for review now!

agriyakhetarpal · 2024-12-03T23:51:48Z

Note; we can also allow both notebook_X.ipynb and notebook_X.md to coexist in the same folder, and just add a suffix for the Markdown-based case, so that using .. notebooklite:: notebook_X.ipynb opens the original notebook, but using .. notebooklite:: notebook_X.md would convert notebook_X.md to notebook_X_converted.ipynb (or perhaps something like notebook_X.md.jupyterlite) and open it, which would perform both actions: resolving such an edge case, and keeping both notebooks intact in the documentation's source folder.

The docs build is failing because of this, but I would like to receive further direction before I proceed with this approach.

melissawm · 2024-12-04T14:46:52Z

Go ahead - that would be great!

agriyakhetarpal · 2024-12-04T15:10:02Z

Thanks – will do when this is reviewed and released!

jtpio · 2024-12-04T15:26:00Z

Thanks @agriyakhetarpal, nice to see support for markdown-based notebooks being added here!

Looks like the docs failure may be relevant though: https://readthedocs.org/projects/jupyterlite-sphinx/builds/26477597/

FileNotFoundError: [Errno 2] No such file or directory: '/home/docs/checkouts/readthedocs.org/user_builds/jupyterlite-sphinx/checkouts/221/docs/directives/my_notebook.md'

agriyakhetarpal · 2024-12-11T14:17:59Z

Thanks @agriyakhetarpal, nice to see support for markdown-based notebooks being added here!

Looks like the docs failure may be relevant though: readthedocs.org/projects/jupyterlite-sphinx/builds/26477597
FileNotFoundError: [Errno 2] No such file or directory: '/home/docs/checkouts/readthedocs.org/user_builds/jupyterlite-sphinx/checkouts/221/docs/directives/my_notebook.md'

Yes, this is related to #221 (comment) and #221 (comment). Let me push a few more changes to handle that; however, I'll fix it for now by reverting the previous commit!

agriyakhetarpal · 2024-12-11T18:25:46Z

Hi @gabalafou, I've ensured that these Markdown/MyST notebooks can also be stripped via 6a6d7a2, and added docs on how to use the stripping functionality with Markdown notebooks in 5867428 – on the lines of what we discussed yesterday. I have tested the changes that I have also upstreamed to a SciPy PR here: scipy/scipy#22042. From a self-review, this looks alright to me to go ahead with.

agriyakhetarpal · 2024-12-11T18:43:59Z

The case where my_notebook.ipynb and my_notebook.md both exist in the same folder has been handled, but I found yet another edge case based on these collisions, which is slightly trickier to resolve (but also much less likely to happen). If we have multiple files of the same name that exist across folders, we would have a problem. For example,

my_notebook.ipynb exists in folder1/, my_notebook.md exists in folder1/ and my_notebook.ipynb exists in folder2/

We would have a conflict again because all of the notebooks are going to be copied under the contents/ directory. We should preserve the folder structure when copying to the contents/ folder (i.e., create contents/folder1/ and contents/folder2/ and copy the notebooks in there) and indeed add UUID suffixes for the files to ensure that no collisions happen.

However, it is safe to say that this case existed prior to this PR as well (my example concerns two .ipynb files). I am not sure if anyone has encountered it in the wild – at least, there have been no bug reports about it so far. Would it be okay to proceed with this PR as is, mark this behaviour as a known case of failure, and open a separate issue for it?

This reverts commit 3ed8cbd.

This reverts commit d1c7f7d.

agriyakhetarpal · 2024-12-16T20:50:59Z

The case where my_notebook.ipynb and my_notebook.md both exist in the same folder has been handled, but I found yet another edge case based on these collisions, which is slightly trickier to resolve (but also much less likely to happen). If we have multiple files of the same name that exist across folders, we would have a problem. For example,

my_notebook.ipynb exists in folder1/, my_notebook.md exists in folder1/ and my_notebook.ipynb exists in folder2/

We would have a conflict again because all of the notebooks are going to be copied under the contents/ directory. We should preserve the folder structure when copying to the contents/ folder (i.e., create contents/folder1/ and contents/folder2/ and copy the notebooks in there) and indeed add UUID suffixes for the files to ensure that no collisions happen.

However, it is safe to say that this case existed prior to this PR as well (my example concerns two .ipynb files). I am not sure if anyone has encountered it in the wild – at least, there have been no bug reports about it so far. Would it be okay to proceed with this PR as is, mark this behaviour as a known case of failure, and open a separate issue for it?

This was discussed with @steppi today, and we agreed upon making a naming conflict a hard error, regardless of file extension. He suggested that I check if any projects have notebooks with duplicate names, so that we could correct them downstream as a safety measure before we release. I didn't find any for NumPy, SciPy, or scikit-learn; and neither has anyone reported an issue about this before – ever since jupyterlite-sphinx has had its inception. So, it is all the more the right time to conform to correct behaviour, and if someone raises this, we could just ask them to rename the duplicate notebook they have – at most, it would cost them a broken URL or few, which they can fix with redirects via RTD or through any other methods that we are not concerned about. Or, if we get a lot of bug reports about it, we could look into mangling the filenames with UUIDs and the like (which was my previous approach).

agriyakhetarpal · 2024-12-16T22:14:04Z

I've gone one step ahead in the check and also allowed having notebooks of the same filename stem—such as my_notebook.ipynb and my_notebook.md—to co-exist anywhere in the folder structure, but only under the condition where both of them are not referenced in our directives. This way, we are not going to deal with notebooks that are not our business – if, say, if only my_notebook.ipynb has been included in a .. jupyterlite:: my_notebook.ipynb directive somewhere in the docs by a user, it is only then that we process it further.

Also, we perform this lookup with a set, so that operation happens in linear time anyway – I did think about caching it using a frozenset with paths serialised to strings and a simple LRU approach, but I didn't feel it was worth keeping because we are never going to expect notebooks in the order of thousands.

This is ready for review, now!

steppi

This is looking good Agriya. A couple small suggestions. Only real blocker I think is that the error message for when duplicate notebooks are found could use a little work. Let's discuss it offline when you have time.

jupyterlite_sphinx/jupyterlite_sphinx.py

Co-Authored-By: Albert Steppi <[email protected]>

agriyakhetarpal · 2024-12-20T21:35:01Z

Albert and I discussed the error message offline and in consideration, reduced its size because of its verbosity; plus, it would be weird to see it as such from a user's perspective. It made more sense to make it more impersonal, highlighting the problem of conflicting notebook names at hand. Implemented in 81c0c09.

Co-authored-by: Albert Steppi <[email protected]>

Co-Authored-By: Albert Steppi <[email protected]>

agriyakhetarpal · 2024-12-20T22:31:05Z

With 29bb810, we don't keep going until we find all conflicts at once, and stop at the first one itself.

steppi · 2024-12-20T23:25:18Z

Closing and reopening to investigate XPython kernel not being found in the readthedocs preview in #228 (comment)

agriyakhetarpal · 2024-12-20T23:39:27Z

🤔 I cannot reproduce it here either

steppi · 2024-12-21T04:27:12Z

jupyterlite_sphinx/jupyterlite_sphinx.py

+
+            # We only consider conflicts if notebooks are actually referenced in
+            # a directive, to prevent false posiitves from being raised.
+            if hasattr(self.env, "jupyterlite_notebooks"):


I think it should be fine for now, but the $O(n^2)$ algorithm used here might be painful for projects with a very large number of notebooks. We could do $O(n)$ if we store the basenames in a set. Just making a note of it in case this ends up being a bottleneck for someone.

steppi · 2024-12-21T04:30:12Z

Thanks @agriyakhetarpal. The issue with the docs was transient. I'm still not sure what the problem was

agriyakhetarpal added the enhancement New feature or request label Dec 3, 2024

agriyakhetarpal added this to the 0.16.6 milestone Dec 3, 2024

agriyakhetarpal commented Dec 3, 2024

View reviewed changes

jupyterlite_sphinx/jupyterlite_sphinx.py Outdated Show resolved Hide resolved

agriyakhetarpal marked this pull request as ready for review December 3, 2024 23:42

agriyakhetarpal requested review from Carreau and steppi December 3, 2024 23:51

agriyakhetarpal mentioned this pull request Dec 4, 2024

MAINT, DOC: Use Jupytext's API, and fix cross-platform usage for doc/convert_notebooks.py scipy/scipy#21998

Merged

agriyakhetarpal force-pushed the feat/support-markdown-notebooks branch from ccd22f6 to 0401426 Compare December 11, 2024 14:26

agriyakhetarpal mentioned this pull request Dec 11, 2024

Implement the Simple Interface jupyterlite/jupyterlite#385

Open

agriyakhetarpal force-pushed the feat/support-markdown-notebooks branch from 5867428 to 6d2336b Compare December 13, 2024 17:13

agriyakhetarpal added 11 commits December 13, 2024 22:44

Add Jupytext as a dependency

e780819

Add ability to convert MyST-MD notebooks on-the-fly

1df1689

Document ability to use MyST notebooks

a8847e8

Add a sample MyST notebook

0c2829e

Showcase .md example for NotebookLite, Voici

58b6718

Fix linter errors

e973a88

Add jupytext for RTD builds

69d18bc

Drop UUID suffix, cache based on timestamps

4e19907

Revert "Add a sample MyST notebook"

e83fdba

This reverts commit 3ed8cbd.

Reapply "Add a sample MyST notebook"

0d633c4

This reverts commit d1c7f7d.

Clarify MyST notebook in notebook contents

b6cec81

agriyakhetarpal linked an issue Dec 13, 2024 that may be closed by this pull request

Support Jupytext (MyST-Markdown flavoured) notebooks in NotebookLite, JupyterLite, and Voici directives #191

Closed

agriyakhetarpal added 4 commits December 17, 2024 03:03

Avoid conflict: rename Markdown notebook

9720b69

Handle collisions among included files in docs build

beedf9b

Fix linter error

5a31d9b

Check notebooks across different folders as well

0185722

Merge branch 'main' into feat/support-markdown-notebooks

34228e5

basnijholt mentioned this pull request Dec 19, 2024

DOC: Enable jupyterlite-sphinx pipefunc/pipefunc#468

Open

steppi reviewed Dec 20, 2024

View reviewed changes

agriyakhetarpal and others added 2 commits December 21, 2024 03:02

Raise a RuntimeError instead of a ValueError

d770338

Reduce verbosity of error message

81c0c09

Co-Authored-By: Albert Steppi <[email protected]>

agriyakhetarpal and others added 4 commits December 21, 2024 03:07

Rename NB conversion caching function to _target_is_stale()

2c1ef54

Co-authored-by: Albert Steppi <[email protected]>

Merge main

e970cd2

Follow single responsibility principle for checking duplicate filenames

574acae

Co-Authored-By: Albert Steppi <[email protected]>

Raise an error on the first conflict itself

29bb810

Co-Authored-By: Albert Steppi <[email protected]>

agriyakhetarpal requested a review from steppi December 20, 2024 22:31

steppi closed this Dec 20, 2024

steppi reopened this Dec 20, 2024

steppi reviewed Dec 21, 2024

View reviewed changes

steppi merged commit 8061c30 into jupyterlite:main Dec 21, 2024
7 checks passed

agriyakhetarpal deleted the feat/support-markdown-notebooks branch December 21, 2024 07:41

This was referenced Dec 22, 2024

Add missing dependency on jupytext #236

Merged

DOC, MAINT: Add updates for interactive notebooks via jupyterlite-sphinx 0.17 scipy/scipy#22161

Merged

agriyakhetarpal mentioned this pull request Jan 9, 2025

Size reduction aproaches #247

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support the usage of Markdown-based notebooks with "Lite" directives #221

Support the usage of Markdown-based notebooks with "Lite" directives #221

agriyakhetarpal commented Dec 3, 2024 •

edited

Loading

agriyakhetarpal commented Dec 3, 2024

agriyakhetarpal commented Dec 3, 2024

agriyakhetarpal commented Dec 3, 2024

melissawm commented Dec 4, 2024

agriyakhetarpal commented Dec 4, 2024

jtpio commented Dec 4, 2024

agriyakhetarpal commented Dec 11, 2024

agriyakhetarpal commented Dec 11, 2024

agriyakhetarpal commented Dec 11, 2024

agriyakhetarpal commented Dec 16, 2024

agriyakhetarpal commented Dec 16, 2024

steppi left a comment

agriyakhetarpal commented Dec 20, 2024

agriyakhetarpal commented Dec 20, 2024

steppi commented Dec 20, 2024

agriyakhetarpal commented Dec 20, 2024

steppi Dec 21, 2024

steppi commented Dec 21, 2024

Support the usage of Markdown-based notebooks with "Lite" directives #221

Support the usage of Markdown-based notebooks with "Lite" directives #221

Conversation

agriyakhetarpal commented Dec 3, 2024 • edited Loading

Description

Changes made

Additional context

agriyakhetarpal commented Dec 3, 2024

agriyakhetarpal commented Dec 3, 2024

agriyakhetarpal commented Dec 3, 2024

melissawm commented Dec 4, 2024

agriyakhetarpal commented Dec 4, 2024

jtpio commented Dec 4, 2024

agriyakhetarpal commented Dec 11, 2024

agriyakhetarpal commented Dec 11, 2024

agriyakhetarpal commented Dec 11, 2024

agriyakhetarpal commented Dec 16, 2024

agriyakhetarpal commented Dec 16, 2024

steppi left a comment

Choose a reason for hiding this comment

agriyakhetarpal commented Dec 20, 2024

agriyakhetarpal commented Dec 20, 2024

steppi commented Dec 20, 2024

agriyakhetarpal commented Dec 20, 2024

steppi Dec 21, 2024

Choose a reason for hiding this comment

steppi commented Dec 21, 2024

agriyakhetarpal commented Dec 3, 2024 •

edited

Loading