Meetings

A place to keep notes from our meetings.

2019-11-20 - Show and Tell

Jupinx

The project grew organically over time.

Project: https://jupinx.quantecon.org

Example: https://python.quantecon.org

Supports HTML, integrated testing, PDFs, etc.
The end product is still missing some things, but overall they are pretty happy with the result.
Basic features
- Everything is text-based (rST) and version-controlled, which is crucial
Limitations of the system
- Everything is based on text source files, which means that there are many intermediate steps to explore and debug the code. E.g. rST -> notebook -> explore and run cells -> back to rST -> re-execute all of it.
- Often you then edit the notebook and realize you've accidentally edited an intermediate file rather than the rST source.
- There are lots of edge cases that have been coded into the build system (execution, testing, etc. weren't imagined at the start but were built in later). A major refactor is planned for Nov 2019 - Jan 2020.
- Full clean build takes a long time (due to code execution), so how can we more efficiently manage the execution process? How can we only execute code once and then produce multiple kinds of outputs?
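The last limitation asks how code could be executed once and reused for every output. One possible approach, sketched here and not something Jupinx does today, is to key cached outputs on a hash of a notebook's code cells, so that editing prose never triggers re-execution. The in-memory dict and the `execute` callable are hypothetical stand-ins for a persistent cache and a real notebook runner.

```python
import hashlib
import json
from typing import Callable, Dict, List


def code_hash(code_cells: List[str]) -> str:
    """Hash only the code sources, so edits to prose don't invalidate the cache."""
    payload = json.dumps(code_cells).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def execute_with_cache(
    code_cells: List[str],
    cache: Dict[str, List[str]],
    execute: Callable[[List[str]], List[str]],  # hypothetical runner: cells -> outputs
) -> List[str]:
    """Run the code cells only if this exact code has not been executed before."""
    key = code_hash(code_cells)
    if key not in cache:
        # Cache miss: execute once and remember the outputs.
        cache[key] = execute(code_cells)
    # HTML, PDF and notebook builders can all reuse the same cached outputs.
    return cache[key]
```

Calling `execute_with_cache` twice with the same cells runs `execute` only once; changing any code cell produces a new key and forces a fresh run.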
The project started with Sphinx because it supported a wide range of scientific inputs, including math and code.
Over time they wanted more theming and customization, and also wanted to be able to provide all of the final content in Jupyter Notebook format.
- That led to sphinxcontrib-jupyter
QuantEcon publishing relies on:
- sphinxcontrib-jupyter
- jupinx - A CLI to manage sphinxcontrib-jupyter with a consistent interface. Relatively new, primarily a means of more easily configuring Sphinx for scientific publishing projects.
  - Uses Sphinx for building, caching, etc., and sphinxcontrib-jupyter for the Jupyter work
Once they had the notebook format as static outputs, they started using them more as an execution layer itself and removed static elements from rST files.
Example feature: support for download notebooks. The build system can rewrite relative links and references in the notebooks to point to web locations, so download notebooks include images and figures from the parent website.
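The link rewriting for download notebooks can be illustrated on a single markdown string. This is a minimal sketch, not Jupinx's implementation (which works inside the Sphinx/notebook pipeline); the regex only handles markdown `![alt](path)` images, and the base URL is an assumed page location on the published site.

```python
import re
from urllib.parse import urljoin

# Markdown image syntax ![alt](path); the path group stops at ')' or whitespace.
IMAGE_LINK = re.compile(r"(!\[[^\]]*\]\()([^)\s]+)(\))")
# A URL scheme such as http:, https: or data: marks a link as already absolute.
HAS_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def absolutize_images(markdown: str, base_url: str) -> str:
    """Rewrite relative image paths so a downloaded notebook loads them from the website."""

    def replace(match):
        path = match.group(2)
        if HAS_SCHEME.match(path):
            return match.group(0)  # leave absolute links untouched
        return match.group(1) + urljoin(base_url, path) + match.group(3)

    return IMAGE_LINK.sub(replace, markdown)
```

For example, `absolutize_images("![fig](_static/plot.png)", "https://python.quantecon.org/lecture.html")` resolves the image against the page URL, giving `![fig](https://python.quantecon.org/_static/plot.png)`.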
The system can build multiple kinds of notebooks: some that include tests, others that are meant for readers to download without tests.
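How Jupinx marks test code isn't described in these notes. As a minimal sketch, assume test cells carry a hypothetical `test` tag in the standard `metadata.tags` list of the .ipynb JSON; the "download" flavor of a notebook is then a copy with those cells removed, while the "testing" flavor keeps everything.

```python
import copy
from typing import Any, Dict

TEST_TAG = "test"  # hypothetical tag marking cells that only exist for testing


def strip_test_cells(notebook: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an .ipynb-style dict without cells tagged as tests."""
    reader_nb = copy.deepcopy(notebook)  # never mutate the testing version
    reader_nb["cells"] = [
        cell
        for cell in reader_nb["cells"]
        if TEST_TAG not in cell.get("metadata", {}).get("tags", [])
    ]
    return reader_nb
```

Building both flavors from one source notebook keeps the tests next to the code they check, without shipping them to readers.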
Uses Dask to parallelize the build system at the notebook level.
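Notebook-level parallelism works because each notebook is an independent unit of work. Jupinx uses Dask for this; the sketch below swaps in the standard library's `concurrent.futures` to show the same idea without extra dependencies. `execute_notebook` is a hypothetical callable standing in for whatever actually runs a notebook (for example nbconvert's `ExecutePreprocessor`).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def build_all(
    notebook_paths: List[str],
    execute_notebook: Callable[[str], str],  # hypothetical: runs one notebook, returns a status
    max_workers: int = 4,
) -> Dict[str, str]:
    """Execute notebooks concurrently and map each path to its result."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with notebook_paths.
        results = list(pool.map(execute_notebook, notebook_paths))
    return dict(zip(notebook_paths, results))
```

Threads are a reasonable choice when each notebook's code runs in its own kernel process, since the threads mostly wait on those kernels; Dask adds scheduling across machines on top of the same pattern.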

Pros

Supports a wide range of scientific writing requirements, including:
HTML and PDF outputs
Site-generation-style assistance, such as execution statistics and download notebooks that reference remote (rather than local) images
Execution support for all embedded code

Cons

The content-editing workflow needs to be improved, as the current iterative style doesn't work very well.
nbconvert templates require a high degree of technical skill to assemble.
The current execution approach is not optimal and takes a long time to compile the projects.
Some users don't like rST.
The code base needs a refactor to remove path-dependent workflows and to redesign the translators and builders.

An Introduction to Applied Bioinformatics (IAB)

Lectures were originally IPython notebooks that were stored on GitHub.
Over time, the amount of text grew and it started to look more like a book.
Jupyter wasn't very convenient as a text editor or for reviewing pull requests.
Decided to convert all the source content to markdown.
- Used the ipymd project for this.
- Did all the text editing in markdown, and code editing in Jupyter. ipymd allowed us to quickly jump back and forth between a markdown file and a notebook because there was still just a single source file (the markdown file).
- The two modes of editing (text editor and Jupyter) are a critical feature for us as authors, but ipymd probably isn't the way to achieve this anymore (it doesn't seem to be under active development). Maybe Jupytext is the way to achieve this now?
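Both ipymd and Jupytext map a markdown file onto notebook cells. Their real round trips handle much more (metadata, outputs, other languages), but the core idea, treating fenced `python` blocks as code cells and everything between them as markdown cells, can be sketched with the standard library alone. This is not either tool's algorithm; the fence convention and the plain-dict cell format are assumptions.

```python
import re
from typing import Dict, List

TICKS = "`" * 3  # a markdown code fence, built here to avoid literal fences in this block
# A fenced python block: opening fence line, body, closing fence on its own line.
FENCE = re.compile(rf"^{TICKS}python\n(.*?)^{TICKS}[ \t]*$", re.DOTALL | re.MULTILINE)


def markdown_to_cells(text: str) -> List[Dict[str, str]]:
    """Split a markdown document into alternating markdown and code cells."""
    cells = []
    pos = 0
    for match in FENCE.finditer(text):
        prose = text[pos:match.start()].strip()
        if prose:
            cells.append({"cell_type": "markdown", "source": prose})
        cells.append({"cell_type": "code", "source": match.group(1).rstrip("\n")})
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        cells.append({"cell_type": "markdown", "source": tail})
    return cells
```

Because the markdown file stays the single source, the notebook can be regenerated from it at any time, which is what makes the two editing modes safe.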
Multiple output types from markdown
- Static HTML
- Notebooks (e.g. added links to Binder automatically)
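Adding Binder links automatically amounts to prepending a markdown cell with a launch badge. A minimal sketch, assuming Binder's GitHub launch-URL pattern with the `filepath` query parameter (newer Binder docs also describe `urlpath` and `labpath`); the organization, repository and branch in any call are placeholders, not IAB's actual values.

```python
from urllib.parse import quote

BADGE = "https://mybinder.org/badge_logo.svg"


def binder_url(org: str, repo: str, ref: str, notebook_path: str) -> str:
    """Build a mybinder.org launch URL for one notebook in a GitHub repository."""
    return f"https://mybinder.org/v2/gh/{org}/{repo}/{ref}?filepath={quote(notebook_path)}"


def binder_cell(org: str, repo: str, ref: str, notebook_path: str) -> dict:
    """A markdown cell with a 'launch binder' badge, to prepend to a notebook's cells."""
    url = binder_url(org, repo, ref, notebook_path)
    return {
        "cell_type": "markdown",
        "metadata": {},
        "source": f"[![Binder]({BADGE})]({url})",
    }
```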
Build system (build-iab)
- Not meant for public consumption, just for build and deployment of IAB
- We're happy to scrap the system and switch over to something new, porting features from it where useful.
- It's completely undocumented because it's just for internal use, so just let us know if you have any questions.
- Content pipeline: markdown -> markdown with extra HTML for cross-refs etc. -> Jupyter notebooks -> HTML
Editing buttons are a cool feature (demo video here)
- Edit links take readers to the GitHub "edit" view of the notebook's markdown representation, where they can make changes and open a pull request if they wish.
- Lots of students make edits to the book as they're reading by clicking the "edit" button.
- Students get participation points etc. if they make PRs against the book.
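GitHub serves an in-browser editor at `https://github.com/<owner>/<repo>/edit/<branch>/<path>`; when a reader without write access saves a change there, GitHub offers to fork the repository and open a pull request. A sketch of generating that link for each page, where the owner, repo, branch and the `edit-button` CSS class are hypothetical placeholders rather than the IAB build's actual values:

```python
from urllib.parse import quote


def github_edit_url(owner: str, repo: str, branch: str, source_path: str) -> str:
    """URL of GitHub's in-browser editor for the markdown source of a page."""
    return f"https://github.com/{owner}/{repo}/edit/{branch}/{quote(source_path)}"


def edit_button_html(owner: str, repo: str, branch: str, source_path: str) -> str:
    """An 'Edit' link to place on the rendered page (CSS class name is hypothetical)."""
    url = github_edit_url(owner, repo, branch, source_path)
    return f'<a class="edit-button" href="{url}">Edit on GitHub</a>'
```

Linking to the markdown source rather than the generated notebook means every student PR edits the single source of truth.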
The build also allows for cross-linking
- We have a pre-processing step that first parses all of the notebooks
- We use the commonmark Python module to create a syntax tree of each notebook
- We use this to generate cross-references, document headers, etc.
- We have <link src='hash' /> syntax for linking to things
- We also use the syntax tree to determine section numbers etc.
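The two-pass idea (index every heading first, then resolve links) can be sketched without the commonmark module by matching ATX headings with a regex. This is a simplified stand-in, not the IAB code: the notes don't say what the `src` value in `<link src='hash' />` is derived from, so here it is assumed, purely for illustration, to be a short hash of the target heading's text.

```python
import hashlib
import re
from typing import Dict, List, Tuple

HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)
LINK_TAG = re.compile(r"<link src='([^']+)'\s*/>")


def label_for(title: str) -> str:
    # Hypothetical labelling scheme: first 8 hex digits of a SHA-1 of the heading text.
    return hashlib.sha1(title.encode("utf-8")).hexdigest()[:8]


def number_sections(documents: List[str]) -> Dict[str, Tuple[str, str]]:
    """Pass 1: walk headings in book order, returning label -> (section number, title)."""
    counters = [0] * 6
    index = {}
    for doc in documents:
        for match in HEADING.finditer(doc):
            level = len(match.group(1))
            counters[level - 1] += 1
            counters[level:] = [0] * (6 - level)  # reset all deeper levels
            number = ".".join(str(n) for n in counters[:level])
            title = match.group(2)
            index[label_for(title)] = (number, title)
    return index


def resolve_links(doc: str, index: Dict[str, Tuple[str, str]]) -> str:
    """Pass 2: replace <link src='...' /> tags with numbered section references."""

    def replace(match):
        number, title = index[match.group(1)]
        return f"Section {number} ({title})"

    return LINK_TAG.sub(replace, doc)
```

A regex also matches `#` comment lines inside code blocks, which is exactly why a real syntax tree, as in the commonmark-based approach above, is the more robust choice.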
We are very interested in producing PDFs, but haven't got around to this functionality. This feature is ultimately what drove us to approach Sloan about this project.

Jupyter Book

Primary audience

The primary audience for Jupyter Book is instructors for courses that do some computational work, but are not necessarily well-versed in Git / LaTeX / etc. From the beginning this was a tool meant to be used by a lot of people (as opposed to a build system for one specific book). As such, it tries to adopt practices that are approachable by anybody (e.g. using Markdown and Jupyter Notebooks as primary content holders).

Design philosophy

Here are a few of the main goals of Jupyter Book:

Simplify the user experience - they shouldn't need to learn new tooling, syntax, etc beyond what is already in the open source ecosystem.
Make Jupyter / JupyterHub / Binder a first class citizen - build in assumptions and flexibility that leverage features of Jupyter Notebooks (e.g. HTML outputs or cell metadata features)
Optimize for maintainability by others - AKA, try to keep the complexity of the tool itself down. Leverage other open source tools as much as possible (e.g. nbconvert for notebook execution, jupytext for text->notebook conversion). I think the Python module for Jupyter Book is probably only about 1,000 - 2,000 lines of code. The bulk of it is in CSS and Javascript files.
Follow best-practices in document layout and design - I'm not a designer, so I tried to riff off of other best-practices in document visualization. I drew heavily from the Edward Tufte CSS guide.
Run as an open project - this means putting a lot of effort into creating contribution guides, inviting feedback and input from users, mentoring people who are interested in getting involved, etc.

General workflow

This is a short workflow for Jupyter Book - much of it is pulled from the Jupyter Book guide here.

Broadly speaking, there are three phases to building a Jupyter Book:

Put your content in a single self-contained folder (which can itself have sub-folders)
Build a simple HTML file from each page of your book (jupyter-book build)
Stitch these HTML pages into a website with CSS/JS using Jekyll (make serve)

Here's a more in-depth set of steps:

A user puts a collection of Jupyter Notebooks or text files (generally .md) in a single self-contained folder (called content/).
Run jupyter-book create mybookname --content-folder path/to/content - this creates a Jekyll site template and places the content folder in the right place inside.
The user builds a Table of Contents file (_data/toc.yml). This is a YAML file that defines the structure of the book. Each entry is a path relative to the content/ folder.

The user can optionally configure some features via a _config.yml file as well (such as automatically adding Binder links to pages)

The user runs jupyter-book build path/to/mybookname/. This will

Loop through each page defined in toc.yml
(optionally) run its content if specified. If not specified, just do a straight conversion
(optionally) if the file is a text file, check for jupytext header metadata and, if found, convert the file to an .ipynb file
Convert the page to a simple HTML format and place in the _build/ folder. Also include some metadata in that output file in the YAML header of the file
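The per-page loop above can be sketched as a planning function that decides which steps each page would go through. This is not Jupyter Book's code: the `sources` dict stands in for reading files from `content/`, the step names are descriptive labels, the `.html` output naming and the `title` front-matter key are assumptions, and the jupytext check is a rough string test rather than real YAML parsing.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def has_jupytext_header(text: str) -> bool:
    """Rough check: YAML front matter at the top of the file that mentions jupytext."""
    if not text.startswith("---\n"):
        return False
    end = text.find("\n---", 4)
    return end != -1 and "jupytext" in text[4:end]


def plan_build(toc_pages: List[str], sources: Dict[str, str], execute: bool) -> List[Dict]:
    """Walk pages in toc.yml order and record the steps each one would go through."""
    plan = []
    for page in toc_pages:  # paths relative to content/
        path = PurePosixPath(page)
        steps = []
        if path.suffix == ".md" and has_jupytext_header(sources[page]):
            steps.append("convert to .ipynb")
        # Only notebooks (native or converted) have code to run.
        if execute and (path.suffix == ".ipynb" or steps):
            steps.append("execute")
        steps.append("convert to HTML")
        output = PurePosixPath("_build") / path.with_suffix(".html")
        front_matter = f"---\ntitle: {path.stem}\n---\n"  # hypothetical metadata header
        plan.append({"page": page, "steps": steps, "output": str(output), "front_matter": front_matter})
    return plan
```

Plain markdown pages without a jupytext header skip straight to HTML, which matches the "straight conversion" case above.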

The user can then either push this site to GitHub, where GitHub Pages will build it automatically (because the site is a valid Jekyll website),
OR preview and build the HTML of their book locally with make serve.

Pros

I think the biggest benefit of Jupyter Book comes from the design and layout of the final product. People really like the minimalist layout, and the way that the interface is (relatively) dynamic depending on screen sizes, configuration, etc. Moreover, they like the relatively simple workflow and the fact that it works with Jupyter Notebooks from day one, and the fact that the outputs of cells look the same on the site as they do in a notebook. People also like the ability to add interactive links etc into their book, and control the layout using cell metadata (in particular, the "show/hide" inputs/outputs is a heavily used feature).

Cons

Jekyll sucks, and is a huge pain in the butt to install for many people. It's also fairly slow and buggy. Yay!
- As an aside, I'd love to completely re-work the second step of the build system (either replacing Jekyll with Hugo, or using a different tool like Sphinx or Pandoc)
It doesn't support feature-rich Markdown (it uses the Jupyter Markdown flavor, which is basically CommonMark)
- I'd love to support RMarkdown or something like it
The build process is more complex than I'd prefer
- I'd love it if we could get it down to one step that is clever about caching to be faster, rather than having two separate steps.
Support for PDFs and non-HTML output isn't great. It currently uses PrintJS to print each page to PDF individually; this works OK, but it doesn't render the whole book as a single PDF.
It lacks support for cross-document references and labels, which makes it cumbersome to link to other things in the book.
