explore "Web Bundles"-like distribution #1082

Open
proppy opened this issue May 13, 2023 · 3 comments
Labels
enhancement New feature or request

Comments


proppy commented May 13, 2023

Problem

Sharing and distributing a JupyterLite instance for local use currently requires zipping all the files and running a local web server.

Proposed Solution

It would be nice to leverage a solution from https://github.com/WICG/webpackage to create a single-file bundle that could be opened directly in the browser.

Additional context

Maybe @KenjiBaheux or @slightlyoff have more context on the current status of the spec and possible alternatives.

@proppy proppy added the enhancement New feature or request label May 13, 2023
@proppy proppy changed the title explore webbundle-like distribution explore "Web Bundles"-like distribution May 13, 2023

westurner commented May 16, 2023


KenjiBaheux commented May 17, 2023

Thanks for reaching out.

A lot has changed since we originally started working on "navigation to bundled exchanges" and related use cases. We also had a few assumptions about how the ecosystem would evolve and how the different use cases we anticipated would play out. Some of those assumptions didn't pan out. In addition, there were some implementation challenges and open questions on performance aspects.

In light of these headwinds, we've decided to pause our investment in the remaining use cases until there is clearer evidence of alignment between user and developer needs and this particular solution or something related.

It's possible that other browser vendors are still exploring the space (cc/ @slightlyoff). I'm at least aware of strong interest in packaging from the folks working on mini-apps standardization. Their proposed solution is different from Web Bundles though.

So, in the meantime, I'd recommend engaging with the mini-apps working group (see this article for an overview of standardization efforts). If there are aspects of Web Bundles that were appealing but missing in their exploration, it would be useful to surface the details via an issue on their GitHub.

@westurner

  • There's not yet a Python implementation of any of the Web Bundle tools.
  • Why Web Bundle over WARC?
  • https://github.com/WICG/webpackage#web-bundles
  • https://github.com/WICG/webpackage/blob/main/go/bundle/README.md ::

    We currently provide three command-line tools: gen-bundle, sign-bundle and dump-bundle.

    • gen-bundle command is a bundle generator tool. gen-bundle consumes a set of http exchanges (currently in the form of HAR format, URL list file, or static files in a local directory), and emits a web bundle.
    • sign-bundle command attaches a signature to a bundle. There are two supported ways to sign: using signatures section or integrity block. sign-bundle takes an existing bundle file, a private key and possibly a certificate, and emits a new bundle file with cryptographic signature added.
    • dump-bundle command is a bundle inspector tool. dump-bundle dumps the enclosed http exchanges of a given web bundle file in a human readable form.
  • https://developer.chrome.com/docs/web-platform/web-bundles/#explaining-web-bundles ::

    To be precise, a Web Bundle is a CBOR file with a .wbn extension (by convention) which packages HTTP resources into a binary format, and is served with the application/webbundle MIME type.
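Since gen-bundle accepts HAR input and HAR is a JSON format, a short Python sketch can show which HTTP exchanges a HAR contains before it is bundled. This is only an illustration: the function name `list_exchanges` and the sample data are hypothetical, and only the `log.entries[].request` / `response` layout is taken from the HAR format.

```python
import json


def list_exchanges(har_text):
    """Return (method, url, status, mime_type) for each entry in a HAR document."""
    har = json.loads(har_text)
    exchanges = []
    for entry in har["log"]["entries"]:
        request = entry["request"]
        response = entry["response"]
        # "content" and "mimeType" may be missing in trimmed HAR files.
        mime_type = response.get("content", {}).get("mimeType", "")
        exchanges.append(
            (request["method"], request["url"], response["status"], mime_type)
        )
    return exchanges


# Tiny hand-written HAR document (hypothetical data, for illustration only).
SAMPLE_HAR = json.dumps({
    "log": {
        "version": "1.2",
        "entries": [
            {
                "request": {"method": "GET", "url": "https://cdn.example/data.csv"},
                "response": {"status": 200, "content": {"mimeType": "text/csv"}},
            },
            {
                "request": {"method": "GET", "url": "https://cdn.example/app.js"},
                "response": {
                    "status": 200,
                    "content": {"mimeType": "application/javascript"},
                },
            },
        ],
    }
})

for method, url, status, mime_type in list_exchanges(SAMPLE_HAR):
    print(method, url, status, mime_type)
```

The same loop could write one URL per line to produce a URL list file, the other input form the README mentions.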

Why Web Bundles (or similar)?

  • repo2jupyterlite can build a static WASM site
    from a jupyter-book (Sphinx) and .md, .rst, .ipynb in _toc.yml
  • to fight "Reference Rot" and "Link Rot" in notebooks
  • users should include a copy of all resources necessary for their argument
  • users should distribute signed reproducible archives of the data and code resources necessary for their argument
  • practically, a Jupyter notebook .ipynb doesn't work a couple years later because there's not a copy of e.g. http://cdn.example/jquery-v1.02-custom.js which something in the notebook and notebook extensions required
    • Jupyter governance, nbformat: Jupyter extension authors SHOULD / MAY include copies of all external resources necessary at runtime
  • What value do Web Bundles or HARs provide if e.g. repo2jupyterlite (or docker+wasm) can ensure that every referenced URL is inlined into the static site build?
    • Jupyter notebooks may download schema:Datasets e.g. as (csv, csvw, json) at runtime

    • Jupyter notebooks may make HTTP API requests at runtime

    • Jupyter and/or Jupyter extensions may depend upon referenced external HTTP resources which are not inlined/vendored but are retrieved at runtime and not at reproducible archive build/compile time

    • A log of all requests and responses made while generating (and then viewing and interacting with) a notebook would be more reproducible to distribute than an .ipynb plus, hopefully, an adjacent environment.yml, which could easily fail some time later (even before the HTTP data fetch fails when you run the actual notebook)

    • Generating a notebook:

      • Test/Build:
        pytest example.ipynb
        ipython --TerminalIPythonApp.file_to_run=example.ipynb
        ipython -c '%run example.ipynb'
        cp example.ipynb example2-inplace.ipynb; cp example.ipynb example3-clearoutput.ipynb; cp example.ipynb example4-
        jupyter nbconvert --execute --inplace --to=notebook example2-inplace.ipynb
        jupyter nbconvert --execute --clear-output --to=notebook example3-clearoutput.ipynb
        jupyter nbconvert --execute --to=notebook example.ipynb
        papermill example.ipynb example4-papermill.ipynb  # streams output while running
        papermill example.ipynb s3://bkt/example5-papermill.ipynb -y '{x: 3, y: 4.0, z: "5"}'
        nbdiff example.ipynb example2.ipynb
        nbdiff-web example.ipynb example2.ipynb
      • Viewing/Rendering a notebook:
        alias web='python -m webbrowser'
        web ./example.ipynb
        
        python -m http.server --directory . & sleep 5; web http://localhost:8000/example.ipynb
        
        web https://username.github.io/notebooks/example.ipynb.html
        • resources_requested_by_chromium_when_viewing_url ?!= resources_included_in_archive
    • A log of all external HTTP requests and responses made while generating a notebook could be produced by logging to HAR with an HTTPS proxy (or by runtime monkeypatching, or with LD_PRELOAD, though e.g. Go binaries don't have to respect LD_PRELOAD)

    • A HAR (HTTP Archive) file is a JSON log of HTTP requests and responses.

    • You can export a HAR file by right-clicking in the 'Network' tab of Chrome/Firefox DevTools

    • Users generally don't want to review HARs; they prefer to distribute a single file (with a commit checksum and an optional version string), like a PDF, though PDF doesn't support code or linked data.

    • A one-file archive must necessarily include HTTP resources that already have URLs and content signatures.

    • If you rewrite URL paths within HTML/JS/TS/PyScript resources (e.g. in a HAR),
      their content hashes change
      and then you must re-sign/hash every resource
      (and then update the SRI hashes in HTML that had already referenced them)
      (but then they have different etag hashes, and there are unnecessary cache misses)
      in order to assure data integrity.

    • SRI: Subresource Integrity https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity#subresource_integrity_with_the_script_element ::

      <script
        src="https://example.com/example-framework.js"
        integrity="sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
        crossorigin="anonymous"></script>
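A minimal Python sketch of how an `integrity` value like the one above is derived (the base64 encoding of the SHA-384 digest, prefixed with `sha384-`), and of why rewriting a URL inside a resource invalidates it. The helper name `sri_sha384` and the sample script bytes are hypothetical, chosen only for illustration.

```python
import base64
import hashlib


def sri_sha384(content: bytes) -> str:
    """Return a Subresource Integrity value ("sha384-<base64 digest>") for content."""
    digest = hashlib.sha384(content).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


# Hypothetical script body that fetches a dataset from an absolute URL.
original = b'fetch("https://cdn.example/data.csv")'

# Rewriting the URL to a path inside an archive changes the bytes...
rewritten = original.replace(b"https://cdn.example/", b"./vendor/")

# ...so the integrity value changes, and every page that pins the old
# value in an integrity="..." attribute would have to be updated too.
print(sri_sha384(original))
print(sri_sha384(rewritten))
```

This is the re-hashing cost that a format which preserves the original URLs avoids.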
    • Web Bundles (Bundled HTTP Exchanges) don't need to rewrite paths in archived resources

    • Do Web Bundles require specific HTTP headers to serve the .wbn file?
      https://wpack-wg.github.io/bundled-responses/draft-ietf-wpack-bundled-responses.html#section-4.4 :

      4.4. Serving constraints
      When served over HTTP, a response containing an application/webbundle payload MUST include at least the following response header fields, to reduce content sniffing vulnerabilities (Section 5.2):

      Content-Type: application/webbundle
      X-Content-Type-Options: nosniff
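As a rough sketch of the serving constraints quoted above, Python's standard `http.server` can be extended to send both headers for `.wbn` files. The class name `WebBundleHandler` and the helper `make_server` are hypothetical; this is meant for local experimentation, not as a production server.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class WebBundleHandler(SimpleHTTPRequestHandler):
    """Static file handler that serves .wbn files as application/webbundle."""

    # Extend the default extension-to-MIME-type map with the .wbn extension.
    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".wbn": "application/webbundle",
    }

    def end_headers(self):
        # Sent on every response, which also covers .wbn responses.
        self.send_header("X-Content-Type-Options", "nosniff")
        super().end_headers()


def make_server(directory=".", port=8000):
    """Create (but do not start) a local server rooted at directory."""
    handler = functools.partial(WebBundleHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server(".", 8000).serve_forever()` would then serve the current directory on localhost. Whether a given browser will actually load a bundle served this way depends on its Web Bundles support.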
    • Is WARC a sufficient packaging format for notebooks?

    • repo2docker and repo2jupyterlite build reproducible archives from e.g. git repos and figshare and zenodo DOIs

    • https://github.com/emscripten-forge/empack :

      [empack is a tool] to pack a conda / mamba environment into a JS & WASM bundle

      empack pack env --env-prefix /path/to/env --outname python_data  --config /path/to/config.yaml

      This will generate two files python_data.js and python_data.data that you can use in the browser. A sample config is located in tests/empack_test_config.yaml

    • https://www.docker.com/blog/announcing-dockerwasm-technical-preview-2/ (2023) ::

      $ docker run --rm --runtime=io.containerd.wasmedge.v1 --platform=wasi/wasm secondstate/rust-example-hello:latest
      Hello WasmEdge!
      
      $ docker run --rm --runtime=io.containerd.wasmtime.v1 --platform=wasi/wasm secondstate/rust-example-hello:latest
      Hello WasmEdge!
