Advanced topics (#17)

* Initial draft of Chapter 4 * Try again to save notebook * Update names of files * initial commit of tutorials 2 and 7 * Anatomy tutorial * moved to subdirectory * Avanced Tutorial * Avanced Tutorial * Avanced Tutorial * expanded compression example * expanded compression example * expanded compression example * fix first cell * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove old notebook * Fix CI Co-authored-by: William Jamieson <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
asdf-format · Jul 7, 2022 · 99d02d8 · 99d02d8
1 parent 1c0bfa2
commit 99d02d8
Showing 1 changed file with 305 additions and 0 deletions.
diff --git a/06-Advanced_Topics_and_Plans/06-Advanced_Topics_and_Plans.ipynb b/06-Advanced_Topics_and_Plans/06-Advanced_Topics_and_Plans.ipynb
@@ -0,0 +1,305 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "93df0a25",
+   "metadata": {},
+   "source": [
+    "Advanced Topics and Plans\n",
+    "===================================="
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0d14c2ff",
+   "metadata": {},
+   "source": [
+    "Capabilities not yet covered\n",
+    "-----------------------------------------"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "23315037",
+   "metadata": {},
+   "source": [
+    "- Serializing analytic models without pickle\n",
+    "- When arrays are loaded\n",
+    "- Memory mapping\n",
+    "- Handling in-line vs binary array choices\n",
+    "  - E.g., files editable with a text editor\n",
+    "  - Easily printed out\n",
+    "  - Enables scientists to use own code to read contents without ASDF libraries\n",
+    "- Handling compression of arrays\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e2829669",
+   "metadata": {},
+   "source": [
+    "Serializing Models\n",
+    "---------------------------\n",
+    "\n",
+    "Uses astropy models"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7793c9be",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import astropy.modeling.models as amm\n",
+    "import numpy as np\n",
+    "import asdf\n",
+    "from matplotlib import pyplot as plt\n",
+    "\n",
+    "# Create a simple model\n",
+    "\n",
+    "poly_model = amm.Polynomial1D(degree=3, c0=1.0, c3=1.0)\n",
+    "poly_model.coeff = (1.0, 0.0, 0.0, 1.0)\n",
+    "x = np.linspace(-5, 5, 101)\n",
+    "plt.plot(x, poly_model(x))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e650ff40",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "af = asdf.AsdfFile()\n",
+    "af.tree = {\"model\": poly_model}\n",
+    "af.write_to(\"model.asdf\")\n",
+    "af2 = asdf.open(\"model.asdf\")\n",
+    "poly_model2 = af2.tree[\"model\"]\n",
+    "print(poly_model2(x) - poly_model(x))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "07746137",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Compound model example\n",
+    "compound = (\n",
+    "    poly_model + amm.Sine1D(amplitude=20.0, frequency=1.0, phase=0.0)\n",
+    ") | amm.Gaussian1D(amplitude=10.0, mean=0.0, stddev=10.0)\n",
+    "plt.plot(x, compound(x))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f5b2b025",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "af.tree = {\"compound\": compound}\n",
+    "af.write_to(\"compound.asdf\", all_array_storage=\"inline\")\n",
+    "with open(\"compound.asdf\", \"rb\") as asdftext:\n",
+    "    text = asdftext.read()\n",
+    "    print(text.decode(\"utf-8\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "899c6efd",
+   "metadata": {},
+   "source": [
+    "Controlling how arrays are stored\n",
+    "-----------------------------------------\n",
+    "\n",
+    "**inline vs binary**\n",
+    "\n",
+    "Arrays may be stored one of two ways. By default they will be stored as binary blocks. But it is possible to save them in the YAML itself. The following illustrates the various ways that can be done.\n",
+    "\n",
+    "*Globally selecting inline storage*\n",
+    "\n",
+    "That is, all arrays will be stored in the YAML"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3f6212e1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "a = np.arange(10)\n",
+    "b = np.zeros((5, 5))\n",
+    "tree = {\"a\": a, \"b\": b}\n",
+    "af = asdf.AsdfFile(tree)\n",
+    "af.write_to(\"inline.asdf\", all_array_storage=\"inline\")\n",
+    "text = open(\"inline.asdf\").read()\n",
+    "print(text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "68d5a1e2",
+   "metadata": {},
+   "source": [
+    "*Selecting specific array to store inline*"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1aa9faa8",
+   "metadata": {
+    "tags": [
+     "raises-exception"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "af.set_array_storage(b, \"inline\")\n",
+    "af.set_array_storage(a, \"internal\")\n",
+    "af.write_to(\"partial_inline.asdf\")\n",
+    "text = open(\"partial_inline.asdf\", \"rb\").read(768)\n",
+    "print(text.decode(\"utf-8\"))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "833868f6",
+   "metadata": {},
+   "source": [
+    "**Compression options**\n",
+    "\n",
+    "Currently two block compression options are available. As with the inline vs internal, this can be done globally or for specific arrays. Current options are `'zlib'` or `'bzp2'`\n",
+    "\n",
+    "*global compression example*\n",
+    "\n",
+    "Where all arrays are stored inline"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "80e0230d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "af = asdf.AsdfFile()\n",
+    "a = np.arange(10)\n",
+    "b = np.zeros(100000)\n",
+    "af.tree = {\"a\": a, \"b\": b}\n",
+    "af.write_to(\"all_compressed.asdf\", all_array_compression=\"zlib\")\n",
+    "bcontent = open(\"all_compressed.asdf\", \"rb\").read()\n",
+    "print(len(bcontent))  # note much smaller than 800000 bytes\n",
+    "af2 = asdf.open(\"all_compressed.asdf\")\n",
+    "print(af2.tree[\"b\"].shape)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "aec8ee21",
+   "metadata": {},
+   "source": [
+    "*specific array compression*"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f0b11048",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "af = asdf.AsdfFile()\n",
+    "a = np.arange(10)\n",
+    "b = np.zeros(100000)\n",
+    "af.tree = {\"a\": a, \"b\": b}\n",
+    "af.set_array_compression(b, \"bzp2\")\n",
+    "af.write_to(\"selected_compressed.asdf\")\n",
+    "bcontent = open(\"selected_compressed.asdf\", \"rb\").read()\n",
+    "print(len(bcontent))  # note much smaller than 800000 bytes\n",
+    "af2 = asdf.open(\"selected_compressed.asdf\")\n",
+    "print(af2.tree[\"b\"].shape)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "844f5e91",
+   "metadata": {},
+   "source": [
+    "Plans for Development\n",
+    "-------------------------------"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "93b10975",
+   "metadata": {},
+   "source": [
+    "- Support for fsspec for cloud usage (probably based on zarr and possibly DASK)\n",
+    "- Support for chunking (also based on zarr, incorporate into zarr itself)\n",
+    "  - Improved block management\n",
+    "- More compression options, mostly based on BLOSC\n",
+    "- C/C++ support (initially through Python, but ideally as a native library)\n",
+    "- Schema information retrieval, e.g., get description associated with an attribute\n",
+    "- Support custom validators"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a35a7da1",
+   "metadata": {},
+   "source": [
+    "Questions?\n",
+    "------------------"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e4de9fca",
+   "metadata": {},
+   "source": [
+    "Exercises\n",
+    "---------------\n",
+    "\n",
+    "1. Write a C or C++ library to read and write ASDF files. Provide documentation and a full test suite.\n",
+    "\n",
+    "Bonus: support all astropy models."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5d9dc89d",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "celltoolbar": "Tags",
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}