* Initial draft of Chapter 4
* Try again to save notebook
* Update names of files
* initial commit of tutorials 2 and 7
* Anatomy tutorial
* moved to subdirectory
* Avanced Tutorial
* Avanced Tutorial
* Avanced Tutorial
* expanded compression example
* expanded compression example
* expanded compression example
* fix first cell
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Remove old notebook
* Fix CI

Co-authored-by: William Jamieson <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 1c0bfa2, commit 99d02d8
Showing 1 changed file with 305 additions and 0 deletions.
06-Advanced_Topics_and_Plans/06-Advanced_Topics_and_Plans.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,305 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "93df0a25", | ||
"metadata": {}, | ||
"source": [ | ||
"Advanced Topics and Plans\n", | ||
"====================================" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "0d14c2ff", | ||
"metadata": {}, | ||
"source": [ | ||
"Capabilities not yet covered\n", | ||
"-----------------------------------------" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"id": "23315037", | |
"metadata": {}, | |
"source": [ | |
"- Serializing analytic models without pickle\n", | |
"- When arrays are loaded\n", | |
"- Memory mapping\n", | |
"- Handling inline vs binary array choices\n", | |
" - E.g., files editable with a text editor\n", | |
" - Easily printed out\n", | |
" - Enables scientists to use their own code to read contents without ASDF libraries\n", | |
"- Handling compression of arrays\n" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"id": "e2829669", | |
"metadata": {}, | |
"source": [ | |
"Serializing Models\n", | |
"---------------------------\n", | |
"\n", | |
"The examples below use astropy models (`astropy.modeling`), which ASDF can serialize without pickle." | |
] | ||
}, | ||
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "7793c9be", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import os\n", | |
"import astropy.modeling.models as amm\n", | |
"import numpy as np\n", | |
"import asdf\n", | |
"from matplotlib import pyplot as plt\n", | |
"\n", | |
"# Create a simple model\n", | |
"\n", | |
"poly_model = amm.Polynomial1D(degree=3, c0=1.0, c3=1.0)\n", | |
"poly_model.parameters = [1.0, 0.0, 0.0, 1.0]  # c0..c3, same values as set above\n", | |
"x = np.linspace(-5, 5, 101)\n", | ||
"plt.plot(x, poly_model(x))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "e650ff40", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"af = asdf.AsdfFile()\n", | ||
"af.tree = {\"model\": poly_model}\n", | ||
"af.write_to(\"model.asdf\")\n", | ||
"af2 = asdf.open(\"model.asdf\")\n", | ||
"poly_model2 = af2.tree[\"model\"]\n", | ||
"print(poly_model2(x) - poly_model(x))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "07746137", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Compound model example\n", | ||
"compound = (\n", | ||
" poly_model + amm.Sine1D(amplitude=20.0, frequency=1.0, phase=0.0)\n", | ||
") | amm.Gaussian1D(amplitude=10.0, mean=0.0, stddev=10.0)\n", | ||
"plt.plot(x, compound(x))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f5b2b025", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"af.tree = {\"compound\": compound}\n", | ||
"af.write_to(\"compound.asdf\", all_array_storage=\"inline\")\n", | ||
"with open(\"compound.asdf\", \"rb\") as asdftext:\n", | ||
" text = asdftext.read()\n", | ||
" print(text.decode(\"utf-8\"))" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"id": "899c6efd", | |
"metadata": {}, | |
"source": [ | |
"Controlling how arrays are stored\n", | |
"-----------------------------------------\n", | |
"\n", | |
"**inline vs binary**\n", | |
"\n", | |
"Arrays may be stored in one of two ways. By default they are stored as binary blocks, but they can instead be saved in the YAML itself. The following cells illustrate the ways this can be done.\n", | |
"\n", | |
"*Globally selecting inline storage*\n", | |
"\n", | |
"That is, all arrays will be stored in the YAML." | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "3f6212e1", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"a = np.arange(10)\n", | ||
"b = np.zeros((5, 5))\n", | ||
"tree = {\"a\": a, \"b\": b}\n", | ||
"af = asdf.AsdfFile(tree)\n", | ||
"af.write_to(\"inline.asdf\", all_array_storage=\"inline\")\n", | ||
"text = open(\"inline.asdf\").read()\n", | ||
"print(text)" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"id": "68d5a1e2", | |
"metadata": {}, | |
"source": [ | |
"*Selecting specific arrays to store inline*" | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "1aa9faa8", | ||
"metadata": { | ||
"tags": [ | ||
"raises-exception" | ||
] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"af.set_array_storage(b, \"inline\")\n", | ||
"af.set_array_storage(a, \"internal\")\n", | ||
"af.write_to(\"partial_inline.asdf\")\n", | ||
"text = open(\"partial_inline.asdf\", \"rb\").read(768)\n", | ||
"print(text.decode(\"utf-8\"))" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"id": "833868f6", | |
"metadata": {}, | |
"source": [ | |
"**Compression options**\n", | |
"\n", | |
"Currently two block compression options are available: `'zlib'` and `'bzp2'` (the label ASDF uses for bzip2). As with inline vs internal storage, compression can be selected globally or for specific arrays.\n", | |
"\n", | |
"*Global compression example*\n", | |
"\n", | |
"Where all arrays are compressed" | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "80e0230d", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"af = asdf.AsdfFile()\n", | ||
"a = np.arange(10)\n", | ||
"b = np.zeros(100000)\n", | ||
"af.tree = {\"a\": a, \"b\": b}\n", | ||
"af.write_to(\"all_compressed.asdf\", all_array_compression=\"zlib\")\n", | ||
"bcontent = open(\"all_compressed.asdf\", \"rb\").read()\n", | ||
"print(len(bcontent)) # note much smaller than 800000 bytes\n", | ||
"af2 = asdf.open(\"all_compressed.asdf\")\n", | ||
"print(af2.tree[\"b\"].shape)" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"id": "aec8ee21", | |
"metadata": {}, | |
"source": [ | |
"*Specific array compression*" | |
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f0b11048", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"af = asdf.AsdfFile()\n", | ||
"a = np.arange(10)\n", | ||
"b = np.zeros(100000)\n", | ||
"af.tree = {\"a\": a, \"b\": b}\n", | ||
"af.set_array_compression(b, \"bzp2\")\n", | ||
"af.write_to(\"selected_compressed.asdf\")\n", | ||
"bcontent = open(\"selected_compressed.asdf\", \"rb\").read()\n", | ||
"print(len(bcontent)) # note much smaller than 800000 bytes\n", | ||
"af2 = asdf.open(\"selected_compressed.asdf\")\n", | ||
"print(af2.tree[\"b\"].shape)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "844f5e91", | ||
"metadata": {}, | ||
"source": [ | ||
"Plans for Development\n", | ||
"-------------------------------" | ||
] | ||
}, | ||
{ | |
"cell_type": "markdown", | |
"id": "93b10975", | |
"metadata": {}, | |
"source": [ | |
"- Support for fsspec for cloud usage (probably based on zarr and possibly Dask)\n", | |
"- Support for chunking (also based on zarr, incorporate into zarr itself)\n", | |
" - Improved block management\n", | |
"- More compression options, mostly based on Blosc\n", | |
"- C/C++ support (initially through Python, but ideally as a native library)\n", | |
"- Schema information retrieval, e.g., get description associated with an attribute\n", | |
"- Support for custom validators" | |
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "a35a7da1", | ||
"metadata": {}, | ||
"source": [ | ||
"Questions?\n", | ||
"------------------" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "e4de9fca", | ||
"metadata": {}, | ||
"source": [ | ||
"Exercises\n", | ||
"---------------\n", | ||
"\n", | ||
"1. Write a C or C++ library to read and write ASDF files. Provide documentation and a full test suite.\n", | ||
"\n", | ||
"Bonus: support all astropy models." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "5d9dc89d", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"celltoolbar": "Tags", | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.5" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |