From 0fc83d8446bb54911df50136c8e238e7f9ad814b Mon Sep 17 00:00:00 2001 From: Ian Czekala Date: Sun, 25 Dec 2022 11:07:59 -0500 Subject: [PATCH 1/8] started tutorial on how to produce fake datasets. --- docs/ci-tutorials/fakedata.md | 39 +++++++++++++++++++++++++ docs/ci-tutorials/loose-visibilities.md | 1 - 2 files changed, 39 insertions(+), 1 deletion(-) create mode 100644 docs/ci-tutorials/fakedata.md diff --git a/docs/ci-tutorials/fakedata.md b/docs/ci-tutorials/fakedata.md new file mode 100644 index 00000000..ae04add4 --- /dev/null +++ b/docs/ci-tutorials/fakedata.md @@ -0,0 +1,39 @@ +--- +jupytext: + text_representation: + extension: .md + format_name: myst + format_version: 0.13 + jupytext_version: 1.14.1 +kernelspec: + display_name: Python 3 + language: python + name: python3 +--- + +```{code-cell} +:tags: [hide-cell] +%run notebook_setup +``` + +# Making a Mock Dataset + +This is most useful if you already have a real dataset, with real baseline distributions and noise weights. Alternatively, you could acquire some baseline distribution and noise distribution, possibly using CASA's simobserve. + + + +For this, you would take a realistic, known, sky-brightness distribution and then propagate this to the visibilities. For example, you could use an image from a simulation, a parametric model, or even an image from the Internet. + +```{code-cell} +url="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/The_final_ALMA_antenna.jpg/2560px-The_final_ALMA_antenna.jpg" +``` + +How many pixels does it have? + +The routine just takes an Image cube, u,v, weights and produces visibilites with noise. + +But there's another concern about how to put the image cube in, right? I guess that's just a matter of matching the image cube to the size. You may want to pad the image, though. You probably also want to convolve it. + +Here is where it would be helpful to have a note about how changing pixel size and image dimensions affects the uv coverage. 
There needs to be some match up between the image and the uv size. + +We'll use the same u,v distribution and noise distribution from the mock dataset. The max baseline diff --git a/docs/ci-tutorials/loose-visibilities.md b/docs/ci-tutorials/loose-visibilities.md index 5c6affab..d6f8c025 100644 --- a/docs/ci-tutorials/loose-visibilities.md +++ b/docs/ci-tutorials/loose-visibilities.md @@ -13,7 +13,6 @@ kernelspec: ```{code-cell} :tags: [hide-cell] -%matplotlib inline %run notebook_setup ``` From 45eaaa1537a9916081e41559eadd06c99a9dfeb9 Mon Sep 17 00:00:00 2001 From: Ian Czekala Date: Sun, 25 Dec 2022 17:43:55 -0500 Subject: [PATCH 2/8] some progress towards downloading image. --- docs/ci-tutorials/fakedata.md | 28 +++++++++++++++++++++++++--- docs/index.rst | 1 + 2 files changed, 26 insertions(+), 3 deletions(-) diff --git a/docs/ci-tutorials/fakedata.md b/docs/ci-tutorials/fakedata.md index ae04add4..5ee1edec 100644 --- a/docs/ci-tutorials/fakedata.md +++ b/docs/ci-tutorials/fakedata.md @@ -21,16 +21,38 @@ kernelspec: This is most useful if you already have a real dataset, with real baseline distributions and noise weights. Alternatively, you could acquire some baseline distribution and noise distribution, possibly using CASA's simobserve. - For this, you would take a realistic, known, sky-brightness distribution and then propagate this to the visibilities. For example, you could use an image from a simulation, a parametric model, or even an image from the Internet. 
```{code-cell} -url="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/The_final_ALMA_antenna.jpg/2560px-The_final_ALMA_antenna.jpg" +# use python to download an image +import requests + +image_url="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/The_final_ALMA_antenna.jpg/2560px-The_final_ALMA_antenna.jpg" + +img_data = requests.get(image_url).content +with open('alma.jpg', 'wb') as handler: + handler.write(img_data) +``` + +```{code-cell} ipython3 +--- +mystnb: + image: + width: 600px + alt: ALMA + classes: shadow bg-primary + figure: + caption: | + The ALMA antennas. + name: alma-ref +--- +from IPython.display import Image +Image("alma.jpg") ``` How many pixels does it have? -The routine just takes an Image cube, u,v, weights and produces visibilites with noise. +The routine just takes an Image cube, u,v, weights and produces visibilities with noise. But there's another concern about how to put the image cube in, right? I guess that's just a matter of matching the image cube to the size. You may want to pad the image, though. You probably also want to convolve it. diff --git a/docs/index.rst b/docs/index.rst index 6ec65343..b198cdb9 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -54,6 +54,7 @@ If you'd like to help build the MPoL package, please check out the :ref:`develop ci-tutorials/initializedirtyimage large-tutorials/HD143006_part_1 large-tutorials/HD143006_part_2 + ci-tutorials/fakedata .. toctree:: :hidden: From 503a1bc33869d527041b9551cc2b9e5021217b17 Mon Sep 17 00:00:00 2001 From: Ian Czekala Date: Sun, 25 Dec 2022 22:27:27 -0500 Subject: [PATCH 3/8] added links to image packing docs. 
--- docs/ci-tutorials/fakedata.md | 170 +++++++++++++++++++++++++++++++--- docs/units-and-conventions.md | 1 + setup.py | 13 ++- src/mpol/images.py | 4 +- 4 files changed, 173 insertions(+), 15 deletions(-) diff --git a/docs/ci-tutorials/fakedata.md b/docs/ci-tutorials/fakedata.md index 5ee1edec..e8d95fc8 100644 --- a/docs/ci-tutorials/fakedata.md +++ b/docs/ci-tutorials/fakedata.md @@ -6,28 +6,38 @@ jupytext: format_version: 0.13 jupytext_version: 1.14.1 kernelspec: - display_name: Python 3 + display_name: Python 3 (ipykernel) language: python name: python3 --- -```{code-cell} +```{code-cell} ipython3 :tags: [hide-cell] + %run notebook_setup ``` # Making a Mock Dataset -This is most useful if you already have a real dataset, with real baseline distributions and noise weights. Alternatively, you could acquire some baseline distribution and noise distribution, possibly using CASA's simobserve. +In this tutorial, we'll explore how you might construct a mock dataset from a known sky brightness distribution. In many ways, this problem is already solved in a realistic manner by CASA's [simobserve](https://casadocs.readthedocs.io/en/latest/api/tt/casatasks.simulation.simobserve.html) task. However, by replicating the key parts of this process with MPoL framework, we can easily make direct comparisons to images produced using RML techniques. + +In a nutshell, this process is works by +1. taking a known sky brightness distribution (i.e., a mock "true" image) +2. inserting it into an {class}`mpol.images.ImageCube` +3. using the {class}`mpol.fourier.NuFFT` to predict visibilities at provided $u,v$ locations +4. adding noise +The final two steps are relatively straightforward. The first two steps are conceptually simple but there are several technical concerns one should be aware of, which we'll cover now. -For this, you would take a realistic, known, sky-brightness distribution and then propagate this to the visibilities. 
For example, you could use an image from a simulation, a parametric model, or even an image from the Internet. +## Choosing a mock sky brightness distribution -```{code-cell} +You can choose a mock sky brightness distribution from a simulation, a parametric model, or even an image from the Internet. For this tutorial, we'll use a JPEG image from the internet, since it will highlight many of the problems one might run into. First, we'll download the image and display it. + +```{code-cell} ipython3 # use python to download an image import requests -image_url="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/The_final_ALMA_antenna.jpg/2560px-The_final_ALMA_antenna.jpg" +image_url="https://cdn.eso.org/images/screen/alma-eight_close.jpg" img_data = requests.get(image_url).content with open('alma.jpg', 'wb') as handler: @@ -37,19 +47,147 @@ with open('alma.jpg', 'wb') as handler: ```{code-cell} ipython3 --- mystnb: + figure: + caption: 'The ALMA antennas. Credit: ALMA (ESO/NAOJ/NRAO) + + ' + name: alma-ref image: - width: 600px alt: ALMA classes: shadow bg-primary - figure: - caption: | - The ALMA antennas. - name: alma-ref + width: 600px --- from IPython.display import Image Image("alma.jpg") ``` +There are several operations we will need to perform on this image before it is suitable to insert into an {class}`mpol.images.ImageCube`. +1. convert the JPEG image to greyscale `float64` values +1. make the image square (if necessary) +1. choose a target {class}`~mpol.images.ImageCube` size via `cell_size` and `npix`. The range of acceptable image dimensions depends on the range $u,v$ coordinates +1. if the raw image size is larger than the target size of the {class}`~mpol.images.ImageCube`, smooth and downsample the image +1. scale flux units to be Jy/arcsec^2 + +To perform image manipulations, we'll use the [Pillow](https://pillow.readthedocs.io/en/stable/index.html) library. 
+ +```{code-cell} ipython3 +from PIL import Image, ImageOps, ImageMath + +im_raw = Image.open("alma.jpg") + +# convert to greyscale +im_grey = ImageOps.grayscale(im_raw) +``` + +```{code-cell} ipython3 +xsize, ysize = im_grey.size +``` + +```{code-cell} ipython3 +im_grey +``` + +Now we'll need to make the image square. Depending on the image, we can either crop the longer dimension or pad the larger dimension. While we're doing this, we should also be thinking about the image edges. + +Because the discrete Fourier transform is involved in taking an image to the visibility plane, we are making the assumption that the image is infinite and periodic in space beyond the field of view. i.e., it tiles to infinity. Therefore, to avoid introducing spurious spatial frequencies, it is a good idea to make sure that the edges of the image all have the same value. The simplest thing to do here is to taper the image edges such that they all go to 0. We can do this by multiplying by an apodization function, like the Hann window. 
+ +```{code-cell} ipython3 +xhann = np.hanning(xsize) +yhann = np.hanning(ysize) +# each is already normalized to a max of 1 +# so hann is also normed to a max of 1 +# broadcast to 2D +hann = np.outer(yhann, xhann) + +# scale to 0 - 255 and then convert to uint8 +hann8 = np.uint8(hann * 255) +im_apod = Image.fromarray(hann8) +``` + +```{code-cell} ipython3 +im_apod +``` + +Perform image math to multiply the taper against the greyscaled image + +```{code-cell} ipython3 +im_res = ImageMath.eval("a * b", a=im_grey, b=im_apod) +``` + +```{code-cell} ipython3 +im_res +``` + +```{code-cell} ipython3 +im_pad = ImageOps.pad(im_res, (1280,1280)) +``` + +```{code-cell} ipython3 +im_pad +``` + +```{code-cell} ipython3 +im_small = im_pad.resize((500,500)) +``` + +```{code-cell} ipython3 +im_small +``` + +```{code-cell} ipython3 +print(im_grey.mode) +``` + +In converting the JPEG image to greyscale (mode "L" or "luminance"), the Pillow library has reduced the color channel to a single axis with an 8-bit unsigned integer, which can take on the values from 0 to 255. More info on the modes is available [here](https://pillow.readthedocs.io/en/stable/handbook/concepts.html#concept-modes). + +```{code-cell} ipython3 + +``` + +Now that we have resized and tapered the image, we're ready to leave the Pillow library and work with numpy arrays and pytorch tensors. First we convert from a Pillow object to a numpy array + +```{code-cell} ipython3 +import numpy as np +a = np.array(im_small) +``` + +We can see that this array is now a 32-bit integer array (it was promoted during the ImageMath operation to save precision). + +```{code-cell} ipython3 +a +``` + +We will convert this array to a `float64` type and normalize its max value to 1. 
+ +```{code-cell} ipython3 +b = a.astype("float64") +b = b/b.max() +``` + +Now, we can plot this array using matplotlib's `imshow`, as we might normally do with arrays of data + +```{code-cell} ipython3 +import matplotlib.pyplot as plt +plt.imshow(b, origin="upper") +``` + +But the main idea is that the values range from 0 to 255 + +```{code-cell} ipython3 +a.dtype +``` + +```{code-cell} ipython3 + +``` + +This image is rectangular, with more pixels in the East-West direction compared to North-South. MPoL and the {class}`mpol.images.ImageCube` routines work (for now) under the assumption of square images. To rectify this situation, we will pad the North-South direction with zeros. + ++++ + +This is most useful if you already have a real dataset, with real baseline distributions and noise weights. Alternatively, you could acquire some baseline distribution and noise distribution, possibly using CASA's simobserve. + + How many pixels does it have? The routine just takes an Image cube, u,v, weights and produces visibilities with noise. @@ -58,4 +196,14 @@ But there's another concern about how to put the image cube in, right? I guess t Here is where it would be helpful to have a note about how changing pixel size and image dimensions affects the uv coverage. There needs to be some match up between the image and the uv size. +Adjusting the `cell_size` changes the maximum spatial frequency that can be represented in the image. I.e., a smaller pixel cell size will enable an image to carry higher spatial frequencies. + +Changing the number of pixels via `npix` will change the number of $u,v$ cells between 0 and the max spatial frequency. + +Now, let's put this into a pytorch tensor, flip the directions, and insert it into an ImageCube. + +```{code-cell} ipython3 + +``` + We'll use the same u,v distribution and noise distribution from the mock dataset. 
The max baseline diff --git a/docs/units-and-conventions.md b/docs/units-and-conventions.md index b98e2328..1b8395aa 100644 --- a/docs/units-and-conventions.md +++ b/docs/units-and-conventions.md @@ -109,6 +109,7 @@ $$ For more information on this procedure as implmented in MPoL, see the {class}`~mpol.gridding.Gridder` class and the source code of its {func}`~mpol.gridding.Gridder.get_dirty_image` method. When the grid of ${\cal V}_{u,v}$ values is not fully sampled (as in any real-world interferometric observation), there are many subtleties beyond this simple equation that warrant consideration when synthesizing an image via inverse Fourier transform. For more information, consult the seminal [Ph.D. thesis](http://www.aoc.nrao.edu/dissertations/dbriggs/) of Daniel Briggs. +(cube-orientation-label)= ### Image Cube Packing for FFTs Numerical FFT routines expect that the first element of an input array (i.e., `array[i,0,0]`) corresponds to the zeroth spatial ($l,m$) or frequency ($u,v$) coordinate. This convention is quite different than the way we normally look at images. As described above, MPoL deals with three dimensional image cubes of shape `(nchan, npix, npix)`, where the "rows" of the image cube (axis=1) correspond to the $m$ or Dec axis, and the "columns" of the image cube (axis=2) correspond to the $l$ or R.A. axis. Normally, the zeroth spatial component $(l,m) = (0,0)$ is in the *center* of the array (at position `array[i,M/2,L/2]`), so that when an array is visualized (say with `matplotlib.pyplot.imshow`, `origin="lower"`), the center of the array appears in the center of the image. 
diff --git a/setup.py b/setup.py index 07f2fe4f..9d411f4c 100644 --- a/setup.py +++ b/setup.py @@ -37,7 +37,7 @@ def get_version(rel_path): "sphinx>=2.3.0", "numpy", "jupytext", - "ipython!=8.7.0", # broken version for syntax higlight https://github.com/spatialaudio/nbsphinx/issues/687 + "ipython!=8.7.0", # broken version for syntax higlight https://github.com/spatialaudio/nbsphinx/issues/687 "nbsphinx", "sphinx_book_theme", "sphinx_copybutton", @@ -49,6 +49,7 @@ def get_version(rel_path): "tensorboard", "myst-nb", "jupyter-cache", + "Pillow", ], } @@ -66,7 +67,15 @@ def get_version(rel_path): long_description=long_description, long_description_content_type="text/markdown", url="https://github.com/iancze/MPoL", - install_requires=["numpy", "scipy", "torch>=1.8.0", "torchvision", "torchaudio", "torchkbnufft", "astropy"], + install_requires=[ + "numpy", + "scipy", + "torch>=1.8.0", + "torchvision", + "torchaudio", + "torchkbnufft", + "astropy", + ], extras_require=EXTRA_REQUIRES, packages=setuptools.find_packages("src"), package_dir={"": "src"}, diff --git a/src/mpol/images.py b/src/mpol/images.py index 81a7544f..f9b4b1ee 100644 --- a/src/mpol/images.py +++ b/src/mpol/images.py @@ -26,7 +26,7 @@ class BaseCube(nn.Module): coords (GridCoords): an object already instantiated from the GridCoords class. If providing this, cannot provide ``cell_size`` or ``npix``. nchan (int): the number of channels in the base cube. Default = 1. pixel_mapping (torch.nn): a PyTorch function mapping the base pixel representation to the cube representation. If `None`, defaults to `torch.nn.Softplus() `_. Output of the function should be in units of [:math:`\mathrm{Jy}\,\mathrm{arcsec}^{-2}`]. - base_cube (torch.double tensor, optional): a pre-packed base cube to initialize the model with. If None, assumes ``torch.zeros``. + base_cube (torch.double tensor, optional): a pre-packed base cube to initialize the model with. If None, assumes ``torch.zeros``. 
See :ref:`cube-orientation-label` for more information on the expectations of the orientation of the input image. """ def __init__( @@ -173,7 +173,7 @@ class ImageCube(nn.Module): coords (GridCoords): an object already instantiated from the GridCoords class. If providing this, cannot provide ``cell_size`` or ``npix``. nchan (int): the number of channels in the image passthrough (bool): if passthrough, assume ImageCube is just a layer as opposed to parameter base. - cube (torch.double tensor, of shape ``(nchan, npix, npix)``): (optional) a prepacked image cube to initialize the model with in units of [:math:`\mathrm{Jy}\,\mathrm{arcsec}^{-2}`]. If None, assumes starting ``cube`` is ``torch.zeros``. + cube (torch.double tensor, of shape ``(nchan, npix, npix)``): (optional) a prepacked image cube to initialize the model with in units of [:math:`\mathrm{Jy}\,\mathrm{arcsec}^{-2}`]. If None, assumes starting ``cube`` is ``torch.zeros``. See :ref:`cube-orientation-label` for more information on the expectations of the orientation of the input image. """ def __init__( From 46030d276a536dbc81a26dfb55386cc5d917baf1 Mon Sep 17 00:00:00 2001 From: Ian Czekala Date: Sun, 25 Dec 2022 23:34:58 -0500 Subject: [PATCH 4/8] start to fake data routine. --- docs/ci-tutorials/fakedata.md | 152 ++++++++++++++++++++++++++-------- src/mpol/__init__.py | 2 +- src/mpol/gridding.py | 18 ++-- src/mpol/utils.py | 55 +++++++++--- 4 files changed, 171 insertions(+), 56 deletions(-) diff --git a/docs/ci-tutorials/fakedata.md b/docs/ci-tutorials/fakedata.md index e8d95fc8..57871553 100644 --- a/docs/ci-tutorials/fakedata.md +++ b/docs/ci-tutorials/fakedata.md @@ -17,6 +17,7 @@ kernelspec: %run notebook_setup ``` +(mock-dataset-label)= # Making a Mock Dataset In this tutorial, we'll explore how you might construct a mock dataset from a known sky brightness distribution. 
In many ways, this problem is already solved in a realistic manner by CASA's [simobserve](https://casadocs.readthedocs.io/en/latest/api/tt/casatasks.simulation.simobserve.html) task. However, by replicating the key parts of this process with MPoL framework, we can easily make direct comparisons to images produced using RML techniques. @@ -27,7 +28,7 @@ In a nutshell, this process is works by 3. using the {class}`mpol.fourier.NuFFT` to predict visibilities at provided $u,v$ locations 4. adding noise -The final two steps are relatively straightforward. The first two steps are conceptually simple but there are several technical concerns one should be aware of, which we'll cover now. +The final two steps are relatively straightforward. The first two steps are conceptually simple but there are several technical caveats one should be aware of, which we'll cover now. ## Choosing a mock sky brightness distribution @@ -48,9 +49,7 @@ with open('alma.jpg', 'wb') as handler: --- mystnb: figure: - caption: 'The ALMA antennas. Credit: ALMA (ESO/NAOJ/NRAO) - - ' + caption: 'The ALMA antennas. Credit: ALMA (ESO/NAOJ/NRAO)' name: alma-ref image: alt: ALMA @@ -70,26 +69,31 @@ There are several operations we will need to perform on this image before it is To perform image manipulations, we'll use the [Pillow](https://pillow.readthedocs.io/en/stable/index.html) library. +## Using Pillow to greyscale, apodize, pad, and resize + ```{code-cell} ipython3 from PIL import Image, ImageOps, ImageMath +import numpy as np im_raw = Image.open("alma.jpg") # convert to greyscale im_grey = ImageOps.grayscale(im_raw) -``` -```{code-cell} ipython3 +# get image dimensions xsize, ysize = im_grey.size +print(im_grey.mode) ``` +In converting the JPEG image to greyscale (mode "L" or "luminance"), the Pillow library has reduced the color channel to a single axis with an 8-bit unsigned integer, which can take on the values from 0 to 255. 
More info on the modes is available [here](https://pillow.readthedocs.io/en/stable/handbook/concepts.html#concept-modes). We can see the greyscale image + ```{code-cell} ipython3 im_grey ``` -Now we'll need to make the image square. Depending on the image, we can either crop the longer dimension or pad the larger dimension. While we're doing this, we should also be thinking about the image edges. +Now let's think about how to make the image square. Depending on the image, we can either crop the longer dimension or pad the shorter dimension. Before we do that, though, we also need to think about the image edges. -Because the discrete Fourier transform is involved in taking an image to the visibility plane, we are making the assumption that the image is infinite and periodic in space beyond the field of view. i.e., it tiles to infinity. Therefore, to avoid introducing spurious spatial frequencies, it is a good idea to make sure that the edges of the image all have the same value. The simplest thing to do here is to taper the image edges such that they all go to 0. We can do this by multiplying by an apodization function, like the Hann window. +Because the discrete Fourier transform is used to take an image to the visibility plane, we make the assumption that the image is infinite and periodic in space beyond the field of view, i.e., it tiles to infinity. Therefore, to avoid introducing spurious spatial frequencies from discontinuous edges, it is a good idea to make sure that the edges of the image all have the same value. The simplest thing to do here is to taper the image edges such that they all go to 0. We can do this by multiplying the image by an apodization function, like the Hann window. We'll multiply two 1D Hann windows to create a 2D apodization window. 
```{code-cell} ipython3 xhann = np.hanning(xsize) @@ -99,59 +103,62 @@ yhann = np.hanning(ysize) # broadcast to 2D hann = np.outer(yhann, xhann) +# now convert the numpy array to a Pillow object # scale to 0 - 255 and then convert to uint8 hann8 = np.uint8(hann * 255) im_apod = Image.fromarray(hann8) ``` +We can visualize the 2D Hann apodization window + ```{code-cell} ipython3 im_apod ``` -Perform image math to multiply the taper against the greyscaled image +And then use image math to multiply the apodization window against the greyscaled image ```{code-cell} ipython3 im_res = ImageMath.eval("a * b", a=im_grey, b=im_apod) ``` +To give an image with a vignette-like appearance. + ```{code-cell} ipython3 im_res ``` +Now, let's pad the image to be square + ```{code-cell} ipython3 -im_pad = ImageOps.pad(im_res, (1280,1280)) +max_dim = np.maximum(xsize, ysize) +im_pad = ImageOps.pad(im_res, (max_dim, max_dim)) ``` ```{code-cell} ipython3 im_pad ``` +Great, we now have a square, apodized image. The only thing is that a 1280 x 1280 image is still a bit too many pixels for most ALMA observations. I.e., the spatial resolution or "beam size" of most ALMA observations is such that for any single-pointing observation, we wouldn't need this many pixels to represent the full information content of the image. Therefore, let's resize the image to be a bit smaller. + ```{code-cell} ipython3 -im_small = im_pad.resize((500,500)) +npix = 500 +im_small = im_pad.resize((npix,npix)) ``` ```{code-cell} ipython3 im_small ``` -```{code-cell} ipython3 -print(im_grey.mode) -``` +## Exporting to a PyTorch tensor -In converting the JPEG image to greyscale (mode "L" or "luminance"), the Pillow library has reduced the color channel to a single axis with an 8-bit unsigned integer, which can take on the values from 0 to 255. More info on the modes is available [here](https://pillow.readthedocs.io/en/stable/handbook/concepts.html#concept-modes). 
- -```{code-cell} ipython3 - -``` - -Now that we have resized and tapered the image, we're ready to leave the Pillow library and work with numpy arrays and pytorch tensors. First we convert from a Pillow object to a numpy array +Now that we have done the necessary image preparation, we're ready to leave the Pillow library and work with numpy arrays and pytorch tensors. First we convert from a Pillow object to a numpy array ```{code-cell} ipython3 import numpy as np a = np.array(im_small) ``` -We can see that this array is now a 32-bit integer array (it was promoted during the ImageMath operation to save precision). +We can see that this array is now a 32-bit integer array (it was promoted from an 8-bit integer array during the ImageMath operation to save precision). ```{code-cell} ipython3 a @@ -164,35 +171,46 @@ b = a.astype("float64") b = b/b.max() ``` -Now, we can plot this array using matplotlib's `imshow`, as we might normally do with arrays of data +Now, we can plot this array using matplotlib's `imshow` and using the `origin="lower"` argument as we might normally do with arrays of data from MPoL. +```{margin} MPoL Image Orientations +Now might be a good time to familiarize yourself with the {ref}`cube-orientation-label`, if you aren't already familiar. +``` ```{code-cell} ipython3 import matplotlib.pyplot as plt -plt.imshow(b, origin="upper") +plt.imshow(b, origin="lower") ``` -But the main idea is that the values range from 0 to 255 +In doing so, we've uncovered an additional problem that the image is upside down! We can fix this using ```{code-cell} ipython3 -a.dtype +c = np.flipud(b) +plt.imshow(c, origin="lower") ``` -```{code-cell} ipython3 +In this example, we're only working with a single-channel mock sky brightness distribution, so we only need to add an extra channel dimension to make an image cube. If we were working with a multi-channel sky brightness distribution, we could repeat the above transformations for each channel of the image cube. 
+```{code-cell} ipython3 +d = np.expand_dims(c, axis=0) ``` -This image is rectangular, with more pixels in the East-West direction compared to North-South. MPoL and the {class}`mpol.images.ImageCube` routines work (for now) under the assumption of square images. To rectify this situation, we will pad the North-South direction with zeros. - -+++ +Now, we'll convert the numpy array to a PyTorch tensor -This is most useful if you already have a real dataset, with real baseline distributions and noise weights. Alternatively, you could acquire some baseline distribution and noise distribution, possibly using CASA's simobserve. +```{code-cell} ipython3 +import torch +img_tensor = torch.tensor(d.copy()) +``` +And finally, we'll shift the tensor from a "Sky Cube" to a "Packed Cube" as the {class}`~mpol.images.ImageCube` expects -How many pixels does it have? +```{code-cell} ipython3 +from mpol import utils +img_tensor_packed = utils.sky_cube_to_packed_cube(img_tensor) +``` -The routine just takes an Image cube, u,v, weights and produces visibilities with noise. +## Initializing {class}`~mpol.images.ImageCube` -But there's another concern about how to put the image cube in, right? I guess that's just a matter of matching the image cube to the size. You may want to pad the image, though. You probably also want to convolve it. +Now let's settle on how big Here is where it would be helpful to have a note about how changing pixel size and image dimensions affects the uv coverage. There needs to be some match up between the image and the uv size. @@ -200,6 +218,60 @@ Adjusting the `cell_size` changes the maximum spatial frequency that can be repr Changing the number of pixels via `npix` will change the number of $u,v$ cells between 0 and the max spatial frequency. +We already defined `npix` when we performed the resize operation. 
+ +```{code-cell} ipython3 +cell_size = 0.03 # arcsec + +from mpol.images import ImageCube +image = ImageCube(cell_size=cell_size, npix=npix, nchan=1, cube=img_tensor_packed) +``` + +```{code-cell} ipython3 +# double check it went in correctly +# plt.imshow(np.squeeze(utils.packed_cube_to_sky_cube(image.forward()).detach().numpy()), origin="lower") +``` + +## Getting baseline distributions + +This is most useful if you already have a real dataset, with real baseline distributions and noise weights. Alternatively, you could acquire some baseline distribution and noise distribution, possibly using CASA's simobserve. + +In this example, we'll just use the baseline distribution from the mock dataset we've used in many of the tutorials. You can see a plot of it in the [Gridding and Diagnostic Images](gridder.md) tutorial. We'll only need the $u,v$ and weight arrays. + +```{code-cell} ipython3 +from astropy.utils.data import download_file + +# load the mock dataset of the ALMA logo +fname = download_file( + "https://zenodo.org/record/4930016/files/logo_cube.noise.npz", + cache=True, + show_progress=True, + pkgname="mpol", +) + +d = np.load(fname) +uu = d["uu"] +vv = d["vv"] +weight = d["weight"] +``` + +```{code-cell} ipython3 +max_uv = np.max(np.array([uu,vv])) +max_cell_size = utils.get_maximum_cell_size(max_uv) +print("The maximum cell_size that will still Nyquist sample the spatial frequency represented by the maximum u,v value is {:.2f} arcseconds".format(max_cell_size)) +``` + +```{code-cell} ipython3 +# will have the same shape as the uu, vv, and weight inputs +data_noise, data_noiseless = make_fake_data(image, uu, vv, weight) +``` + +How many pixels does it have? + +The routine just takes an Image cube, u,v, weights and produces visibilities with noise. + + + Now, let's put this into a pytorch tensor, flip the directions, and insert it into an ImageCube. 
```{code-cell} ipython3 @@ -207,3 +279,15 @@ Now, let's put this into a pytorch tensor, flip the directions, and insert it in ``` We'll use the same u,v distribution and noise distribution from the mock dataset. The max baseline + + +## Making the mock dataset + + +Now you could save this to disk, for example + + + +## Verifying the mock dataset + +To make sure the whole process worked OK, we'll load the visibilities and then make a dirty image. diff --git a/src/mpol/__init__.py b/src/mpol/__init__.py index c4df8287..c6991c7d 100644 --- a/src/mpol/__init__.py +++ b/src/mpol/__init__.py @@ -1 +1 @@ -__version__ = "0.1.3dev" +__version__ = "0.1.13dev" diff --git a/src/mpol/gridding.py b/src/mpol/gridding.py index ff4b720f..e3acd4a7 100644 --- a/src/mpol/gridding.py +++ b/src/mpol/gridding.py @@ -35,9 +35,6 @@ def _check_data_inputs_2d(uu=None, vv=None, weight=None, data_re=None, data_im=N data_im.dtype == np.double ), "data_im should be type single or double" - # check to see that uu, vv and data do not contain Hermitian pairs - assert not contains_hermitian_pairs(uu, vv, data_re + 1.0j * data_im) - if uu.ndim == 1: uu = np.atleast_2d(uu) vv = np.atleast_2d(vv) @@ -45,9 +42,10 @@ def _check_data_inputs_2d(uu=None, vv=None, weight=None, data_re=None, data_im=N data_re = np.atleast_2d(data_re) data_im = np.atleast_2d(data_im) - return uu, vv, weight, data_re, data_im + # check to see that uu, vv and data do not contain Hermitian pairs + assert not contains_hermitian_pairs(uu, vv, data_re + 1.0j * data_im) - # expand to 2d with complex conjugates + return uu, vv, weight, data_re, data_im def contains_hermitian_pairs(uu, vv, data, test_vis=5, test_channel=0): @@ -148,11 +146,11 @@ class Gridder: cell_size (float): width of a single square pixel in [arcsec] npix (int): number of pixels in the width of the image coords (GridCoords): an object already instantiated from the GridCoords class. If providing this, cannot provide ``cell_size`` or ``npix``. 
- uu (numpy array): array of u spatial frequency coordinates. Units of [:math:`\mathrm{k}\lambda`] - vv (numpy array): (nchan, nvis) length array of v spatial frequency coordinates. Units of [:math:`\mathrm{k}\lambda`] - weight (2d numpy array): (nchan, nvis) length array of thermal weights. Units of [:math:`1/\mathrm{Jy}^2`] - data_re (2d numpy array): (nchan, nvis) length array of the real part of the visibility measurements. Units of [:math:`\mathrm{Jy}`] - data_im (2d numpy array): (nchan, nvis) length array of the imaginary part of the visibility measurements. Units of [:math:`\mathrm{Jy}`] + uu (numpy array): (nchan, nvis) array of u spatial frequency coordinates. Units of [:math:`\mathrm{k}\lambda`] + vv (numpy array): (nchan, nvis) array of v spatial frequency coordinates. Units of [:math:`\mathrm{k}\lambda`] + weight (2d numpy array): (nchan, nvis) array of thermal weights. Units of [:math:`1/\mathrm{Jy}^2`] + data_re (2d numpy array): (nchan, nvis) array of the real part of the visibility measurements. Units of [:math:`\mathrm{Jy}`] + data_im (2d numpy array): (nchan, nvis) array of the imaginary part of the visibility measurements. Units of [:math:`\mathrm{Jy}`] """ diff --git a/src/mpol/utils.py b/src/mpol/utils.py index 84aa5fcc..63219f48 100644 --- a/src/mpol/utils.py +++ b/src/mpol/utils.py @@ -1,17 +1,19 @@ import numpy as np import torch -from .constants import arcsec, cc, c_ms, deg, kB +from . import fourier +from .constants import arcsec, c_ms, cc, deg, kB + def ground_cube_to_packed_cube(ground_cube): r""" Converts a Ground Cube to a Packed Visibility Cube for visibility-plane work. See Units and Conventions for more details. - + Args: ground_cube: a previously initialized Ground Cube object (cube (3D torch tensor of shape ``(nchan, npix, npix)``)) Returns: - torch.double : 3D image cube of shape ``(nchan, npix, npix)``; The resulting array after applying ``torch.fft.fftshift`` to the input arg; i.e Returns a Packed Visibility Cube. 
+ torch.double : 3D image cube of shape ``(nchan, npix, npix)``; The resulting array after applying ``torch.fft.fftshift`` to the input arg; i.e Returns a Packed Visibility Cube. """ shifted = torch.fft.fftshift(ground_cube, dim=(1, 2)) return shifted @@ -20,7 +22,7 @@ def ground_cube_to_packed_cube(ground_cube): def packed_cube_to_ground_cube(packed_cube): r""" Converts a Packed Visibility Cube to a Ground Cube for visibility-plane work. See Units and Conventions for more details. - + Args: packed_cube: a previously initialized Packed Cube object (cube (3D torch tensor of shape ``(nchan, npix, npix)``)) @@ -35,7 +37,7 @@ def packed_cube_to_ground_cube(packed_cube): def sky_cube_to_packed_cube(sky_cube): r""" Converts a Sky Cube to a Packed Image Cube for image-plane work. See Units and Conventions for more details. - + Args: sky_cube: a previously initialized Sky Cube object with RA increasing to the *left* (cube (3D torch tensor of shape ``(nchan, npix, npix)``)) @@ -50,7 +52,7 @@ def sky_cube_to_packed_cube(sky_cube): def packed_cube_to_sky_cube(packed_cube): r""" Converts a Packed Image Cube to a Sky Cube for image-plane work. See Units and Conventions for more details. - + Args: packed_cube: a previously initialized Packed Image Cube object (cube (3D torch tensor of shape ``(nchan, npix, npix)``)) @@ -77,13 +79,13 @@ def get_Jy_arcsec2(T_b, nu=230e9): """ # brightness temperature assuming RJ limit # units of ergs/s/cm^2/Hz/ster - I_nu = T_b * 2 * nu ** 2 * kB / cc ** 2 + I_nu = T_b * 2 * nu**2 * kB / cc**2 # convert to Jy/ster Jy_ster = I_nu * 1e23 # convert to Jy/arcsec^2 - Jy_arcsec2 = Jy_ster * arcsec ** 2 + Jy_arcsec2 = Jy_ster * arcsec**2 return Jy_arcsec2 @@ -221,7 +223,7 @@ def get_maximum_cell_size(uu_vv_point): Args: uu_vv_point (float): a single spatial frequency. Units of [:math:`\mathrm{k}\lambda`]. 
- Returns: + Returns: cell_size (in arcsec) """ @@ -433,7 +435,7 @@ def fourier_gaussian_lambda_radians(u, v, a, delta_l, delta_m, sigma_l, sigma_m, * 2 * np.pi * np.exp( - -2 * np.pi ** 2 * (sigma_l ** 2 * up ** 2 + sigma_m ** 2 * vp ** 2) + -2 * np.pi**2 * (sigma_l**2 * up**2 + sigma_m**2 * vp**2) - 2.0j * np.pi * (delta_l * u + delta_m * v) ) ) @@ -461,10 +463,41 @@ def fourier_gaussian_klambda_arcsec(u, v, a, delta_x, delta_y, sigma_x, sigma_y, return fourier_gaussian_lambda_radians( 1e3 * u, 1e3 * v, - a / arcsec ** 2, + a / arcsec**2, delta_x * arcsec, delta_y * arcsec, sigma_x * arcsec, sigma_y * arcsec, Omega, ) + + +def make_fake_dataset(imageCube, uu, vv, weight): + r""" + Create a fake dataset from a supplied :class:`mpol.images.ImageCube`. See :ref:`mock-dataset-label` for more details on how to prepare a generic image for use in an :class:`~mpol.images.ImageCube`. + + The provided visibilities can be 1d for a single continuum channel, or 2d for image cube. If 1d, visibilities will be converted to 2d arrays of shape ``(1, nvis)``. + + Args: + imageCube (:class:`~mpol.images.ImageCube`): the image layer to put into a fake dataset + uu (numpy array): (nchan, nvis) array of u spatial frequency coordinates, not including Hermitian pairs. Units of [:math:`\mathrm{k}\lambda`] + vv (numpy array): (nchan, nvis) array of v spatial frequency coordinates, not including Hermitian pairs. Units of [:math:`\mathrm{k}\lambda`] + weight (2d numpy array): (nchan, nvis) length array of thermal weights :math:`w_i = 1/\sigma_i^2`. Units of [:math:`1/\mathrm{Jy}^2`] + + Returns: + (2-tuple): a two tuple of the fake data. The first array is the mock dataset including noise, the second array is the mock dataset without noise. 
+ """ + + # make into a multi-channel dataset, even if only a single-channel provided + if uu.ndim == 1: + uu = np.atleast_2d(uu) + vv = np.atleast_2d(vv) + weight = np.atleast_2d(weight) + + # instantiate a NuFFT object based on the ImageCube + nufft = fourier.NuFFT(coords=imageCube.coords, nchan=imageCube.nchan, uu=uu, vv=vv) + + # carry it forward to the visibilities + vis = nufft.forward(imageCube.forward()) + + return weight From 30efa3c873ef1485ae0d84e86ad66a874212c57e Mon Sep 17 00:00:00 2001 From: Ian Czekala Date: Sun, 25 Dec 2022 23:37:27 -0500 Subject: [PATCH 5/8] untested mock prototype. --- src/mpol/utils.py | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/src/mpol/utils.py b/src/mpol/utils.py index 63219f48..0a745862 100644 --- a/src/mpol/utils.py +++ b/src/mpol/utils.py @@ -498,6 +498,15 @@ def make_fake_dataset(imageCube, uu, vv, weight): nufft = fourier.NuFFT(coords=imageCube.coords, nchan=imageCube.nchan, uu=uu, vv=vv) # carry it forward to the visibilities - vis = nufft.forward(imageCube.forward()) + vis_noiseless = nufft.forward(imageCube.forward()) - return weight + # generate complex noise + sigma = 1 / np.sqrt(weight) + noise = np.random.normal( + loc=0, scale=sigma, size=uu.shape + ) + 1.0j * np.random.normal(loc=0, scale=sigma, size=uu.shape) + + # add to data + vis_noise = vis_noiseless + noise + + return vis_noise, vis_noiseless From dd33f5efb503b963efaec75f37552885ea4e32f2 Mon Sep 17 00:00:00 2001 From: Ian Czekala Date: Mon, 26 Dec 2022 09:41:27 -0500 Subject: [PATCH 6/8] moved fake data to Fourier moduel. 
--- src/mpol/fourier.py | 69 +++++++++++++++++++++++++++++++++++++-------- src/mpol/utils.py | 41 --------------------------- 2 files changed, 57 insertions(+), 53 deletions(-) diff --git a/src/mpol/fourier.py b/src/mpol/fourier.py index c735f636..a0984913 100644 --- a/src/mpol/fourier.py +++ b/src/mpol/fourier.py @@ -94,6 +94,7 @@ def ground_phase(self): """ return torch.angle(self.ground_vis) + def safe_baseline_constant_meters(uu, vv, freqs, coords, uv_cell_frac=0.05): r""" This routine determines whether the baselines can safely be assumed to be constant with channel when they converted from meters to units of kilolambda. @@ -118,18 +119,18 @@ def safe_baseline_constant_meters(uu, vv, freqs, coords, uv_cell_frac=0.05): Returns: boolean: `True` if it is safe to assume that the baselines are constant with channel (at a tolerance of ``uv_cell_frac``.) Otherwise returns `False`. """ - + # broadcast and convert baselines to kilolambda across channel uu, vv = utils.broadcast_and_convert_baselines(uu, vv, freqs) # should be (nchan, nvis) arrays # convert uv_cell_frac to a kilolambda threshold - delta_uv = uv_cell_frac * coords.du # [klambda] + delta_uv = uv_cell_frac * coords.du # [klambda] # find maximum change in baseline across channel # concatenate arrays to save steps - uv = np.array([uu, vv]) # (2, nchan, nvis) arrays - + uv = np.array([uu, vv]) # (2, nchan, nvis) arrays + # find max - min along channel axis uv_min = uv.min(axis=1) uv_max = uv.max(axis=1) @@ -140,14 +141,14 @@ def safe_baseline_constant_meters(uu, vv, freqs, coords, uv_cell_frac=0.05): # compare to uv_cell_frac return max_diff < delta_uv - + def safe_baseline_constant_kilolambda(uu, vv, coords, uv_cell_frac=0.05): r""" This routine determines whether the baselines can safely be assumed to be constant with channel, when the are represented in units of kilolambda. 
- Compared to :class:`mpol.fourier.safe_baseline_constant_meters`, this function works with multidimensional arrays of ``uu`` and ``vv`` that are shape (nchan, nvis) and have units of kilolambda. - + Compared to :class:`mpol.fourier.safe_baseline_constant_meters`, this function works with multidimensional arrays of ``uu`` and ``vv`` that are shape (nchan, nvis) and have units of kilolambda. + If this routine returns True, then it should be safe for the user to either average the baselines across channel or simply choose a single, representative channel. This would enable parallelization in the {class}`mpol.fourier.NuFFT` via the coil dimension. Args: @@ -162,12 +163,12 @@ def safe_baseline_constant_kilolambda(uu, vv, coords, uv_cell_frac=0.05): """ # convert uv_cell_frac to a kilolambda threshold - delta_uv = uv_cell_frac * coords.du # [klambda] + delta_uv = uv_cell_frac * coords.du # [klambda] # find maximum change in baseline across channel # concatenate arrays to save steps - uv = np.array([uu, vv]) # (2, nchan, nvis) arrays - + uv = np.array([uu, vv]) # (2, nchan, nvis) arrays + # find max - min along channel axis uv_min = uv.min(axis=1) uv_max = uv.max(axis=1) @@ -178,7 +179,7 @@ def safe_baseline_constant_kilolambda(uu, vv, coords, uv_cell_frac=0.05): # compare to uv_cell_frac return max_diff < delta_uv - + class NuFFT(nn.Module): r""" @@ -247,7 +248,11 @@ def __init__( ) else: import warnings - warnings.warn("Provided uu and vv arrays are multi-dimensional, suggesting an intent to parallelize using the 'batch' dimension. This feature is not yet available in TorchKbNuFFT v1.4.0 with sparse matrix interpolation (sparse_matrices=True), therefore we are proceeding with table interpolation (sparse_matrices=False).", category=RuntimeWarning) + + warnings.warn( + "Provided uu and vv arrays are multi-dimensional, suggesting an intent to parallelize using the 'batch' dimension. 
This feature is not yet available in TorchKbNuFFT v1.4.0 with sparse matrix interpolation (sparse_matrices=True), therefore we are proceeding with table interpolation (sparse_matrices=False).", + category=RuntimeWarning, + ) self.interp_mats = None self.sparse_matrices = False @@ -390,3 +395,43 @@ def forward(self, cube): output = torch.squeeze(output, dim=1) return output + + +def make_fake_dataset(imageCube, uu, vv, weight): + r""" + Create a fake dataset from a supplied :class:`mpol.images.ImageCube`. See :ref:`mock-dataset-label` for more details on how to prepare a generic image for use in an :class:`~mpol.images.ImageCube`. + + The provided visibilities can be 1d for a single continuum channel, or 2d for image cube. If 1d, visibilities will be converted to 2d arrays of shape ``(1, nvis)``. + + Args: + imageCube (:class:`~mpol.images.ImageCube`): the image layer to put into a fake dataset + uu (numpy array): (nchan, nvis) array of u spatial frequency coordinates, not including Hermitian pairs. Units of [:math:`\mathrm{k}\lambda`] + vv (numpy array): (nchan, nvis) array of v spatial frequency coordinates, not including Hermitian pairs. Units of [:math:`\mathrm{k}\lambda`] + weight (2d numpy array): (nchan, nvis) length array of thermal weights :math:`w_i = 1/\sigma_i^2`. Units of [:math:`1/\mathrm{Jy}^2`] + + Returns: + (2-tuple): a two tuple of the fake data. The first array is the mock dataset including noise, the second array is the mock dataset without noise. 
+ """ + + # make into a multi-channel dataset, even if only a single-channel provided + if uu.ndim == 1: + uu = np.atleast_2d(uu) + vv = np.atleast_2d(vv) + weight = np.atleast_2d(weight) + + # instantiate a NuFFT object based on the ImageCube + nufft = fourier.NuFFT(coords=imageCube.coords, nchan=imageCube.nchan, uu=uu, vv=vv) + + # carry it forward to the visibilities + vis_noiseless = nufft.forward(imageCube.forward()) + + # generate complex noise + sigma = 1 / np.sqrt(weight) + noise = np.random.normal( + loc=0, scale=sigma, size=uu.shape + ) + 1.0j * np.random.normal(loc=0, scale=sigma, size=uu.shape) + + # add to data + vis_noise = vis_noiseless + noise + + return vis_noise, vis_noiseless diff --git a/src/mpol/utils.py b/src/mpol/utils.py index 0a745862..a3e0e7a1 100644 --- a/src/mpol/utils.py +++ b/src/mpol/utils.py @@ -1,7 +1,6 @@ import numpy as np import torch -from . import fourier from .constants import arcsec, c_ms, cc, deg, kB @@ -470,43 +469,3 @@ def fourier_gaussian_klambda_arcsec(u, v, a, delta_x, delta_y, sigma_x, sigma_y, sigma_y * arcsec, Omega, ) - - -def make_fake_dataset(imageCube, uu, vv, weight): - r""" - Create a fake dataset from a supplied :class:`mpol.images.ImageCube`. See :ref:`mock-dataset-label` for more details on how to prepare a generic image for use in an :class:`~mpol.images.ImageCube`. - - The provided visibilities can be 1d for a single continuum channel, or 2d for image cube. If 1d, visibilities will be converted to 2d arrays of shape ``(1, nvis)``. - - Args: - imageCube (:class:`~mpol.images.ImageCube`): the image layer to put into a fake dataset - uu (numpy array): (nchan, nvis) array of u spatial frequency coordinates, not including Hermitian pairs. Units of [:math:`\mathrm{k}\lambda`] - vv (numpy array): (nchan, nvis) array of v spatial frequency coordinates, not including Hermitian pairs. 
Units of [:math:`\mathrm{k}\lambda`] - weight (2d numpy array): (nchan, nvis) length array of thermal weights :math:`w_i = 1/\sigma_i^2`. Units of [:math:`1/\mathrm{Jy}^2`] - - Returns: - (2-tuple): a two tuple of the fake data. The first array is the mock dataset including noise, the second array is the mock dataset without noise. - """ - - # make into a multi-channel dataset, even if only a single-channel provided - if uu.ndim == 1: - uu = np.atleast_2d(uu) - vv = np.atleast_2d(vv) - weight = np.atleast_2d(weight) - - # instantiate a NuFFT object based on the ImageCube - nufft = fourier.NuFFT(coords=imageCube.coords, nchan=imageCube.nchan, uu=uu, vv=vv) - - # carry it forward to the visibilities - vis_noiseless = nufft.forward(imageCube.forward()) - - # generate complex noise - sigma = 1 / np.sqrt(weight) - noise = np.random.normal( - loc=0, scale=sigma, size=uu.shape - ) + 1.0j * np.random.normal(loc=0, scale=sigma, size=uu.shape) - - # add to data - vis_noise = vis_noiseless + noise - - return vis_noise, vis_noiseless From 94731923a85efadae12c8d5e3b33c2a9ae7bf156 Mon Sep 17 00:00:00 2001 From: Ian Czekala Date: Mon, 26 Dec 2022 17:02:06 -0500 Subject: [PATCH 7/8] completed mock data tutorial. 
--- .gitignore | 7 + docs/ci-tutorials/fakedata.md | 172 ++++++++++++++++++---- docs/ci-tutorials/initializedirtyimage.md | 2 +- src/mpol/fourier.py | 21 +-- 4 files changed, 159 insertions(+), 43 deletions(-) diff --git a/.gitignore b/.gitignore index 8995a302..5ee51227 100644 --- a/.gitignore +++ b/.gitignore @@ -120,6 +120,10 @@ venv/ # notebooks produced from jupytext docs/ci-tutorials/*.ipynb +docs/ci-tutorials/alma.jpg +docs/ci-tutorials/mock_data.npz + + # tensorboard outputs docs/ci-tutorials/runs docs/large-tutorials/runs @@ -134,3 +138,6 @@ dirty_image_model.pt # setup file project_setup.sh + +plotsdir +runs diff --git a/docs/ci-tutorials/fakedata.md b/docs/ci-tutorials/fakedata.md index 57871553..f72dd044 100644 --- a/docs/ci-tutorials/fakedata.md +++ b/docs/ci-tutorials/fakedata.md @@ -138,7 +138,11 @@ im_pad = ImageOps.pad(im_res, (max_dim, max_dim)) im_pad ``` -Great, we now have a square, apodized image. The only thing is that a 1280 x 1280 image is still a bit too many pixels for most ALMA observations. I.e., the spatial resolution or "beam size" of most ALMA observations is such that for any single-pointing observation, we wouldn't need this many pixels to represent the full information content of the image. Therefore, let's resize the image to be a bit smaller. +Great, we now have a square, apodized image. +```{margin} Simulations +We should note that all of these pre-processing steps were only necessary because we pulled a non-square JPEG image from the internet. If we were starting from an image produced by a radiative transfer simulation (for example, a solitary protoplanetary disk in the center of a field), we could skip most of these previous steps. +``` +The next thing we should fix is that a 1280 x 1280 image is still a bit too many pixels for most ALMA observations. 
I.e., the spatial resolution or "beam size" of most ALMA observations is such that for any single-pointing observation, we wouldn't need this many pixels to represent the full information content of the image. Therefore, let's resize the image to be a bit smaller. ```{code-cell} ipython3 npix = 500 @@ -149,7 +153,7 @@ im_small = im_pad.resize((npix,npix)) im_small ``` -## Exporting to a PyTorch tensor +## Exporting to a Numpy array and setting flux scale Now that we have done the necessary image preparation, we're ready to leave the Pillow library and work with numpy arrays and pytorch tensors. First we convert from a Pillow object to a numpy array @@ -194,49 +198,76 @@ In this example, we're only working with a single-channel mock sky brightness di d = np.expand_dims(c, axis=0) ``` -Now, we'll convert the numpy array to a PyTorch tensor +Now let's choose how big we want our mock sky brightness to be on the sky. Adjusting the `cell_size` changes the maximum spatial frequency that can be represented in the image. I.e., a smaller pixel `cell_size` will enable an image to carry higher spatial frequencies. Changing the number of pixels in the image via `npix` will change the number of $u,v$ cells between 0 and the max spatial frequency. We effectively chose the `npix` when we performed the resize operation, so all that's left is to choose the `cell_size`. ```{code-cell} ipython3 -import torch -img_tensor = torch.tensor(d.copy()) +cell_size = 0.03 # arcsec ``` -And finally, we'll shift the tensor from a "Sky Cube" to a "Packed Cube" as the {class}`~mpol.images.ImageCube` expects +The final task is to scale the amplitude of the image to the desired level. The {class}`~mpol.images.ImageCube` object will expect the input tensor to be in units of Jy/arcsec^2. + +Let's assume that we would like the total flux of our mock image to be 30 Jy, which is a very bright source for ALMA band 6. 
Then again, the noise levels in the mock baseline distribution we plan to use are relatively high, the baseline distribution lacks short spacings, and we want to make sure our source shines through. + +So, if we have assigned each pixel to be 0.03 arcseconds on each side, then each pixel has an area of ```{code-cell} ipython3 -from mpol import utils -img_tensor_packed = utils.sky_cube_to_packed_cube(img_tensor) +pixel_area = cell_size**2 # arcsec^2 +print(pixel_area, "arcsec^2") ``` -## Initializing {class}`~mpol.images.ImageCube` +What is the current flux of the image? -Now let's settle on how big +```{code-cell} ipython3 +# if the raw image is supposed to be in Jy/arcsec^2, then to calculate +# total flux, we would convert to Jy/pixel by multiplying by the area per pixel +# and then summing all values +old_flux = np.sum(d * pixel_area) +print(old_flux, "Jy") +``` + +So, if we want the image to have a total flux of 30 Jy, we need to multiply by a factor of -Here is where it would be helpful to have a note about how changing pixel size and image dimensions affects the uv coverage. There needs to be some match up between the image and the uv size. +```{code-cell} ipython3 +flux_scaled = 30/old_flux * d +``` -Adjusting the `cell_size` changes the maximum spatial frequency that can be represented in the image. I.e., a smaller pixel cell size will enable an image to carry higher spatial frequencies. +```{code-cell} ipython3 +print("Total flux of image is now {:.1f} Jy".format(np.sum(flux_scaled * pixel_area))) +``` -Changing the number of pixels via `npix` will change the number of $u,v$ cells between 0 and the max spatial frequency. +## Initializing {class}`~mpol.images.ImageCube` -We already defined `npix` when we performed the resize operation. 
+Now, we'll convert the numpy array to a PyTorch tensor ```{code-cell} ipython3 -cell_size = 0.03 # arcsec +import torch +img_tensor = torch.tensor(flux_scaled.copy()) +``` + +And finally, we'll shift the tensor from a "Sky Cube" to a "Packed Cube" as the {class}`~mpol.images.ImageCube` expects +```{code-cell} ipython3 +from mpol import utils +img_tensor_packed = utils.sky_cube_to_packed_cube(img_tensor) +``` + +```{code-cell} ipython3 from mpol.images import ImageCube image = ImageCube(cell_size=cell_size, npix=npix, nchan=1, cube=img_tensor_packed) ``` -```{code-cell} ipython3 +If you want to double-check that the image was correctly inserted, you can do +``` # double check it went in correctly -# plt.imshow(np.squeeze(utils.packed_cube_to_sky_cube(image.forward()).detach().numpy()), origin="lower") +plt.imshow(np.squeeze(utils.packed_cube_to_sky_cube(image.forward()).detach().numpy()), origin="lower") ``` +to see that it's upright and not flipped. -## Getting baseline distributions +## Obtaining $u,v$ baseline and weight distributions -This is most useful if you already have a real dataset, with real baseline distributions and noise weights. Alternatively, you could acquire some baseline distribution and noise distribution, possibly using CASA's simobserve. +One of the key use cases for producing a mock dataset from a known sky brightness is to test the ability of an RML algorithm to recover the "true" image. $u,v$ baseline distributions from real interferometric arrays like ALMA, VLA, and others are highly structured sampling distributions that are difficult to accurately replicate using distributions available to random number generators. -In this example, we'll just use the baseline distribution from the mock dataset we've used in many of the tutorials. You can see a plot of it in the [Gridding and Diagnostic Images](gridder.md) tutorial. We'll only need the $u,v$ and weight arrays. 
+Therefore, we always recommend generating fake data using $u,v$ distributions from real datasets, or those produced by realistic simulators like CASA's [simobserve](https://casadocs.readthedocs.io/en/latest/api/tt/casatasks.simulation.simobserve.html) task. In this example, we'll just use the baseline distribution from the mock dataset we've used in many of the tutorials. You can see a plot of it in the [Gridding and Diagnostic Images](gridder.md) tutorial. We'll only need the $u,v$ and weight arrays. ```{code-cell} ipython3 from astropy.utils.data import download_file @@ -249,45 +280,122 @@ fname = download_file( pkgname="mpol", ) +# select the components for a single channel +chan = 4 d = np.load(fname) -uu = d["uu"] -vv = d["vv"] -weight = d["weight"] +uu = d["uu"][chan] +vv = d["vv"][chan] +weight = d["weight"][chan] ``` +MPoL has a helper routine to calculate the maximum `cell_size` that can still Nyquist sample the highest spatial frequency in the baseline distribution. + ```{code-cell} ipython3 max_uv = np.max(np.array([uu,vv])) max_cell_size = utils.get_maximum_cell_size(max_uv) print("The maximum cell_size that will still Nyquist sample the spatial frequency represented by the maximum u,v value is {:.2f} arcseconds".format(max_cell_size)) +assert cell_size < max_cell_size ``` +Thankfully, we see that we already chose a sufficiently small `cell_size`. + +## Making the mock dataset + +With the {class}`~mpol.images.ImageCube`, $u,v$ and weight distributions now in hand, generating the mock visibilities is relatively straightforward using the {func}`mpol.fourier.make_fake_data` routine. This routine uses the {class}`~mpol.fourier.NuFFT` to produce loose visibilities at the $u,v$ locations and then adds random Gaussian noise to the visibilities, drawn from a probability distribution set by the value of the weights. 
+ ```{code-cell} ipython3 +from mpol import fourier # will have the same shape as the uu, vv, and weight inputs -data_noise, data_noiseless = make_fake_data(image, u, v, weight) +data_noise, data_noiseless = fourier.make_fake_data(image, uu, vv, weight) + +print(data_noise.shape) +print(data_noiseless.shape) +print(data_noise) ``` -How many pixels does it have? +Now you could save this to disk. Since this is a continuum dataset, we'll remove the channel dimension from the mock visibilities -The routine just takes an Image cube, u,v, weights and produces visibilities with noise. +```{code-cell} ipython3 +data = np.squeeze(data_noise) +# data = np.squeeze(data_noiseless) +np.savez("mock_data.npz", uu=uu, vv=vv, weight=weight, data=data) +``` + +You could now use this dataset just like any other when doing RML inference, with the added benefit of a "ground truth" reference image to compare against. ++++ +## Verifying the mock dataset -Now, let's put this into a pytorch tensor, flip the directions, and insert it into an ImageCube. +To make sure the whole process worked OK, we'll load the visibilities and then make a dirty image. We'll set the coordinates of the gridder and dirty image to be exactly those of our input image, so that we can make a pixel-to-pixel comparison. Note that this isn't strictly necessary, though. We could make a range of images with different `cell_size`s and `npix`s. ```{code-cell} ipython3 +from mpol import coordinates, gridding + +# we'll set the grid coordinates to match those of the input image +coords = coordinates.GridCoords(cell_size=cell_size, npix=npix) + +gridder = gridding.Gridder( + coords=coords, + uu=uu, + vv=vv, + weight=weight, + data_re=np.squeeze(np.real(data)), + data_im=np.squeeze(np.imag(data)), +) +``` +```{code-cell} ipython3 +C = 1 / np.sum(weight) +noise_estimate = C * np.sqrt(np.sum(weight)) +print(noise_estimate, "Jy / dirty beam") ``` -We'll use the same u,v distribution and noise distribution from the mock dataset. 
The max baseline +```{code-cell} ipython3 +img, beam = gridder.get_dirty_image(weighting="briggs", robust=1.0, unit="Jy/arcsec^2") +``` +```{code-cell} ipython3 +chan = 0 +kw = {"origin": "lower", "interpolation": "none", "extent": gridder.coords.img_ext} +fig, ax = plt.subplots(ncols=2, figsize=(6.0, 4)) +ax[0].imshow(beam[chan], **kw) +ax[0].set_title("beam") +ax[1].imshow(img[chan], **kw) +ax[1].set_title("image") +for a in ax: + a.set_xlabel(r"$\Delta \alpha \cos \delta$ [${}^{\prime\prime}$]") + a.set_ylabel(r"$\Delta \delta$ [${}^{\prime\prime}$]") +fig.subplots_adjust(left=0.14, right=0.90, wspace=0.35, bottom=0.15, top=0.9) +``` -## Making the mock dataset +We can even subtract this on a pixel-by-pixel basis and compare to the original image. +```{code-cell} ipython3 +chan = 0 +kw = {"origin": "lower", "interpolation": "none", "extent": gridder.coords.img_ext} +fig, ax = plt.subplots(ncols=3, figsize=(6.0, 3)) -Now you could save this to disk, for example +ax[0].imshow(flux_scaled[chan], **kw) +ax[0].set_title("original") +ax[1].imshow(img[chan], **kw) +ax[1].set_title("dirty image") +ax[2].imshow(flux_scaled[chan] - img[chan], **kw) +ax[2].set_title("difference") -## Verifying the mock dataset +ax[0].set_xlabel(r"$\Delta \alpha \cos \delta$ [${}^{\prime\prime}$]") +ax[0].set_ylabel(r"$\Delta \delta$ [${}^{\prime\prime}$]") + +for a in ax[1:]: + a.xaxis.set_ticklabels([]) + a.yaxis.set_ticklabels([]) + +fig.subplots_adjust(left=0.14, right=0.90, wspace=0.2, bottom=0.15, top=0.9) +``` -To make sure the whole process worked OK, we'll load the visibilities and then make a dirty image. +The subtraction reveals some interesting artefacts. +1. the dirty image and difference image have substantial emission in regions away from the true locations of flux. This is because the dirty beam sidelobes spread flux from the center of the image to other regions. CLEAN or RML would remove most of these features. +2. 
the difference image has fine-featured residuals in the center, corresponding to the edges of the antenna dishes and support structures. This is because the dirty beam has some Briggs weighting applied to it, and is closer to natural weighting than uniform weighting. This means that the spatial resolution of the dirty image is not as high as the original image, and thus high spatial frequency features, like the edges of the antennae, are not reproduced in the dirty image. Pushing the beam closer to uniform weighting would capture some of these finer structured features, but at the expense of higher thermal noise in the image. +3. the faint "halo" surrounding the antennas in the original image (the smooth blue sky and brown ground, in the actual JPEG) has been spatially filtered out of the dirty image. This is because this mock baseline distribution was generated for a more extended ALMA configuration without a sufficient number of short baselines. diff --git a/docs/ci-tutorials/initializedirtyimage.md b/docs/ci-tutorials/initializedirtyimage.md index 9b13af71..276da74b 100644 --- a/docs/ci-tutorials/initializedirtyimage.md +++ b/docs/ci-tutorials/initializedirtyimage.md @@ -81,7 +81,7 @@ Now let's calculate the dirty image. Here we're using Briggs weighting with a ro ```{code-cell} # Calculate the dirty image -img, beam = gridder.get_dirty_image(weighting="briggs", robust=1.0, unit="Jy/beam") +img, beam = gridder.get_dirty_image(weighting="briggs", robust=1.0, unit="Jy/arcsec^2") ``` Let's visualize this dirty image. Here we're using an aggressive colormap to highlight the many negative flux pixels contained in this image. 
diff --git a/src/mpol/fourier.py b/src/mpol/fourier.py index a0984913..69ac15d2 100644 --- a/src/mpol/fourier.py +++ b/src/mpol/fourier.py @@ -397,7 +397,7 @@ def forward(self, cube): return output -def make_fake_dataset(imageCube, uu, vv, weight): +def make_fake_data(imageCube, uu, vv, weight): r""" Create a fake dataset from a supplied :class:`mpol.images.ImageCube`. See :ref:`mock-dataset-label` for more details on how to prepare a generic image for use in an :class:`~mpol.images.ImageCube`. @@ -405,25 +405,26 @@ def make_fake_dataset(imageCube, uu, vv, weight): Args: imageCube (:class:`~mpol.images.ImageCube`): the image layer to put into a fake dataset - uu (numpy array): (nchan, nvis) array of u spatial frequency coordinates, not including Hermitian pairs. Units of [:math:`\mathrm{k}\lambda`] - vv (numpy array): (nchan, nvis) array of v spatial frequency coordinates, not including Hermitian pairs. Units of [:math:`\mathrm{k}\lambda`] - weight (2d numpy array): (nchan, nvis) length array of thermal weights :math:`w_i = 1/\sigma_i^2`. Units of [:math:`1/\mathrm{Jy}^2`] + uu (numpy array): array of u spatial frequency coordinates, not including Hermitian pairs. Units of [:math:`\mathrm{k}\lambda`] + vv (numpy array): array of v spatial frequency coordinates, not including Hermitian pairs. Units of [:math:`\mathrm{k}\lambda`] + weight (2d numpy array): length array of thermal weights :math:`w_i = 1/\sigma_i^2`. Units of [:math:`1/\mathrm{Jy}^2`] Returns: - (2-tuple): a two tuple of the fake data. The first array is the mock dataset including noise, the second array is the mock dataset without noise. + (2-tuple): a two tuple of the fake data. The first array is the mock dataset including noise, the second array is the mock dataset without added noise. 
""" + # instantiate a NuFFT object based on the ImageCube + # OK if uu shape (nvis,) + nufft = NuFFT(coords=imageCube.coords, nchan=imageCube.nchan, uu=uu, vv=vv) + # make into a multi-channel dataset, even if only a single-channel provided if uu.ndim == 1: uu = np.atleast_2d(uu) vv = np.atleast_2d(vv) weight = np.atleast_2d(weight) - # instantiate a NuFFT object based on the ImageCube - nufft = fourier.NuFFT(coords=imageCube.coords, nchan=imageCube.nchan, uu=uu, vv=vv) - - # carry it forward to the visibilities - vis_noiseless = nufft.forward(imageCube.forward()) + # carry it forward to the visibilities, which will be (nchan, nvis) + vis_noiseless = nufft.forward(imageCube.forward()).detach().numpy() # generate complex noise sigma = 1 / np.sqrt(weight) From 32c24a03238e1380a7c24c9deda4b24ad6737586 Mon Sep 17 00:00:00 2001 From: Ian Czekala Date: Mon, 26 Dec 2022 17:25:16 -0500 Subject: [PATCH 8/8] bumped version and added changelog. --- docs/Makefile | 7 ++----- docs/changelog.md | 7 ++++++- src/mpol/__init__.py | 2 +- 3 files changed, 9 insertions(+), 7 deletions(-) diff --git a/docs/Makefile b/docs/Makefile index f7e346a6..82a027f7 100644 --- a/docs/Makefile +++ b/docs/Makefile @@ -16,15 +16,12 @@ help: CI-NOTEBOOKS := ci-tutorials/PyTorch.ipynb ci-tutorials/gridder.ipynb ci-tutorials/optimization.ipynb ci-tutorials/crossvalidation.ipynb ci-tutorials/initializedirtyimage.ipynb clean: rm -rf _build - # rm -rf ${CI-NOTEBOOKS} rm -rf ci-tutorials/.ipynb_checkpoints rm -rf ci-tutorials/runs - rm -rf ${CHARTS} + rm -rf ci-tutorials/alma.jpg + rm -rf ci-tutorials/mock_data.npz rm -rf _static/baselines/build/baselines.csv -# ci-tutorials/%.ipynb: ci-tutorials/%.py ${CHARTS} - # jupytext --to ipynb --execute $< - # baseline table _static/baselines/build/baselines.csv: _static/baselines/src/print_conversions.py mkdir -p _static/baselines/build diff --git a/docs/changelog.md b/docs/changelog.md index 0b072bd1..95b9f32c 100644 --- a/docs/changelog.md +++ 
b/docs/changelog.md @@ -2,11 +2,16 @@ # Changelog +## v0.1.3 + +- Added the {func}`mpol.fourier.make_fake_data` routine and the [Mock Data tutorial](ci-tutorials/fakedata.md). +- Fixed a bug in the [Dirty Image Initialization](ci-tutorials/initializedirtyimage.md) tutorial so that the dirty image is delivered in units of Jy/arcsec^2. + ## v0.1.2 - Switched documentation backend to [MyST-NB](https://myst-nb.readthedocs.io/en/latest/index.html). - Switched documentation theme to [Sphinx Book Theme](https://sphinx-book-theme.readthedocs.io/en/latest/index.html). -- Added {class}`~mpol.fourier.NuFFT` layer, allowing the direct forward modeling of un-gridded :math:`u,v` data. Closes GitHub issue [#17](https://github.com/MPoL-dev/MPoL/issues/17). +- Added {class}`~mpol.fourier.NuFFT` layer, allowing the direct forward modeling of un-gridded $u,v$ data. Closes GitHub issue [#17](https://github.com/MPoL-dev/MPoL/issues/17). ## v0.1.1 diff --git a/src/mpol/__init__.py b/src/mpol/__init__.py index c6991c7d..3cb7d95e 100644 --- a/src/mpol/__init__.py +++ b/src/mpol/__init__.py @@ -1 +1 @@ -__version__ = "0.1.13dev" +__version__ = "0.1.13"
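A note on the `make_fake_data` hunk in `src/mpol/fourier.py`: the diff context stops at `sigma = 1 / np.sqrt(weight)`, so the noise step itself is not shown. The following is a minimal NumPy-only sketch of how thermal noise could be drawn from the weights, assuming independent Gaussian noise with standard deviation $\sigma_i = 1/\sqrt{w_i}$ on the real and imaginary parts of each visibility. The helper name `add_thermal_noise` and its `rng` argument are hypothetical and not part of MPoL; the actual routine may differ.

```python
import numpy as np


def add_thermal_noise(vis_noiseless, weight, rng=None):
    """Hypothetical helper (not part of MPoL): add complex Gaussian noise.

    Assumes weight = 1 / sigma**2, with sigma applying independently to the
    real and imaginary parts of each visibility.
    """
    rng = np.random.default_rng() if rng is None else rng

    # promote single-channel inputs to (nchan, nvis), as the patch does for uu, vv, weight
    vis_noiseless = np.atleast_2d(np.asarray(vis_noiseless, dtype=np.complex128))
    weight = np.atleast_2d(np.asarray(weight, dtype=np.float64))

    # per-visibility standard deviation, in the same flux units as the visibilities
    sigma = 1.0 / np.sqrt(weight)

    # independent draws for the real and imaginary components
    noise = rng.normal(scale=sigma, size=vis_noiseless.shape) + 1.0j * rng.normal(
        scale=sigma, size=vis_noiseless.shape
    )

    # same return order as the docstring: noisy data first, noiseless second
    return vis_noiseless + noise, vis_noiseless
```

Passing a seeded generator, for example `add_thermal_noise(vis, weight, rng=np.random.default_rng(42))`, makes the mock dataset reproducible between documentation builds.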