Implement NuFFT capability
The ability to interpolate to "loose" visibility points is implemented through the non-uniform FFT (NuFFT).

Closes #17
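For context on what the new layer computes: a NuFFT evaluates model visibilities at arbitrary ("loose") (u, v) points rather than on a regular grid. The brute-force equivalent is a direct Fourier sum, sketched below in numpy. This is illustrative only, not the MPoL API; `direct_ft` is a hypothetical helper, and a NuFFT approximates the same sums in roughly O(N log N) time.

```python
import numpy as np

def direct_ft(image, cell_size, us, vs):
    """Brute-force Fourier sum: visibilities at arbitrary (u, v) points.

    image: 2D array (npix x npix) of sky brightness
    cell_size: pixel size in radians
    us, vs: 1D arrays of spatial frequencies (cycles per radian)
    A NuFFT computes these same sums approximately, but much faster.
    """
    npix = image.shape[0]
    # sky coordinates of pixel centers, with (l, m) = (0, 0) at the image center
    x = (np.arange(npix) - npix // 2) * cell_size
    l, m = np.meshgrid(x, x, indexing="xy")
    vis = np.array([
        np.sum(image * np.exp(-2j * np.pi * (u * l + v * m)))
        for u, v in zip(us, vs)
    ])
    return vis
```

As a sanity check, a point source at the phase center yields a flat visibility function equal to its flux at every (u, v).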
iancze authored Dec 25, 2022
2 parents 9a2c52c + 62d1a7e commit e31d454
Showing 32 changed files with 1,721 additions and 369 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/package.yml
@@ -7,7 +7,7 @@ on:

jobs:
deploy:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-20.04

steps:
- uses: actions/checkout@v3
9 changes: 2 additions & 7 deletions .github/workflows/tests.yml
@@ -74,7 +74,7 @@ jobs:
# (but don't deploy to gh-pages)
docs:
needs: tests # don't bother running if a test failed
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-20.04
steps:
- uses: actions/checkout@v3
- name: Set up Python
@@ -87,11 +87,6 @@ jobs:
- name: Install Pandoc dependency
run: |
sudo apt-get install pandoc
-      - name: Set up node
-        uses: actions/setup-node@v2
-      - name: Install mermaid.js dependency
-        run: |
-          npm install @mermaid-js/mermaid-cli
- name: Cache/Restore the .mpol folder cache
uses: actions/cache@v2
env:
@@ -104,4 +99,4 @@ jobs:
- name: Build the docs
run: |
make -C docs clean
-          make -C docs html MERMAID_PATH="../node_modules/.bin/"
+          make -C docs html
8 changes: 1 addition & 7 deletions docs/Makefile
@@ -14,7 +14,6 @@ help:
.PHONY: help Makefile html clean

CI-NOTEBOOKS := ci-tutorials/PyTorch.ipynb ci-tutorials/gridder.ipynb ci-tutorials/optimization.ipynb ci-tutorials/crossvalidation.ipynb ci-tutorials/initializedirtyimage.ipynb
-CHARTS := _static/mmd/build/SimpleNet.svg _static/mmd/build/ImageCube.svg _static/mmd/build/BaseCube.svg _static/mmd/build/SkyModel.svg
clean:
rm -rf _build
# rm -rf ${CI-NOTEBOOKS}
@@ -36,10 +35,5 @@ _static/fftshift/build/plot.png: _static/fftshift/src/plot.py
mkdir -p _static/fftshift/build
python _static/fftshift/src/plot.py $@

-# mermaid.js files
-_static/mmd/build/%.svg: _static/mmd/src/%.mmd
-	mkdir -p _static/mmd/build
-	${MERMAID_PATH}mmdc -i $^ -o $@

-html: ${CHARTS} _static/baselines/build/baselines.csv _static/fftshift/build/plot.png
+html: _static/baselines/build/baselines.csv _static/fftshift/build/plot.png
python -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
12 changes: 5 additions & 7 deletions docs/api.rst
@@ -22,19 +22,21 @@ Gridding
--------

.. automodule:: mpol.gridding
-    :members:

Datasets and Cross-Validation
-----------------------------

.. automodule:: mpol.datasets
-    :members:

Images
------

.. automodule:: mpol.images
-    :members:

+Fourier
+-------
+
+.. automodule:: mpol.fourier


Precomposed Modules
@@ -43,14 +45,11 @@ Precomposed Modules
For convenience, we provide some "precomposed" `modules <https://pytorch.org/docs/stable/notes/modules.html>`_ which may be useful for simple imaging or modeling applications. In general, though, we encourage you to compose your own set of layers if your application requires it. The source code for a precomposed network can provide a useful starting point. We also recommend checking out the PyTorch documentation on `modules <https://pytorch.org/docs/stable/notes/modules.html>`_.

.. automodule:: mpol.precomposed
-    :members:


Losses
------

.. automodule:: mpol.losses
-    :members:


Connectors
@@ -61,4 +60,3 @@ The objects in the Images and Precomposed modules are focused on bringing some i
Connectors are PyTorch layers that help compute those residual visibilities (in gridded form).

.. automodule:: mpol.connectors
-    :members:
2 changes: 2 additions & 0 deletions docs/changelog.md
@@ -5,6 +5,8 @@
## v0.1.2

- Switched documentation backend to [MyST-NB](https://myst-nb.readthedocs.io/en/latest/index.html).
+- Switched documentation theme to [Sphinx Book Theme](https://sphinx-book-theme.readthedocs.io/en/latest/index.html).
+- Added {class}`~mpol.fourier.NuFFT` layer, allowing the direct forward modeling of un-gridded :math:`u,v` data. Closes GitHub issue [#17](https://github.com/MPoL-dev/MPoL/issues/17).

## v0.1.1

Expand Down
33 changes: 14 additions & 19 deletions docs/ci-tutorials/PyTorch.md
@@ -14,17 +14,16 @@ kernelspec:

```{code-cell}
:tags: [hide-cell]
-%matplotlib inline
%run notebook_setup
```

# Introduction to PyTorch: Tensors and Gradient Descent

-This tutorial provides an introduction to PyTorch tensors, automatic differentiation, and optimization with gradient descent.
+This tutorial provides a gentle introduction to PyTorch tensors, automatic differentiation, and optimization with gradient descent outside of any specifics about radio interferometry or the MPoL package itself.

## Introduction to Tensors

-Tensors are matrices, similar to numpy arrays, with the added benefit that they can be used to calculate gradients (more on that later). MPoL is built on PyTorch, and uses a form of gradient descent optimization to find the "best" image given a dataset and choice of regularizers.
+Tensors are multi-dimensional arrays, similar to numpy arrays, with the added benefit that they can be used to calculate gradients (more on that later). MPoL is built on the [PyTorch](https://pytorch.org/) machine learning library, and uses a form of gradient descent optimization to find the "best" image given some dataset and loss function, which may include regularizers.

We'll start this tutorial by importing the torch and numpy packages. Make sure you have [PyTorch installed](https://pytorch.org/get-started/locally/) before proceeding.

@@ -65,7 +64,7 @@ print(f"Torch tensor multiplication result: {prod_tensor}")

+++

-PyTorch provides a key functionality---the ability to calculate the gradients on tensors. Let's start by creating a tensor with a single value. Here we are setting ``requires_grad = True``, we'll see why this is important in a moment.
+PyTorch allows us to calculate the gradients on tensors, which is a key functionality underlying MPoL. Let's start by creating a tensor with a single value. Here we are setting ``requires_grad = True``; we'll see why this is important in a moment.

```{code-cell}
x = torch.tensor(3.0, requires_grad=True)
@@ -78,7 +77,7 @@ Let's define some variable $y$ in terms of $x$:
y = x ** 2
```

-We see that the value of $y$ is as we expect---nothing too strange here.
+We see that the value of $y$ is as we expect---nothing too strange here.

```{code-cell}
print(f"x: {x}")
@@ -87,26 +86,24 @@ print(f"y: {y}")

But what if we wanted to calculate the gradient of $y$ with respect to $x$? Using calculus, we find that the answer is $\frac{dy}{dx} = 2x$. The derivative evaluated at $x = 3$ is $6$.

-The magic is that can use PyTorch to get the same answer---no analytic derivative needed!
+We can use PyTorch to get the same answer---no analytic derivative needed!

```{code-cell}
y.backward() # populates gradient (.grad) attributes of y with respect to all of its independent variables
x.grad # returns the grad attribute (the gradient) of y with respect to x
```

-PyTorch uses the concept of automatic differentiation to calculate the derivative. Instead of computing the derivative as we would by hand, the program is using a computational graph and mechanistic application of the chain rule. For example, a computational graph with several operations on $x$ resulting in a final output $y$ will use the chain rule to compute the differential associated with each operation and multiply these differentials together to get the derivative of $y$ with respect to $x$.
+PyTorch uses the concept of [automatic differentiation](https://arxiv.org/abs/1502.05767) to calculate the derivative. Instead of computing the derivative as we would by hand, the program uses a computational graph and the mechanistic application of the chain rule. For example, a computational graph with several operations on $x$ resulting in a final output $y$ will use the chain rule to compute the differential associated with each operation and multiply these differentials together to get the derivative of $y$ with respect to $x$.
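To make this chain-rule mechanism concrete, here is a toy reverse-mode autodiff sketch in plain Python. It is illustrative only — this is not how PyTorch is implemented internally, and `Node`, `square`, and `sin` are hypothetical helpers — but it shows how local derivatives recorded per operation multiply together in the backward pass.

```python
import math

class Node:
    """Toy reverse-mode autodiff node: each op records its inputs
    together with the local derivative of the op with respect to them."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # sequence of (parent_node, local_gradient)
        self.grad = 0.0

    def backward(self, upstream=1.0):
        # chain rule: accumulate upstream gradient, then pass
        # upstream * local_gradient down to each parent
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)

def square(x):
    return Node(x.value ** 2, parents=[(x, 2 * x.value)])

def sin(x):
    return Node(math.sin(x.value), parents=[(x, math.cos(x.value))])

x = Node(3.0)
y = square(x)   # y = x^2, so dy/dx = 2x
y.backward()
print(x.grad)   # 6.0, matching the hand-computed derivative at x = 3
```

Composing operations, e.g. `sin(square(x))`, multiplies the local derivatives `cos(x^2)` and `2x` automatically, which is exactly the chain rule described above.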

+++

## Optimizing a Function with Gradient Descent

-If we were on the side of a hill in the dark and we wanted to get down to the bottom of a valley, how would we do it?
+If we were on the side of a hill in the dark and we wanted to get down to the bottom of a valley, how might we do it?

-We wouldn't be able to see all the way to the bottom of the valley, but we could feel which way is down based on the incline of where we are standing. We would take steps in the downward direction and we'd know when to stop when the ground felt flat.
+We can't see all the way to the bottom of the valley, but we can feel which way is down based on the incline of where we are standing. We might take steps in the downward direction and we'd know when to stop when the ground finally felt flat. We would also need to consider how large our steps should be. If we take very small steps, it will take us a longer time than if we take larger steps. However, if we take large leaps, we might completely miss the flat part of the valley, and jump straight across to the other side of the valley.

-Before we leap, though, we need to consider how large our steps should be. If we take very small steps, it will take us a longer time than if we take larger steps. However, if we take large leaps, we might completely miss the flat part of the valley, and jump straight across to the other side of the valley.

-We can look at the gradient descent from a more mathematical lense by looking at the graph $y = x^2$:
+Now let's take a more quantitative look at the gradient descent using the function $y = x^2$:

```{code-cell}
def y(x):
Expand All @@ -115,7 +112,7 @@ def y(x):

We will choose some arbitrary place to start on the left side of the hill and use PyTorch to calculate the tangent.

-Note that Matplotlib requires numpy arrays instead of PyTorch tensors, so in the following code you might see the occasional ``detach().numpy()`` or ``.item()`` calls, which are used to convert PyTorch tensors to numpy arrays and scalar values, respectively. When it comes time to use MPoL for RML imaging, or any large production run, we'll try to keep the calculations native to PyTorch tensors as long as possible, to avoid the overhead of converting types.
+Note that the plotting library Matplotlib requires numpy arrays instead of PyTorch tensors, so in the following code you might see the occasional ``detach().numpy()`` or ``.item()`` calls, which are used to convert PyTorch tensors to numpy arrays and scalar values, respectively, for plotting. When it comes time to use MPoL for RML imaging, or any large production run, we'll try to keep the calculations native to PyTorch tensors as long as possible, to avoid the overhead of converting types.

```{code-cell}
x = torch.linspace(-5, 5, 100)
@@ -143,16 +140,14 @@ plt.ylim(ymin=0, ymax=25)
plt.show()
```

-We see we need to go to the right to go down toward the minimum. For a multivariate function, the gradient will point in the direction of the steepest downward slope. When we take steps, we find the x coordinate of our new location by this equation:
+We see we need to go to the right to go down toward the minimum. For a multivariate function, the gradient will be a vector pointing in the direction of the steepest downward slope. When we take steps, we find the x coordinate of our new location by:

$x_\mathrm{new} = x_\mathrm{current} - \nabla y(x_\mathrm{current}) * (\mathrm{step\,size})$

where:

- $x_\mathrm{current}$ is our current x value

- $\nabla y(x_\mathrm{current})$ is the gradient at our current point

- $(\mathrm{step\,size})$ is a value we choose that scales our steps
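Put together, the update rule above amounts to a short loop. Here is a plain-Python sketch, using the known analytic gradient $2x$ in place of autograd; the starting point and stopping threshold are illustrative:

```python
def grad_y(x):
    # analytic gradient of y = x ** 2, standing in for autograd here
    return 2 * x

x = -4.0          # arbitrary starting point on the left side of the valley
step_size = 0.1
n_steps = 0
while abs(grad_y(x)) >= 0.1:            # stop when "the ground feels flat"
    x = x - step_size * grad_y(x)       # x_new = x_current - grad * step_size
    n_steps += 1

print(x)  # a value close to the minimum at x = 0
```

Each iteration shrinks $x$ by the constant factor $1 - 2 \times 0.1 = 0.8$, so the walk converges geometrically toward the valley floor.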

We will choose ``step_size = 0.1``:
@@ -206,7 +201,7 @@ plt.ylabel(r"$y$")
plt.show()
```

-The gradient at our new point (shown in orange) is still not close to zero, meaning we haven't reached the minimum. We continue this process of checking if the gradient is nearly zero, and taking a step in the direction of steepest descent until we reach the bottom of the valley. We'll say we've reached the bottom of the valley when the absolute value of the gradient is $<0.1$:
+The gradient at our new point (shown in orange) is still not close to zero, meaning we haven't reached the minimum. We'll continue this process of checking if the gradient is nearly zero, and take a step in the direction of steepest descent until we reach the bottom of the valley. We'll say we've reached the bottom of the valley when the absolute value of the gradient is $<0.1$:

```{code-cell}
x = torch.linspace(-5, 5, 100)
@@ -288,7 +283,7 @@ y_large_coords.append(y_large_step_new.item())
plt.scatter(x_large_coords, y_large_coords) # plot points showing steps
plt.scatter(x_large_coords[-1], y_large_coords[-1], c="C1")
plt.text(9, 70, "step 1", va="center")
plt.xlim(xmin=-20, xmax=20)
plt.ylim(ymin=-1, ymax=260)
Expand All @@ -297,7 +292,7 @@ plt.ylabel(r"$y$")
plt.show()
```

-*Note the change in scale.* With only one step, we already see that we stepped *right over* the minimum to somewhere far up the other side of the valley (orange point)! This is not good. If we kept iterating with the same learning rate, we'd find that the optimization process diverges and the step sizes start blowing up. This is why it is important to pick the proper step size by setting the learning rate appropriately. Steps that are too small take a long time while steps that are too large render the optimization process invalid. In this case, a reasonable choice appears to be ``step size = 0.6``, which would have reached pretty close to the minimum after only 3 steps.
+*Note the change in scale!* With only one step, we already see that we stepped *right over* the minimum to somewhere far up the other side of the valley (orange point)! This is not good. If we kept iterating with the same learning rate, we'd find that the optimization process diverges and the step sizes start blowing up. This is why it is important to pick the proper step size by setting the learning rate appropriately. Steps that are too small take a long time while steps that are too large render the optimization process invalid. In this case, a reasonable choice appears to be ``step size = 0.6``, which would have reached pretty close to the minimum after only 3 steps.

To sum up, optimizing a function with gradient descent consists of

3 changes: 2 additions & 1 deletion docs/ci-tutorials/crossvalidation.md
@@ -44,6 +44,7 @@ from mpol import (
datasets,
gridding,
images,
+    fourier,
losses,
precomposed,
)
@@ -175,7 +176,7 @@ k_fold_datasets = [(train, test) for (train, test) in cv]
```

```{code-cell}
-flayer = images.FourierCube(coords=coords)
+flayer = fourier.FourierCube(coords=coords)
flayer.forward(torch.zeros(dset.nchan, coords.npix, coords.npix))
```
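The `k_fold_datasets` comprehension shown in this diff collects one (train, test) pair per fold. The underlying k-fold idea can be sketched standalone (a hypothetical plain-Python illustration, independent of MPoL's `datasets` API; `k_fold_split` is not an MPoL function):

```python
def k_fold_split(items, k):
    """Yield k (train, test) pairs; each fold holds out one disjoint chunk."""
    folds = [items[i::k] for i in range(k)]  # round-robin partition into k chunks
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# every item appears in exactly one test set across the k folds
k_fold_datasets = [(train, test) for (train, test) in k_fold_split(list(range(10)), k=5)]
```

Cross-validation then trains on each `train` partition and scores the model on the held-out `test` partition, averaging the scores over the k folds.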

