You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This issue was originally raised in #126 but is now split into it's own, more focused issue.
... I think major redesigns to GriddedDataset and Gridder are necessary to address the Hermitian pairs issue raised in #100. To solve the Hermitian pairs issue, I think it makes sense to break apart the gridding->imaging and gridding->GriddedDataset capabilities currently contained in Gridder. Although these gridding functionalities are similar, I think trying to combine them all in a single class actually makes things more complicated than they need to be.
Originally, I wrote the Gridder class to mimic the "gridding" of individual visibility data points to a UV grid so that they could be imaged using the inverse FFT, creating the dirty image. This is very useful for visualization and debugging and I propose that we keep something like this (perhaps with a more descriptive name, that somehow indicates its ability to produce dirty images). As far as I can currently tell, all of the code pertaining to the gridding->imaging functionalities is straight numpy.
It turns out that gridding visibilities using uniform weighting is equivalent to taking a cell-average of the visibility value. You can do a similar operation to the visibility weights. Both of these are needed for gridding->GriddedDataset. This way you can take, say, ~100 individual visibility measurements (and their weights) that fall in a single cell, perform a weighted average, and then equivalently represent the same measurement by a single average visibility value and a new (higher) equivalent weight value. Then, any computation using these visibilities goes much faster. All of this code could be straight numpy too, with the exception of export to a Pytorch GriddedDataset.
Since the gridding operation itself is kind of tedious and involves a lot of cell indexing and hacking of the np.histogram routine, it seemed like it made sense to combine both of these gridding routines into a single pass. This leads to some weird branching of what kind of "gridding" to use where, like in to_pytorch_dset().
But the biggest drawback of combining these two use cases in a single object (which I've only realized now that we've flushed out #100), is that it means our gridded dataset used in RML and other forward modeling contains Hermitian pairs (when it shouldn't). It's not a big deal for any routine that is just interested in the MLE or MAP estimates, but it will result in erroneous uncertainties on any routine that explores parameter uncertainties.
We should flesh this out a bit more, but my proposal for reorganization to the Gridder is to make two separate classes. The first will be purely for (uniform) averaging of visibilities to create a GriddedDataset. In its __init__ function it will not conjugate the visibilities to produce Hermitian pairs. This class will need to keep routines like _sum_cell_values_channel, _sum_cell_values_cube, _grid_weights, _check_scatter_error, _estimate_cell_standard_deviation, and a version of _grid_visibilities that only does uniform weighting.
Then, the second class is for imaging. I think it could reasonably inherit from the first class. It would need to modify __init__ such that visibilities are conjugated to produce Hermitian pairs, and it would introduce all of the routines for gridding with Briggs weighting, calculating beam areas, etc.
I think this would really help simplify this area of the codebase and prevent many errors pertaining to inference.
The text was updated successfully, but these errors were encountered:
This issue was originally raised in #126 but is now split into it's own, more focused issue.
... I think major redesigns to
GriddedDataset
andGridder
are necessary to address the Hermitian pairs issue raised in #100. To solve the Hermitian pairs issue, I think it makes sense to break apart the gridding->imaging and gridding->GriddedDataset capabilities currently contained inGridder
. Although these gridding functionalities are similar, I think trying to combine them all in a single class actually makes things more complicated than they need to be.Originally, I wrote the
Gridder
class to mimic the "gridding" of individual visibility data points to a UV grid so that they could be imaged using the inverse FFT, creating the dirty image. This is very useful for visualization and debugging and I propose that we keep something like this (perhaps with a more descriptive name, that somehow indicates its ability to produce dirty images). As far as I can currently tell, all of the code pertaining to the gridding->imaging functionalities is straight numpy.It turns out that gridding visibilities using uniform weighting is equivalent to taking a cell-average of the visibility value. You can do a similar operation to the visibility weights. Both of these are needed for gridding->GriddedDataset. This way you can take, say, ~100 individual visibility measurements (and their weights) that fall in a single cell, perform a weighted average, and then equivalently represent the same measurement by a single average visibility value and a new (higher) equivalent weight value. Then, any computation using these visibilities goes much faster. All of this code could be straight numpy too, with the exception of export to a Pytorch GriddedDataset.
Since the gridding operation itself is kind of tedious and involves a lot of cell indexing and hacking of the np.histogram routine, it seemed like it made sense to combine both of these gridding routines into a single pass. This leads to some weird branching of what kind of "gridding" to use where, like in
to_pytorch_dset()
.But the biggest drawback of combining these two use cases in a single object (which I've only realized now that we've flushed out #100), is that it means our gridded dataset used in RML and other forward modeling contains Hermitian pairs (when it shouldn't). It's not a big deal for any routine that is just interested in the MLE or MAP estimates, but it will result in erroneous uncertainties on any routine that explores parameter uncertainties.
We should flesh this out a bit more, but my proposal for reorganization to the
Gridder
is to make two separate classes. The first will be purely for (uniform) averaging of visibilities to create a GriddedDataset. In its__init__
function it will not conjugate the visibilities to produce Hermitian pairs. This class will need to keep routines like_sum_cell_values_channel
,_sum_cell_values_cube
,_grid_weights
,_check_scatter_error
,_estimate_cell_standard_deviation
, and a version of_grid_visibilities
that only does uniform weighting.Then, the second class is for imaging. I think it could reasonably inherit from the first class. It would need to modify
__init__
such that visibilities are conjugated to produce Hermitian pairs, and it would introduce all of the routines for gridding with Briggs weighting, calculating beam areas, etc.I think this would really help simplify this area of the codebase and prevent many errors pertaining to inference.
The text was updated successfully, but these errors were encountered: