-
-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement 1D wavelength calibration classes #162
Conversation
Codecov Report
@@ Coverage Diff @@
## main #162 +/- ##
==========================================
+ Coverage 76.58% 78.36% +1.78%
==========================================
Files 9 10 +1
Lines 709 809 +100
==========================================
+ Hits 543 634 +91
- Misses 166 175 +9
... and 1 file with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
specreduce/wavelength_calibration.py
Outdated
@classmethod | ||
def by_name(cls, input_spectrum, line_name, pixel, | ||
refinement_method=None, refinement_kwargs={}): | ||
# this will do wavelength lookup and passes that wavelength to __init__ allowing the user | ||
# to call CalibrationLine.by_name(spectrum, 'H_alpha', pixel=40) | ||
# We could also have a type-check for ``wavelength`` above, this just feels more verbose | ||
return cls(...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we just remove this for now?
specreduce/wavelength_calibration.py
Outdated
if self.refinement_method in ("gaussian", "min", "max"): | ||
if 'range' not in self.refinement_kwargs: | ||
raise ValueError(f"You must define 'range' in refinement_kwargs to use " | ||
f"{self.refinement_method} refinement.") | ||
elif self.refinement_method == "gradient" and 'direction' not in self.refinement_kwargs: | ||
raise ValueError("You must define 'direction' in refinement_kwargs to use " | ||
"gradient refinement") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want defaults for these options or is it too hard to make reasonable defaults that can be generalized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be lean toward making gaussian the default, personally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that the range
parameter depends heavily on the properties of your spectrum/array, I think. Hence why I left the default as no refinement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant automatic defaults for refinement_kwargs
, definitely refinement_method
-dependent, and perhaps even estimated from the input_spectrum
(not sure how complicated that would get - definitely could be a follow-up effort... I just feel like users are constantly going to be seeing this error message).
specreduce/wavelength_calibration.py
Outdated
|
||
return refined_pixel | ||
|
||
@cached_property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should probably be coupled with custom setters for the input spectrum and options to clear the cache
# useful for plotting over spectrum to change input lines/refinement options | ||
return [(line.wavelength, line.refined_pixel) for line in self.lines] | ||
|
||
@cached_property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cache needs to be cleared when appropriate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That can be accomplished by deleting the cached properties from the class __dict__
, right? I haven't use cached_property
before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exactly. In jdaviz I wrote a helper method that might also be useful in this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I implemented what I think you had in mind, take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that looks good to me, but I think there are other cases that need to clear too (wcs
cache needs to clear for a change in pretty much any of the original inputs, I think). I wonder if there's a way to generalize away this logic so we don't need to have all this boilerplate code 🤔 (definitely can generalize later - it would probably take a custom __setattr__
which might be a little hacky-feeling)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also now clears the WCS cache on changing the lines
or model
attributes. Do you think that suffices for now? I imagine that at some point people would just make a new WavelengthCalibration1D
rather than updating an existing one.
|
||
return wcsobj | ||
|
||
def apply_to_spectrum(self, spectrum=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want __call__
instead, to just call this, to do something else, or to not exist? I think we will eventually want calling to do something since that pattern is supported through most/all of the other specreduce operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remembered @eteq was somewhat against having a __call__
at all, so I had left it out for now. If it's more consistent with the way the rest of the package works, then I guess we can think about what makes the most sense. Having it call this method on the default (input) spectrum seems like an ok behavior to me, but I don't feel strongly about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think we'll want __call__
to do something eventually, but I'm fine deferring that until the rest of the API begins to converge in follow-up efforts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a workflow we will need to support is where we iterate the fitting process. one would start with a small number of matched lines and a simple model, use the model to match more lines, re-fit, tweak model, match more lines, re-fit/tweak/match, and repeat until a convergence criteria is met. my view is that this class/function should be within that loop, but not contain that loop. i don't see where a __call__
fits into that scheme and i think a apply_to_spectrum()
method is a clearer way of applying the results to a spectrum.
…implemented method
specreduce/wavelength_calibration.py
Outdated
y = [line[0].value for line in self.refined_pixels] * self.spectral_unit | ||
|
||
# Fit the model | ||
self._model = self.fitter(self._model, x, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this prevent successive fitting calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so...I guess successive calls would be starting from the previous fitted parameter values, but I don't think it prevents fitting. Do you think having a separate _fitted_model
parameter would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure 🤷, I think the way you have it - assuming the output can be passed as the input - is both quite handy but also maybe unexpected behavior unless we document carefully.
specreduce/wavelength_calibration.py
Outdated
if fitter is None: | ||
if self.model.linear: | ||
self.fitter = LinearLSQFitter(calc_uncertainties=True) | ||
else: | ||
self.fitter = LMLSQFitter(calc_uncertainties=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if instead self._fitter
was stored as None
and then a fitter
property defaults based on the current value of self.model
(so that if model
is changed after initialization, the default can change to match)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea, I'll implement momentarily.
@kecnry I added docs - please look them over. I'm going to open an issue to revisit whether we want to store the fitted model in a separate property from the initial model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few high-level comments:
- there definitely needs to be a concept of a
LineList
. for example, theHeNeAr
lamps that are very commonly used for ground-based optical data provide several hundred possible calibration lines. different calibration lamps are often combined so there needs to be a way to manage and combine lists based on configuration. handling such sets of calibration lines is a far more common use case than managing individual ones. - i'm not sure why the
CalibrationLine
class needs pixel position refinement. the locating and centroiding of potential calibration lines in pixel-space should happen in a separate, previous step. - iterative fitting is something we'll likely want to implement. basically, start with
min_order
and increaseorder
until either the fit RMS stops improving ormax_order
is reached. it's probably something that can be added later, but we should be cognizant of the future need.
Thanks for the thoughts, Tim. Your points 1 and 3 are planned for future development, we're just trying to be iterative about this and not have one massive PR that takes 6 months to finish. Your second comment is the only one that might be an actual sticking point, I think, but I don't think that the current class precludes the workflow it sounds like you're imagining. If the pixel positions of the lines have already been refined, you can simply leave the refinement args in the |
my main concern, especially with point 2, is that sub-optimal architecture that gets baked in early causes significant issues down the road. wavelength calibration fundamentally requires managing lists of lines because you need multiple lines to formulate a wavelength solution. as implemented here, the in my head and how most other implementations work, you first create an extracted line list from an extracted spectrum. the peak-finding and accurate centroiding of the lines is done in the creation of that. this can fundamentally just be a list of pixel positions, but could optionally include line fluxes as well. the fluxes could be used to set thresholds for what lines to use or can be used by matching algorithms. then you have a list of wavelengths of potential calibration lines. usually these are archived catalogs, but it could be a manual list. again, fundamentally just a list of wavelengths, but could include other optional info like relative strengths or whether the wavelengths are vacuum or air. i should note that where a this matching is a process and there are many ways to do it. this initial example of manual matching is fine and there will always be a use case for that. my contribution will be porting over automated and template-based matching. the output from any of the methods should be the same format of wavelength/pixel pairs. so there should be some kind of class hierarchy of i'll see if i can put these words into a clearer flowchart... |
awesome! i'll take a quick look tonight and a deeper dive tomorrow... |
the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments/questions from a quick skim of the code. Are there any examples for different workflows that can be used for testing or should we just refer to the tests for now?
specreduce/wavelength_calibration.py
Outdated
catalog: list, str, optional | ||
The name of a catalog of line wavelengths to load and use in automated and | ||
template-matching line matching. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this be moved to a class method? So WavelengthCalibration1D.from_line_list(line_list)
calls WavelengthCalibration1D(...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what i implemented in #165 returns a line catalog as a QTable
. so this should be able to accept that and make sure there's a wave
or wavelength
column.
specreduce/wavelength_calibration.py
Outdated
line_wavelengths.value.sort() | ||
self._line_list["wavelength"] = line_wavelengths |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does line_wavelengths
support frequencies? Would the sorting need to be flipped in this case? What if they're mixed units?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how common it is to use frequencies for this (my assumption was "never" but I guess someone might want to), but we could support it. I can also add a check to make sure they're all in the same unit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, since this requires line_wavelengths
to be a Quantity
or QTable
(rather than allowing a list of Quantity
), it's already guaranteed that all elements have the same unit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kecnry I implemented reverse-sorting if frequencies are provided, although that does make the variable name line_wavelengths
a bit of a misnomer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my first take is that since this package is geared towards 2D optical/IR data obtained with CCD-like devices, i would stick with wavelengths internally and not worry about frequencies at all. at least for simplicity at this stage of development. if users have dispersed data with units of frequency, there are existing utilities to transform it to wavelengths. i can imagine possible use cases for 2D spatial vs frequency data in radio astronomy, but i've never seen it. radio data is either 1D (single-dish) or 3D (scanned single-dish or interferometry). eV or wavenumber might be more likely candidates for non-wavelength inputs, but they can also easily be transformed to wavelength.
@pytest.fixture | ||
def spec1d(): | ||
np.random.seed(7) | ||
flux = np.random.random(50)*u.Jy | ||
sa = np.arange(0, 50)*u.pix | ||
spec = Spectrum1D(flux, spectral_axis=sa) | ||
return spec | ||
|
||
|
||
@pytest.fixture | ||
def spec1d_with_emission_line(): | ||
np.random.seed(7) | ||
sa = np.arange(0, 200)*u.pix | ||
flux = (np.random.randn(200) + | ||
10*np.exp(-0.01*((sa.value-130)**2)) + | ||
sa.value/100) * u.Jy | ||
spec = Spectrum1D(flux, spectral_axis=sa) | ||
return spec | ||
|
||
|
||
@pytest.fixture | ||
def spec1d_with_absorption_line(): | ||
np.random.seed(7) | ||
sa = np.arange(0, 200)*u.pix | ||
flux = (np.random.randn(200) - | ||
10*np.exp(-0.01*((sa.value-130)**2)) + | ||
sa.value/100) * u.Jy | ||
spec = Spectrum1D(flux, spectral_axis=sa) | ||
return spec | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this benefit from using the synthetic spectra from #165?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I think using that to expand/improve the tests can be another PR once both of these are merged.
@tepickering Think you'll get a chance to take a deeper look by Monday afternoon or so? I'd like to close this out in line with our Sprint end. |
i will take a deeper dive later today, if not over the weekend. it was a little hard to follow the changes on github due to missing test coverage so i'll pull in a local copy. |
docs/wavelength_calibration.rst
Outdated
|
||
The example above uses the default model (`~astropy.modeling.functional_models.Linear1D`) | ||
to fit the input spectral lines, and then applies the calculated WCS solution to a second | ||
spectrum (``science_spectrum``). Any other ``astropy`` model can be provided as the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe specify "any 1D model"
from astropy.table import QTable | ||
pixels = [10, 20, 30, 40]*u.pix | ||
wavelength = [5340, 5410, 5476, 5543]*u.AA | ||
line_list = QTable([pixels, wavelength], names=["pixel_center", "wavelength"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: i'd call this matched_line_list
so as not to be confused with linelist
or line_list
which is usually understood to mean something else (i.e. list of wavelengths of lines due to specific ion/molecule or mixture of of them).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a much-improved implementation that gives us something to build on. i left some minor comments about some wording that should get sorted out. there'll be some meshing needed between this and what i've done in #165. that can happen in a future PR, though. the most substantive thing i'd like to see addressed before merging to cover the remaining missing lines with tests. there's not too many...
__all__ = ['WavelengthCalibration1D'] | ||
|
||
|
||
def get_available_catalogs(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thus far there is built-in access to some legacy line lists/line catalogs in the specreduce-data
repository and more well-curated ones in pypeit
. since either way would require spidering through remote directories to find what's there, i think this is probably better left to the documentation rather than a function like this. you need to know some details about which one to use because they can be very instrument-dependent.
actually, i forgot that i did make a global variable, PYPEIT_CALIBRATION_LINELISTS
, that does list which ones are available that we'd want to use. so after #165 is merged, this could just return that.
""" | ||
ToDo: Code logic to combine the lines from multiple catalogs if needed | ||
""" | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i already implemented this in #165 for pypeit
line lists. if you give a list of supported ions/molecules, it generates a combined line list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, we'll definitely need a PR after these are both merged to combine their powers 😃
specreduce/wavelength_calibration.py
Outdated
""" | ||
input_spectrum: `~specutils.Spectrum1D` | ||
A one-dimensional Spectrum1D calibration spectrum from an arc lamp or similar. | ||
line_list: list, array, `~astropy.table.QTable` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would call this line_pixels
to make it clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought here was that the user can provide the whole table, including both pixels and wavelengths, in which case it doesn't make sense to call it 'line_pixels
. Maybe I'll change this slightly to have separate line_list
and line_pixels
arguments, where the former can be the whole table, and the latter is intended for just pixel values to then put internally into the table.
specreduce/wavelength_calibration.py
Outdated
catalog: list, str, optional | ||
The name of a catalog of line wavelengths to load and use in automated and | ||
template-matching line matching. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what i implemented in #165 returns a line catalog as a QTable
. so this should be able to accept that and make sure there's a wave
or wavelength
column.
Coverage is up to 90%, the rest is really dependent on integrating with #165 so I think it can wait. I think all I have left to address @tepickering's comments is to split out |
i'm happy with the latest round of changes. as previously mentioned, further work to mesh functionality and vocabulary with #165 is best left to a subsequent PR. |
Excellent! Tests passed again after my update splitting out those two input arguments, let's get this in 🎉 |
I don't have the power to merge here, @kecnry can do it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some questions/concerns about the big-picture plan for this as catalogs and the 2d cases get implemented, but so long as the plan is to iterate further in the future, won't object to merging this as-is. I'd love to see the actual pseudocode examples updated for various science workflows though - just to see where this is heading in advance. @tepickering, how do you imagine the catalog interface would work with this? There is no refinement implemented here anymore, so how does the 2d case work?
I know we haven't committed to backwards-compatibility yet, but would it make sense to merge this into a development branch instead of main until we're ready for it to be (more) public facing (@tepickering)?
Note that either ``matched_line_list`` or ``line_pixels`` must be specified, | ||
and if ``matched_line_list`` is not input, at least one of ``line_wavelengths`` | ||
or ``catalog`` must be specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this feels to me like it would be better as multiple classmethod, but we can revisit this as we iterate further on the API and various workflows as I recognize that might be a matter of taste:
WavelengthCalibration1D(input_spectrum, matched_line_list, model, fitter)
(__init__
)WavelengthCalibration1D.from_lists(input_spectrum, line_pixels, line_wavelengths, model, fitter)
WavelengthCalibration1D.from_catalog(input_spectrum, catalog, model, fitter)
(and perhapsline_pixels
?)
This also would allow getting rid of a lot of code that just checks for validity of the combinations of all of these technically-optional but-sorta-required kwargs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this is still kinda messy. i think we should actually keep WavelengthCalibration1D
as simple as possible: just WavelengthCalibration1D(line_pixels, line_wavelengths, model, fitter)
where line_pixels
and line_wavelengths
are list-like of the same length. there are previous steps required to find/centroid lines in a calibration spectrum (specutils
provides tools for this) and then match some of those lines to known wavelengths to construct line_pixels
and line_wavelengths
(manually or via to-be-implemented automated techniques). i don't think we should wrap all of these steps into one big class. the interfacing with things like catalogs of known lines like i implement in #165 will happen at the line-matching stage.
anyway, it will be a lot easier to iterate on these ideas using realistic end-to-end examples once this work is merged with #165. the work may not be complete after these are merged, but nothing is broken so i don't think we need a separate develop
branch`.
from specreduce import WavelengthCalibration1D | ||
pixel_centers = [10, 22, 31, 43] | ||
wavelengths = [5340, 5410, 5476, 5543]*u.AA | ||
test_cal = WavelengthCalibration1D(lamp_spectrum, line_pixels=pixel_centers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that lamp_spectrum
is not defined in this example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've left several more substantive comments. i still think WavelengthCalibration1D
is a bit too high-level, though a big step in the right direction. it works well enough to merge now and future work will be a lot easier with #165 merged in as well. being able to generate realistic data to prototype realistic workflows will be a significant benefit.
This implements the initial work needed for 1D wavelength calibration, defaulting to a basic linear fit to the input wavelength/pixel pairs. Does not implement autoidentification of lines or input of a preset named line list. I have a little bit of refinement left to do (e.g., making sure the internals are consistent on whether pixel values are stored as
Quantities
withu.pix
) and need to add tests, but I figured it's to a point where I should get other eyes on it to make sure I'm fulfilling the consensus reached in the discussion on #152.The following code snippet is what I was using to test this: