Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trace fix improvements #2386

Merged
merged 19 commits into from
Nov 19, 2024
Merged

Trace fix improvements #2386

merged 19 commits into from
Nov 19, 2024

Conversation

segasai
Copy link
Contributor

@segasai segasai commented Oct 9, 2024

Hi,

This is a PR WIP trying to address #2380 (and maybe also improve systematic rv floor)

ATM I don't think it fixes anything much but currently does a couple of things I was hoping could help.

  1. Use multiple PSFs at different wavelengths (rather than PSF at central wavelength) when cross-correlating with external spectra to avoid possible biases vs wavelength
  2. in the same external spectrum x-correlation routine use PSFs from multiple fibers as well for the same reason
  3. when doing 'internal wavelength' calibration, subtract the continuum in the same way we do for external calibration, to avoid issues if we have a bright stars and weak lines

I did also try to restructure the code a little bit/reduce duplication to make it easier to change.

I would ideally like to merge this type of patch even if it cannot fully fix #2380 (when it is ready and assuming nothing gets worse)

@coveralls
Copy link

coveralls commented Oct 9, 2024

Coverage Status

coverage: 30.086% (-0.1%) from 30.218%
when pulling 083ca2b on trace_fix_improvements
into 41da70a on main.

@segasai segasai added the WIP Work in Progress label Oct 9, 2024
ivar=ivar[fiber,ok],
hw=3., calibrate=True)
if fiber %10==0 :
log.info("Wavelength offset %f +/- %f for fiber #%03d at wave %f "%(dwave, err, fiber, block_wave))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor style comment, but for the record so that it doesn't propagate to more code:

When formatting strings for logging, it is better to use the structure

log.info("Wavelength offset %f +/- %f for fiber #%03d at wave %f", dwave, err, fiber, block_wave)

so that the string formatting is only evaluated by the logger if the log level would actually print it. In practice we almost always have INFO-level on so this particular case doesn't make a performance difference, but the readability is nearly the same as old-syle %-formatting and we have had cases in the past of string evaluation of unprinted debug-level logging taking a significant amount of time, so this style is something to keep in mind when adding new log messages.

Personal opinion: I'm also fine with pre-evaluated new-style format strings for INFO-level and above if the author feels it helps with readability, e.g.

log.info(f"Wavelength offset {dwave} +/- {err} for fiber #{fiber:03d} at wave {block_wave}")

My basic motivation is that readability trumps performance given that we use INFO-level for production running anyway so there isn't a performance impact. This should not be used for DEBUG-level logging though, due to performance issues especially if it is deep in loops.

[actual review of algorithmic changes takes more thought/time...]

@segasai
Copy link
Contributor Author

segasai commented Oct 27, 2024

I think to avoid feature creep and very large patch, I have decided to test what we have here.
I ran full processing of several tiles/nights (same set I used for previous version of trace_shifts patch)
/global/homes/k/koposov/desi_koposov/wavelength_fix/bulk.sh

To my surprise this already lead to significant improvements. I looked at the xmatch of velocities ( from redrock-) to APOGEE.

image

And here, the MAD wrt to APOGEE changes from 1.3 km/s to 1.1 km/s.
(I checked for all other surveys there are still improvements, but less visible due to their bigger errors)
Also I checked and the MEANDY's scatter also improves (MAD goes from 0.018 to 0.015)

So I think that's good evidence of things improving and I think it would be good to commit this.
Summarizing the changes here

  1. background subtraction when doing internal wavelength offsets (in the same way it's done when doing external wavelength corrections)
  2. using correct variance when doing internal wavelength offsets ( see Internal wavelength calibration errors #2113 )
  3. refactor by putting the background subtraction into separate function
  4. When doing external wavelength correction use PSFs sampled across 20 points along the spectrum, and 20 fibers across the petal as opposed to one single wavelength point for one fiber
  5. using a consistent and more reasonable weighting scheme for determined the 'mean' wavelength of the wavelength bin considered.
    None of these changes fixes the Fixing 1 km/s radial velocity systematic #2380

One thing that I feel makes these changes harder to analyse is that these internal/external wavelength offsets for each fiber and wavelength bins are not saved anywhere (other than printed in the log and saved on average in MEANDY). I think for future improvements, it'd be beneficial to save these kinds of offsets (i.e. either in psf- file or some other kind of output file). It'd be ~ 2500 numbers per single frame, so it's not that much. But maybe that should be dealt with separately.

@segasai segasai removed the WIP Work in Progress label Oct 27, 2024
@segasai
Copy link
Contributor Author

segasai commented Nov 2, 2024

I've verified on the the test subset of data processed /global/cfs/cdirs/desi/spectro/redux/koposov/wavelength_test_bulk_new/
that repeated observations also show reduced scatter in velocities See figure for z-band arm measurements
image
Improvement is MAD change from .7 km/s to .6 km/s.
I've done a separate analysis when using blue only arm there. MAD improves from 1 km/s to .9 km/s
Also looking at the large offsets, the number of those is significantly reduced with new code (this is B arm)
image

@julienguy julienguy self-requested a review November 4, 2024 23:08
Copy link
Contributor

@julienguy julienguy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I agree weighting with ivar x flux^2 x (flux>0) is better than ivar x flux x (flux>0) to get the effective wavelength in a bin.

  • Subtracting the continuum is a good idea to improve the precision on the cross-match (similarily to what was done to the match of the sky)

  • I see the effort to sample the PSF across wavelength and fibers to convolve the external reference spectrum (original code was using only wavelength and fiber for the PSF)

  • I ran the code on a random dark time exposure and 3 cameras and found tiny shifts <0.01A as expected

  • I have only one change request: you wrote a comment "The reason why we oversample is unclear to me".
    It is there because the wavelength arrays with the boxcar extraction are not aligned and for this reason I thought oversampling before stacking would help. Given the careful study you have done, maybe you have seen that this does not help, in which case please change the code, otherwise, you may remove the comment.

I think we can merge after that. Thank you for all the work.

fix a few typos in comments
@segasai
Copy link
Contributor Author

segasai commented Nov 5, 2024

Thanks for looking over the patch @julienguy . On the resampling, My comment was just a thought (it wasn't obvious to me there would be benefit from oversampling), but I didn't verify that. So I've removed that comment.
I think the only other comment in the patch that's worth dealing with, concerns this line of code

x=np.tile(x[hw]+np.arange(-hw,hw+1)*(y[-1]-y[0])/(2*hw+1),(y.size,1))

(it's now line 688 in my patch). I did not fully follow the reason behind the y[-1]-y[0] in this original code

so my code is just a more straightforward 2d grid in x,y, but maybe I missed the reason for that.

@segasai segasai requested a review from julienguy November 5, 2024 00:44
@julienguy
Copy link
Contributor

I checked the new changes and they are fine with me. I also do not really understand my intent with the term (y[-1]-y[0])/(2*hw+1) which must always be very close to one. So I agree with your simplification. I am merging this. Thanks for all the work on this.

@julienguy
Copy link
Contributor

This is a problem with the the documentation with the error:
/home/runner/work/desispec/desispec/doc/changes.rst:16: ERROR: Unknown target name: "#2409"
Can @weaverba137 help fix that so that we can merge?

@sbailey
Copy link
Contributor

sbailey commented Nov 19, 2024

The documentation error was mine, while making an earlier tag, and it has been corrected on main. We can merge this PR without fixing it in this branch.

@sbailey sbailey merged commit 1f75589 into main Nov 19, 2024
25 of 26 checks passed
@sbailey sbailey deleted the trace_fix_improvements branch November 19, 2024 05:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fixing 1 km/s radial velocity systematic
4 participants