Discrepancy between stored best fit spectrum and reconstructed spectrum from predict? #298
Comments
@jrleja I've tested the log probabilities as you suggested and the regenerated values from your Going a bit deeper, I think the issue here is actually on the interpolation technique. The sps objects make identical spectra, but when I do it myself I use Adam's Spectres https://github.com/ACCarnall/spectres to resample the FSPS SSP to the observed NIRSpec grid. In prospector this is done by prospector/prospect/sources/ssp_basis.py Line 244 in f46ca4e
This seems to be the source of the discrepancy (see the figure below). Computing the chi2 for the two fits shows a clear change in the goodness of fit. I wonder whether this has follow-on effects if ssp_basis also uses the numpy interpolation when evaluating the full MC grid. For instruments like NIRSpec, where the wavelength bins are highly non-linear (especially in PRISM mode), I assume spectres would provide a better interpolation than numpy, though I haven't tested this.
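To make the distinction concrete, here is a minimal sketch contrasting point sampling with np.interp against a simple bin average on a non-uniform pixel grid. It is not the Prospector code and not the SpectRes algorithm: `pixel_edges`, `bin_average`, the toy line profile, and the pixel grid are all assumptions made for illustration.

```python
import numpy as np

def pixel_edges(centers):
    """Approximate pixel edges as midpoints between neighbouring centres."""
    centers = np.asarray(centers, dtype=float)
    mid = 0.5 * (centers[1:] + centers[:-1])
    lo = centers[0] - 0.5 * (centers[1] - centers[0])
    hi = centers[-1] + 0.5 * (centers[-1] - centers[-2])
    return np.concatenate(([lo], mid, [hi]))

def bin_average(pix_wave, wave, flux):
    """Mean flux density per output pixel (hypothetical helper).

    Uses the cumulative trapezoid integral of the fine spectrum, linearly
    interpolated to the pixel edges, so it is only accurate when the fine
    grid is much finer than the pixels and covers all pixel edges.
    """
    edges = pixel_edges(pix_wave)
    steps = 0.5 * (flux[1:] + flux[:-1]) * np.diff(wave)
    cumulative = np.concatenate(([0.0], np.cumsum(steps)))
    return np.diff(np.interp(edges, wave, cumulative)) / np.diff(edges)

# Toy model: flat continuum with one narrow absorption line at 1.5 micron.
wave = np.linspace(1.0, 2.0, 20001)
flux = 1.0 - 0.8 * np.exp(-0.5 * ((wave - 1.5) / 0.0005) ** 2)

# Non-uniform pixels whose width grows with wavelength (1% per pixel);
# index 25 is centred exactly on the line.
pix_wave = 1.5 * 1.01 ** np.arange(-25, 26)

sampled = np.interp(pix_wave, wave, flux)    # value at the pixel centre only
binned = bin_average(pix_wave, wave, flux)   # average across each pixel

print(f"line pixel: np.interp = {sampled[25]:.3f}, bin average = {binned[25]:.3f}")
```

In this toy case no smoothing is applied, so the pixel centred on the line inherits nearly the full line depth under np.interp, while the bin average dilutes the line over the pixel width. Away from the line the two agree.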
I think the issue is the undersampling of the LSF by the pixels, not the non-linearity of the bins. This will require implementing a different smoothing algorithm within the likelihood call.
Just to add my oar in here. We have confirmed that our fitting issues were due to the use of As @themiyan mentioned we changed to I expect you don't want to have a This will affect anyone using Prospector to fit NIRSPEC PRISM spectra, (UNCOVER etc.) so it is probably worth informing people of this issue if you know they are working on this. |
Thanks for bringing this up. I've taken a closer look, and have a few thoughts. To clarify, I think in principle np.interp can work pretty well even for very low resolution spectroscopy as long as the instrument LSF, convolved with the LOSVD and before pixelization, is well sampled by the pixels (at least 2 pixels per LSF FWHM, and ideally > 3, though see Robertson 17 and Law 21 for caveats). One can test this by convolving the FSPS/MILES spectra with a Gaussian LSF with sigma=5000 km/s on a fine grid and comparing the results of spectres and np.interp for pixels spaced every 3000 km/s or so. However, I do think the nominal "R" quoted in JDox files is a pre-flight estimate based simply on FWHM=2.2 nominal "pixels", so the actual instrumental LSF due to the optics or dispersing element might be narrower, such that undersampling/pixelization play a significant role. Unfortunately the pre-pixel instrumental LSF is not quoted anywhere as far as I can tell, and likely depends on the light profile of the source within the slit (e.g. de Graaf et al 23). Indeed, I am curious about what exactly you mean by 'this happens after LSF smoothing, that part is done correctly'. Anyway, if I conservatively assume that the instrumental LSF is a Gaussian with FWHM = 1.0 pixel at every wavelength (so pretty undersampled, but not infinitely so), smooth a MILES-library 2 Gyr old solar metallicity SSP by this LSF (within FSPS, using FWIW Hope this helps. It's possible I've made a mistake somewhere; the code to make the below figures is in a gist here:
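The test described in this comment (smooth with a Gaussian LSF of sigma=5000 km/s on a fine grid, then compare point sampling and binning for pixels every 3000 km/s) can be sketched roughly as follows. This is not the gist referenced above: it uses a synthetic flat spectrum with three narrow lines instead of FSPS/MILES, and a simple bin average (the hypothetical `bin_average` helper) instead of spectres. The velocity coordinate is just c·ln(λ/λ0) on the toy grid.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def bin_average(pix_wave, wave, flux):
    """Mean flux density per pixel; edges are midpoints between centres."""
    mid = 0.5 * (pix_wave[1:] + pix_wave[:-1])
    edges = np.concatenate(([2 * pix_wave[0] - mid[0]], mid,
                            [2 * pix_wave[-1] - mid[-1]]))
    steps = 0.5 * (flux[1:] + flux[:-1]) * np.diff(wave)
    cumulative = np.concatenate(([0.0], np.cumsum(steps)))
    return np.diff(np.interp(edges, wave, cumulative)) / np.diff(edges)

# Fine grid with a constant 100 km/s step (uniform in ln wavelength).
dv_fine = 100.0
vel = np.arange(6001) * dv_fine
wave = 4000.0 * np.exp(vel / C_KMS)

# Toy spectrum: flat continuum with three narrow (sigma = 300 km/s) lines.
flux = np.ones_like(vel)
for v0 in (150000.0, 300000.0, 450000.0):
    flux -= 0.8 * np.exp(-0.5 * ((vel - v0) / 300.0) ** 2)

# Gaussian LSF with sigma = 5000 km/s, truncated at 5 sigma and normalised.
sigma_lsf = 5000.0
half = int(5 * sigma_lsf / dv_fine)
kernel_vel = np.arange(-half, half + 1) * dv_fine
kernel = np.exp(-0.5 * (kernel_vel / sigma_lsf) ** 2)
kernel /= kernel.sum()
smooth = np.convolve(flux, kernel, mode="same")

# Pixels every 3000 km/s, kept away from the zero-padded convolution edges.
pix_vel = np.arange(40000.0, 560001.0, 3000.0)
pix_wave = 4000.0 * np.exp(pix_vel / C_KMS)

def max_frac_diff(spec):
    """Largest fractional difference between point sampling and binning."""
    sampled = np.interp(pix_wave, wave, spec)
    binned = bin_average(pix_wave, wave, spec)
    return np.max(np.abs(sampled / binned - 1.0))

diff_raw = max_frac_diff(flux)
diff_smooth = max_frac_diff(smooth)
print(f"max fractional difference, unsmoothed:   {diff_raw:.3f}")
print(f"max fractional difference, LSF-smoothed: {diff_smooth:.5f}")
```

With these toy numbers the LSF FWHM (about 2.355 × 5000 km/s) spans roughly 3.9 pixels, so the smoothed spectrum is well sampled and the two resampling methods should agree closely, while the unsmoothed spectrum shows a much larger difference. A narrower assumed LSF would change that balance.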
Hi @bd-j I can't see the pixels in your above plots, the lines are too smooth, so I don't think you have accounted for the binning-up step.
How would you define 'coarse' pixels, since there might be computational advantages to only doing the pixel integral when really necessary? What do you mean by 'correct results'; do you have a ground truth, e.g. a PN or well-characterized star observed with the NIRSpec PRISM? Re binning up, I have explicitly used spectres to bin the LSF-smoothed spectra for comparison in the code gist that I linked above. I've updated that code to show a 0.5 Gyr SSP with stronger Balmer lines, using larger pixels corresponding to the "DLDS" column of the NIRSpec PRISM dispersion curves on JDox, and plotting with ax.step instead of ax.plot to visualize the pixels. However, for a Gaussian instrumental LSF with FWHM=1.5 pixels the difference between using np.interp and spectres is still <~ 1%; see attached plots. Please let me know if you notice an error in the code in that gist, or a discrepancy in assumptions compared to your own tests. Thanks
Hmm, I see we have not posted the worked example. Sorry about that! To answer your questions: by 'coarse' I simply mean the low resolution of NIRSpec PRISM; we are smoothing and binning to match our data, and both are f(λ). I will bug @themiyan, but if you look at the top (Oct 12 original post) you can see our example spectrum where spectres gives different results (red vs brown). This is for an older spectrum than 0.5 Gyr, which may explain why you are not seeing it. What is truth? Well, clearly the higher-resolution spectrum when binned correctly. What is 'correct'? For us it was a significant effect you could see by eye, and by eye we determined that it was clearly spectres that was doing the more correct binning from the higher-resolution model. For an independent check I used yet another rebinning code; this one didn't handle variable pixel size, but I approximated the pixel size as constant around the break and it agreed with spectres in that region. This all made sense to us because (1) linear interpolation can only ever be an approximation to integration across pixels, and (2) we got much better and more stable fits with correct binning. I will ask @themiyan to post the actual data and code we used, e.g. the high-res spectrum, the result of prospector binning, and the result of spectres binning.
Hi, thanks for the additional info. It would be helpful to see a worked example; happy to continue via email if you don't feel like sharing data on github. I would still like to know what exactly you mean by smoothing the data (in addition to binning? By what LSF?). I agree that binning the native MILES resolution (R~2000) spectrum gives a different (and much more plausible) answer than just interpolating to the PRISM pixel centers. However, I think that unless some Gaussian-ish smoothing corresponding to the instrumental LSF (pre-pixelization) is applied, this answer will also not be correct, potentially at the level of several tens of percent. And, once you have applied a Gaussian-ish instrumental LSF smoothing that is reasonable for PRISM, the difference between interpolating and binning is very small (sub-percent).
Sorry for the delay. As suspected, you are both correct. @bd-j's code also works as intended, and I think the issue was in how the LSF was handled. Taking a step back, the original issue was that the 'best-fit' spectrum saved by Prospector had some absorption features which were not seen in the observed spectrum. Now let's do this again with the LSF. The thick grey line is the observed spectrum, and the red and black are what we discussed above. Numpy clearly has some features but more or less follows the pink high-R spectrum. So it's not wrong (because pink is the ground truth); it just looks more different from the observed spectrum than what spectres gives. So if we now regenerate the higher-R spectrum ( This demonstrates this clearly: comp with observed spectrum So this is good. Regardless of whether we use numpy or spectres, when a higher-R spectrum is made, convolved with an LSF, and resampled to the lower-R NIRSpec wavelength grid, the result is similar. So now, if we go back to the original issue of why the stored best-fit in the h5 file looked incorrect compared to the observed spectrum, I think this is probably due to LSF smoothing not being applied to the saved spectrum. It should have been, but for some reason it wasn't. BTW @bd-j prospect/utils/smoothing.py
The notebook for plots is |
OK. I think I understand why the hdf5 file 'best-fit' had the non-lsf smoothed spectrum stored. The smoothing applied here is based on a wavelength dependent LSF. So for smoothing I removed In the past I did prospector/prospect/sources/ssp_basis.py Line 236 in 2ead831
Later on I generalised it as:
so for newer runs Ideally a separate generalised keyword should be implemented in ssp_basis so that any form of smoothing triggers the smoothing function. So the only thing outstanding is that if a user attempts to fit a spectrum without a smoothing function, for low-R instruments like NIRSpec PRISM there can be differences in the binned spectrum depending on whether numpy or spectres is used, as shown by the red and black spectra in the previous post.
Hi @themiyan, apologies for the very delayed response. I agree that one needs to do smoothing of the model spectrum by the instrumental resolution (or technically by the difference between the instrumental resolution and the stellar library resolution) before comparing to data. Depending on the sampling of the instrumental LSF this may need to be done even in addition to pixelization/binning, even with low-R instruments. Re the smoothing, in more recent versions of prospector the smoothing is not handled by However, if Because of the complexities of the emission line marginalization, and including library, physical, and instrumental effects self-consistently but flexibly for nebular lines and for stars, the smoothing treatments have been overhauled in the v2.0 branch to hopefully make all this a bit more explicit. |
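Whether pixelization matters in addition to LSF smoothing depends on how well the pixels sample the LSF, and the guideline quoted earlier in this thread is at least about 2 pixels per LSF FWHM. A minimal sketch of such a check is below. The helper `pixels_per_fwhm` is hypothetical (it is not part of Prospector), it assumes R = λ / FWHM, and the toy grid and R value are made up rather than taken from any instrument file.

```python
import numpy as np

def pixels_per_fwhm(pix_wave, resolving_power):
    """Number of pixels spanning one LSF FWHM at each pixel (hypothetical helper).

    Assumes R = wavelength / FWHM and estimates the local pixel width
    from the spacing of the pixel centres with np.gradient.
    """
    pix_wave = np.asarray(pix_wave, dtype=float)
    fwhm = pix_wave / np.asarray(resolving_power, dtype=float)
    pixel_width = np.gradient(pix_wave)
    return fwhm / pixel_width

# Toy grid: pixel width is 0.5% of the wavelength, with a constant R = 100.
pix_wave = 1.0 * 1.005 ** np.arange(200)
sampling = pixels_per_fwhm(pix_wave, 100.0)
print(f"pixels per FWHM: min = {sampling.min():.2f}, max = {sampling.max():.2f}")
```

Note that this only reflects the R that is passed in. If the quoted R already includes pixelization, as suggested earlier in the thread, the pre-pixel LSF could be narrower and the true sampling worse than this estimate.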
The stored MAP spectrum in the .h5 file seems to have slightly different properties compared to the one that can be reproduced from the max_theta values.
In the screenshot, the cyan photometry and blue observed JWST/NIRSpec PRISM spectrum are fit with prospector. The stored best-fit spectrum is shown in brown. Clearly there are some Balmer lines that are not in the observed spectrum.
If I reconstruct the best-fit spectrum at a higher R using the stored MAP theta values, I get the green spectrum. Then, when I resample the spectrum back to the NIRSpec wavelength resolution using spectres, I get the red spectrum, which is still a good fit to the observed spectrum and does not have the strong absorption lines. The theta values do match. For example:
from the full chain:
[-0.48803794, 0.4424453 , 11.43844738, 0.04454762, 0.15767742, 0.03109789, 0.06980556, -0.06348596, -3.00673872]
from the stored best-values dict in the .h5 file:
[-0.48803794, 0.4424453 , 11.43844738, 0.04454762, 0.15767742, 0.03109789, 0.06980556, -0.06348596, -3.00673872]
If I read the .h5 file using h5py
So the possibility is that the sps object is different. I don't see any reason why reader.get_sps(out_res) would give an incorrect sps object vs. the param file, which is attached here:
run_on_data.py.txt
Any ideas why this is happening? Which best fit spectrum should one trust?