Step 4.0: Basic End-to-End #42
This expands on #41 by adding crosstalk and cable reflections, correct? If so, then sometime down the road we should come to a consensus on which crosstalk model we want to employ, as well as which parameters we want to use for the cable reflections.

Regarding the criteria for success: this seems a bit ambitious, and I'm also a little confused about what exactly it means. First bit of confusion: are we taking the "known input" to be the model power spectrum, or the power spectrum from the perfectly calibrated foreground + EoR visibilities? Second bit of confusion/remark on ambitiousness: "matches to 1%" doesn't really make sense in light of the results from #32. To summarize, the agreement between different power spectrum estimations depends on the spectral window, and not even the EoR-only power spectra match the analytic expectation to 1% in any spectral window, so it's hard to say what a good "measure of correctness" metric would be, and what a satisfactory value of such a metric would be.
Thanks @r-pascua. This is exactly the kind of discussion I was hoping to generate by creating these issues, and I think we should let the whole team weigh in. What I think we want to validate is that, within the "window", the power spectrum matches to a certain tolerance. I also agree that 1% is arbitrary, and perhaps a little ambitious; I'm certainly willing to change the requirement if we can define a better one.
So, in light of a recent update to #32, 1% may or may not be ambitious, depending on which outputs we're trying to match to which expected values. If we're trying to match any calibrated output to an expected (analytic) power spectrum, then 1% is overly ambitious. If we're happy with the calibrated results (the analysis pipeline end-products) matching the perfectly calibrated results, then 1% might be a realistic goal, at least in some cases.

I agree that the whole team should weigh in on this. Hopefully we learn some lessons from the validation tests that precede this one that will help us determine what "success" looks like.
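To make the tolerance discussion above concrete, here is a minimal sketch of one possible "measure of correctness" metric: the maximum fractional deviation between a recovered and an expected delay power spectrum inside a chosen delay window. The function name, window bounds, and tolerance are placeholders for illustration only, not anything the group has agreed on.

```python
import numpy as np

def fractional_ps_agreement(p_recovered, p_expected, delays, window=(200e-9, 2000e-9)):
    """Maximum fractional deviation |P_rec - P_exp| / |P_exp| inside a delay window.

    delays and window are in seconds; the result can be compared against a chosen
    tolerance (e.g. 0.01 for the 1% criterion discussed above).
    """
    in_window = (np.abs(delays) >= window[0]) & (np.abs(delays) <= window[1])
    frac_dev = np.abs(p_recovered[in_window] - p_expected[in_window]) / np.abs(p_expected[in_window])
    return frac_dev.max()

# Example usage (arrays would come from the power spectrum pipeline):
# assert fractional_ps_agreement(p_rec, p_exp, delays, window=(200e-9, 2000e-9)) < 0.05
```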
Flow of End-to-End:
Specs of simulation:
This post contains updates on data preparation from discussions with @steven-murray and @jsdillon, as well as a suggestion for the analysis to be performed, in accordance with discussions with @nkern and the larger validation group. Some aspects of this post are rough ideas of what should be done, and discussion is greatly encouraged; we still need to nail down some of the data preparation and analysis parameters.

Data Preparation

We will choose 10 days from the H1C IDR2.2 data release. For each day, we will construct two base data sets as follows:
I propose using the following naming convention for the files:

The data set containing only foregrounds will be used for extracting a true upper limit (that is, it is supposed to represent the case where the EoR is hidden by the noise floor). The other data set will be used to see whether we can detect the EoR in a case where it should be above the noise floor for at least some delays in most spectral windows. @jaguirre should add clarification on this point if deemed necessary or if any of the information stated here is incorrect.

For each of the above data sets, we will corrupt the data according to the following routine:
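Whatever the final corruption routine ends up looking like, the kinds of terms discussed in this thread (per-antenna cable reflections, cross-coupling crosstalk, thermal noise) can be sketched in a few lines of numpy. hera_sim provides implementations of these effects; nothing below uses or represents the hera_sim API. The function name, amplitudes, and delays are purely illustrative placeholders.

```python
import numpy as np

def corrupt_baseline(vis_ij, auto_ii, freqs, rng=np.random.default_rng(0)):
    """Illustrative corruption of one baseline's (Ntimes, Nfreqs) visibility waterfall.

    All amplitudes and delays below are placeholder values, not agreed-upon parameters.
    """
    # Per-antenna gain with a single cable reflection: g = 1 + A exp(2*pi*i*nu*tau + i*phi).
    refl_amp, refl_delay, refl_phase = 1e-2, 800e-9, 1.3
    gain = 1.0 + refl_amp * np.exp(2j * np.pi * freqs * refl_delay + 1j * refl_phase)
    vis = gain * np.conj(gain) * vis_ij          # same gain on both antennas, for brevity

    # Cross-coupling crosstalk: a delayed, attenuated copy of the autocorrelation.
    xt_amp, xt_delay = 1e-3, 1200e-9
    vis = vis + xt_amp * auto_ii * np.exp(2j * np.pi * freqs * xt_delay)

    # Thermal noise at a fixed (placeholder) level.
    sigma = 1.0
    vis = vis + sigma * (rng.standard_normal(vis.shape) + 1j * rng.standard_normal(vis.shape)) / np.sqrt(2)
    return vis
```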
Ideally, the routines for each step will exist as functions in

Data Processing

We should write a makeflow for calibrating the corrupted visibilities that is based on the IDR2.2 pipeline. @jsdillon should be the authority on this, but @r-pascua has experience writing/running an analysis makeflow.

For absolute calibration, we should use the GLEAM + brights foregrounds, with some level of noise, smoothed out to some maximum delay (a rough sketch of such a delay filter follows this post). @jaguirre and @nkern should confirm or refute this point, adding extra detail as necessary (e.g. up to what delay are we smoothing?).

Important note regarding the analysis: we have agreed to not test LST-binning as part of this pipeline; instead, we will ask @jsdillon to perform (or help with performing) the LST-binning step post-analysis.

We should write our own YAML files for use with the power spectrum pre-processing and power spectrum pipeline scripts in https://github.com/HERA-Team/H1C_IDR2/tree/master/pipeline. @nkern should be the authority on this; @jburba has experience working with the pre-processing pipeline (and soon should have experience working with the power spectrum pipeline?). The configuration files used for this step should live somewhere in

Results and Presentation

This is a rather large project and cannot be run in a notebook, but we can still use a notebook for visualizing the data products at each stage of the test. This test will also constitute the meat of the validation paper, so we want to think very carefully about how we will present our results. Below, I pose some questions that I think are important to answer; please add questions if you think of additional ones worth asking, and please offer answers for any questions you think you can answer, either completely or partially.

- How do we want to present our work?
- For each simulated systematic, do we want comparison plots that show the accuracy of the best-fit solutions for those systematics, assuming the solutions are retrievable from every step?
- For per-antenna systematics, do we want to devise a way to visualize the accuracy of the solutions for the entire array simultaneously? What about per-baseline effects?
- What will be our criteria for success?
- What do we want the reader to take away from the paper, and how do we visualize those points?

Closing Remarks

Over the next few weeks, we should come to a consensus on who is responsible for each part of this test and what set of parameters will be used for each step. My understanding is that @jaguirre and @steven-murray should manage task assignment, @jsdillon should be the go-to person for questions regarding the analysis pipeline and LST-binning, and @nkern should be the go-to person for running the power spectrum pre-processing and estimation pipelines (although @acliu and @saurabh-astro should also be able to assist). My current understanding of task assignment is as follows:
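Referring back to the "smoothed out to some maximum delay" step for the abscal model: a crude top-hat delay filter is sketched below. The cutoff delay is a placeholder (the thread leaves the maximum delay undecided), and in practice one would apply a tapering window before the FFT rather than a hard cut.

```python
import numpy as np

def delay_filter_model(vis, freqs, max_delay=500e-9):
    """Keep only delay modes |tau| <= max_delay in a (Ntimes, Nfreqs) visibility array.

    max_delay is in seconds and is a placeholder value; this is a hard top-hat cut,
    not the smoothing actually used in the HERA pipeline.
    """
    dnu = freqs[1] - freqs[0]
    delays = np.fft.fftfreq(freqs.size, d=dnu)       # delays in seconds
    vis_dly = np.fft.fft(vis, axis=-1)               # frequency -> delay
    vis_dly[..., np.abs(delays) > max_delay] = 0.0   # zero out high-delay modes
    return np.fft.ifft(vis_dly, axis=-1)             # back to frequency
```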
Some comments:
Everything else makes sense to me. I think the real take-home point (if everything works as expected) should be that the output power matches the input. Given that we have the flagging, we're going to need plots like @jburba's plots from 3.1, where we show the input/output power for many different cases. In this particular test, I'm not sure how useful it is to focus on any one systematic, since we've already verified that each step should work well. Obviously, if the test fails, we're going to have to dig further. I am happy to help with putting the notebook together and doing the scaffolding.
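As a rough illustration of the kind of input/output power comparison described above, here is a minimal matplotlib sketch that overlays input and recovered power per spectral window. The function name, array shapes, and labels are placeholders and are not taken from the existing 3.1 notebooks.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_io_power(delays, p_in, p_out, window_labels):
    """Overlay input vs. recovered power for several spectral windows.

    delays in seconds; p_in, p_out have shape (Nwindows, Ndelays); window_labels is a list of str.
    """
    fig, axes = plt.subplots(1, len(window_labels), figsize=(4 * len(window_labels), 3), sharey=True)
    axes = np.atleast_1d(axes)
    for ax, label, pi, po in zip(axes, window_labels, p_in, p_out):
        ax.semilogy(delays * 1e9, np.abs(pi), label="input")
        ax.semilogy(delays * 1e9, np.abs(po), ls="--", label="recovered")
        ax.set_xlabel("delay [ns]")
        ax.set_title(label)
    axes[0].set_ylabel("P(tau) [arbitrary units]")
    axes[0].legend()
    return fig
```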
This sounds like a great plan. A few thoughts:
Actually, I think they should live in hera_opm. We'll make a new pipeline in the pipelines folder; that's a cleaner comparison, IMO.
I don't think we strictly need to add noise to the abscal model; it feels like an unnecessary complication that we could skip. Our abscal model is already somewhat unrealistic in that it is not CASA-calibrated data. That said, if the EoR level is large, having it in the data but not in the model might do weird things to the abscal. Maybe it'll be fine... I'm not sure. A safer test would be to include the EoR in the abscal model (when the data has EoR in it).
We can work together on this @r-pascua, though I'm fine taking the lead.
Agreed. Also, let's make certain that the noise level in the data matches the expected noise from the autocorrelations (see, e.g., this function: https://github.com/HERA-Team/hera_cal/blob/c704901d45104e8d61f5015afac5f222bf36cdcf/hera_cal/noise.py#L37).
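The linked hera_cal function predicts the noise level from the autocorrelations; a bare-bones standalone version of the same radiometer-equation estimate, sigma_ij = sqrt(V_ii V_jj / (dt dnu)), is sketched below. This is only an approximation of what hera_cal.noise does: nsamples, flags, and any corrections handled there are omitted.

```python
import numpy as np

def predicted_noise_std(auto_ii, auto_jj, dt, dnu):
    """Radiometer-equation estimate of the per-visibility noise standard deviation.

    auto_ii, auto_jj : autocorrelation amplitudes for antennas i and j (same shape)
    dt               : integration time in seconds
    dnu              : channel width in Hz
    """
    return np.sqrt(np.abs(auto_ii) * np.abs(auto_jj) / (dt * dnu))

# Sanity check on simulated data: the scatter of a noise-dominated visibility about its
# mean should be consistent with predicted_noise_std(auto_ii, auto_jj, dt, dnu).
```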
Thanks for the comments @steven-murray and @jsdillon! I have just a few responses to some of the points the two of you raised; you may assume that I agree with, and have nothing further to add to, any points not addressed in this comment. I'll update the original comment to reflect the changes we agree should be made to the proposed plan of action. The first change will address where the do scripts live; I'm happy to make this change without further discussion.

I personally think we should have a focused discussion at the next telecon on what we'll use for the abscal model and who will create it; a discussion led by @jsdillon, @jaguirre, and @nkern would be very productive on this end. Please push back on this point if you disagree or do not completely agree.

Note taken regarding inflating files when adding systematics. I'll be developing the file preparation script today (involving rephasing, relabeling of antennas, downselection to only include antennas present in the IDR2.2 data, and chunking). Review by @steven-murray and @jsdillon would be appreciated.
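For the downselection and time-chunking pieces of the file preparation script mentioned above, a rough pyuvdata sketch is given below. The antenna list, file names, and chunk size are placeholders, and the rephasing and antenna-relabeling steps are omitted; this is not the actual preparation script.

```python
import numpy as np
from pyuvdata import UVData

def downselect_and_chunk(infile, keep_ants, ntimes_per_file, outstem):
    """Select a subset of antennas and write the data out in time chunks."""
    uvd = UVData()
    uvd.read(infile)
    uvd.select(antenna_nums=keep_ants)  # keep only antennas present in the IDR2.2 data

    times = np.unique(uvd.time_array)
    for i in range(0, times.size, ntimes_per_file):
        chunk = uvd.select(times=times[i:i + ntimes_per_file], inplace=False)
        chunk.write_uvh5(f"{outstem}.chunk{i // ntimes_per_file:03d}.uvh5", clobber=True)

# Example usage (placeholder paths and antenna numbers):
# downselect_and_chunk("fg_eor.uvh5", keep_ants=[0, 1, 2, 11, 12], ntimes_per_file=60, outstem="zen.fg_eor")
```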
This is the first 'end-to-end' test.
`hera_sim`, `RIMEz`
`redcal`, `smoothcal`, `abscal`, `xRFI`, `pspec`

Why this test is required
This is the most basic end-to-end test: using basic components for every part of the simulation, but enough to include all analysis steps.
Summary
A brief step-by-step description of the proposed test follows:
Simulation Details
Criteria for Success