Step 4.1: End-to-End for H1C IDR3 #75

Open
steven-murray opened this issue Oct 12, 2021 · 2 comments
Labels
  • formal-test (A formal Validation Test)
  • pipeline:abscal (Tests the abscal pipeline component)
  • pipeline:pspec (Tests the pspec pipeline component)
  • pipeline:redcal (Tests the redcal pipeline component)
  • pipeline:smoothcal (Tests the smoothcal pipeline component)
  • simcmp:eor:powerlaw (Uses a power-law P(k) for EoR signal)
  • simcmp:fg:gleam (Simulation Component: GLEAM)
  • simcmp:fg:gsm (Simulator Component: Global Sky Model)
  • simcmp:sys:gains (Simulation Component: Gains)
  • simcmp:sys:noise (Simulation Component: Thermal noise)
  • simcmp:sys:reflections (Simulator Component: Reflections)
  • simcmp:sys:xtalk (Simulation Component: Cross-talk)
  • simulator:hera_sim (Uses the hera_sim simulator)
  • simulator:viscpu (Uses the vis_cpu simulator)
  • status:proposed (A proposed formal test, not yet accepted as part of the project plan)
Milestone: H1C IDR3

Comments

@steven-murray
Contributor

steven-murray commented Oct 12, 2021

Step 4.1: End-to-End for H1C IDR3

This will be the full end-to-end test for H1C IDR3.

Compared with Step 4.0 (H1C IDR2), this test updates several components:

  • GLEAM in-filled with random sources (cf. Step 1.3: GLEAM + In-Filled Point Sources #74)
  • Noise closer to the actual data, using all the baselines and LSTs.
  • New vis_cpu EoR/FG simulations with more antennas (at the correct locations).

The biggest logistical difference is that we want to simulate all days/epochs in the IDR3 dataset, which will be demanding in both compute and storage. Here's a plan:

Basic Flow:

  1. Ideal visibility simulation (diffuse FG, point-source FG, EoR) + Fagnoni beam + 5 s cadence + ideal (redundant) IDR3 layout (from Josh's memo, excluding antennas that are always flagged; see also the a priori YAMLs)
  2. Produce Daily Datafiles for ALL days in a single EPOCH (epochs defined in Table 1 of this memo):
    a. Inflate ideal by non-redundancy
    b. Interpolate to times of single day/file
    c. Add noise, reflections, cross-talk
    d. Chunk sim and save
  3. Apply real flags to files
  4. Calibrate
  5. LST-bin the EPOCH
  6. Remove all but 1 of the daily files (probably best to keep the last day in the epoch).
  7. Pre-processing + Pspec etc. on LST-binned EPOCH
  8. Rinse and repeat for all four EPOCHS
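Step 2b (interpolating the ideal simulation onto the times of a single day/file) can be sketched as below. This is a minimal sketch assuming the ideal visibilities for one baseline are held as a (time, frequency) complex array on an LST grid in radians, and using plain linear interpolation of the real and imaginary parts with wrap-around at LST = 0. The function name `interpolate_to_day` is hypothetical, and the actual hera_sim-based pipeline may use a different (e.g. higher-order) interpolation scheme.

```python
import numpy as np


def interpolate_to_day(lst_ideal, vis_ideal, lst_day):
    """Linearly interpolate ideal visibilities onto one day's LST grid (hypothetical helper).

    lst_ideal : (Ntimes_ideal,) LSTs of the ideal simulation, in radians
    vis_ideal : (Ntimes_ideal, Nfreqs) complex visibilities for a single baseline
    lst_day   : (Ntimes_day,) LSTs of the target daily file, in radians
    """
    out = np.empty((lst_day.size, vis_ideal.shape[1]), dtype=complex)
    for ifreq in range(vis_ideal.shape[1]):
        col = vis_ideal[:, ifreq]
        # period=2*pi lets np.interp wrap correctly across LST = 0 / 2*pi
        out[:, ifreq] = (
            np.interp(lst_day, lst_ideal, col.real, period=2 * np.pi)
            + 1j * np.interp(lst_day, lst_ideal, col.imag, period=2 * np.pi)
        )
    return out
```

Interpolating the real and imaginary parts separately keeps the sketch independent of whether a given NumPy version accepts complex `fp` in `np.interp`.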

After completing a single epoch, we can fine-tune the procedure for the remaining epochs. One idea is to produce only roughly half of the days in each epoch (about 10) rather than ALL of them. Throughout, N_EPOCHS=4 and N_COMBINATIONS=5.
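The reduced-days idea can be sketched as a helper that picks roughly half of an epoch's days, spread evenly, while always including the last day so that the file retained in step 6 of the flow is among those produced. The name `select_days` and the even-stride rule are hypothetical choices for illustration, not a decided selection scheme.

```python
def select_days(days, fraction=0.5):
    """Pick an evenly spaced subset of an epoch's days, always keeping the last one (hypothetical)."""
    if not days:
        return []
    n_keep = max(1, round(len(days) * fraction))
    # Stride backwards from the last day; step >= 1 guarantees distinct, valid indices
    step = len(days) / n_keep
    idx = sorted({len(days) - 1 - int(i * step) for i in range(n_keep)})
    return [days[i] for i in idx]
```

For a 20-day epoch this keeps 10 days (every other day, ending on the last one); for the 32-day maximum it keeps 16.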

CPU Time Estimates

  1. We don't have a good estimate of the ideal vis sim time yet (@hugh Garsden and @jburba are working on it). Relevant notes: vis_cpu can use MPI to distribute work across many nodes/processors. It does not yet use shared memory, which limits how many processors can be used per node. The simulation can easily be chunked along the frequency axis to reduce working memory.
  2. Systematic sims: @bobby Pascua can provide more details, but I/O seems likely to be the biggest bottleneck here. We can reduce it by reading in the ideal sims once and generating all days from them. Each day takes ~10 min of sim time, so for an EPOCH (max 32 days) that is about 5 hours of CPU time (in serial), plus the I/O overhead (1 hour?) = 6 hrs per EPOCH per COMBINATION.
  3. Apply flags: ?? probably negligible.
  4. Calibrate: run serially, this takes ~70 hrs per day of observation, but it parallelizes easily. Conservatively, 2 hours of wall time per day = ~60 hours per EPOCH per COMBINATION.
  5. LST-bin: ??
  6. Removing: probably negligible
  7. Pre-processing + Pspec: ??
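The per-step figures in items 2 and 4 follow from simple per-day arithmetic. The sketch below (hypothetical constant and function names, values taken from the list above) reproduces them: 32 days at ~10 min each plus ~1 h of I/O gives about 6.3 h for the systematics, and 32 days at 2 h of wall time each gives 64 h, which the list rounds to ~60 h.

```python
MAX_DAYS_PER_EPOCH = 32   # "max 32 days" per EPOCH (item 2)
SYS_MIN_PER_DAY = 10      # ~10 min of systematics sim time per day (item 2)
SYS_IO_OVERHEAD_H = 1     # rough I/O overhead guess (item 2)
CAL_WALL_H_PER_DAY = 2    # conservative calibration wall time per day (item 4)


def systematics_hours(n_days=MAX_DAYS_PER_EPOCH):
    """Serial CPU hours to generate all daily systematics sims for one EPOCH/COMBINATION."""
    return n_days * SYS_MIN_PER_DAY / 60 + SYS_IO_OVERHEAD_H


def calibration_hours(n_days=MAX_DAYS_PER_EPOCH):
    """Wall-clock hours to calibrate one EPOCH/COMBINATION at ~2 h of wall time per day."""
    return n_days * CAL_WALL_H_PER_DAY
```

Passing a smaller `n_days` (e.g. about 10 for the half-epoch idea) shows how both numbers shrink.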

Total wall-time estimate = IDEAL*3 + N_EPOCH * N_COMBINATION * (6 + 60 + PREPROCESS + PSPEC) hours. If the latter two are negligible compared with the 60 hours for calibration, then we're looking at something like 100 hours per epoch and combination.
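The total wall-time formula can be written as a small function. This is a sketch in which the ideal-simulation time is left as a parameter (no estimate exists yet) and the pre-processing and pspec times default to zero, matching the "negligible" assumption; the function name is hypothetical.

```python
N_EPOCHS = 4
N_COMBINATIONS = 5


def total_wall_hours(t_ideal, t_sys=6.0, t_cal=60.0, t_preprocess=0.0, t_pspec=0.0):
    """IDEAL*3 + N_EPOCH * N_COMBINATION * (sys + cal + preprocess + pspec), in hours."""
    per_epoch_combination = t_sys + t_cal + t_preprocess + t_pspec
    return 3 * t_ideal + N_EPOCHS * N_COMBINATIONS * per_epoch_combination
```

For example, `total_wall_hours(0.0)` gives 4 * 5 * 66 = 1320 hours before the three ideal simulations are added.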

MAX Storage estimates

  1. 3 ideal sims (diffuse FG, point-source FG, EoR) at 80 GB each = 240 GB
  2. ~0.25 TB/night for each night in a single epoch (max 32 nights) = 8 TB.
  3. Not sure if flagging takes any more space.
  4. Calibration files should be negligible? @joshdillon?
  5. 0.125 TB (half of the daily requirement) per LST-binned dataset: one for each epoch and combination, plus the LST-bin of all the epochs for each combination.
  6. Removing files reduces the footprint, but not the max. Since we keep one daily file for each epoch (but NOT for each combination), we should add (N_EPOCH-1)*0.25 TB here (the current epoch's nights are already counted in item 2).
  7. Pre-processing negligible? @joshdillon? @nkern?

So the total MAX storage is 0.24 + 8 + (N_EPOCH+1)*N_COMBINATION*0.125 + (N_EPOCH-1)*0.25 = 0.24 + 8 + 3.125 + 0.75 ≈ 12.1 (TB)
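Evaluating the MAX storage sum term by term makes the bookkeeping explicit. The sketch below uses the per-item sizes from the list (80 GB per ideal sim, 0.25 TB per night, up to 32 nights, LST-binned data at half a night) with hypothetical constant and function names, and treats flags, calibration files and pre-processing as negligible; it comes to about 12.1 TB.

```python
N_EPOCHS = 4
N_COMBINATIONS = 5
MAX_DAYS_PER_EPOCH = 32

IDEAL_TB = 3 * 0.080        # item 1: three ideal sims at 80 GB each
DAILY_TB = 0.25             # item 2: one simulated night
LSTBIN_TB = DAILY_TB / 2    # item 5: an LST-binned dataset is about half a night


def max_storage_tb(n_epochs=N_EPOCHS, n_combinations=N_COMBINATIONS, n_days=MAX_DAYS_PER_EPOCH):
    """Peak disk usage in TB while one epoch's daily files are on disk."""
    daily = n_days * DAILY_TB                                 # all nights of the current epoch
    lstbinned = (n_epochs + 1) * n_combinations * LSTBIN_TB   # per-epoch + all-epoch LST-bins, per combination
    kept_days = (n_epochs - 1) * DAILY_TB                     # one retained night from each other epoch
    return IDEAL_TB + daily + lstbinned + kept_days
```

The daily files dominate (8 of ~12.1 TB), so producing only half the days per epoch would roughly cut the peak by 4 TB.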

LONG TERM STORAGE REQUIREMENTS
Long term, we'll keep all the LST-binned datasets for all combinations, a single day for each epoch (in one combination), and the ideal data. With (N_EPOCH+1)*0.125 = 0.625 TB of LST-binned data per combination, this should be 0.625*N_COMBINATION + 0.25*N_EPOCH + 0.25 = 3.125 + 1 + 0.25 ≈ 4.4 (TB)
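The long-term total can be checked the same way. This sketch assumes, as in item 5 of the MAX storage list, that each LST-binned dataset is half a night (0.125 TB) and that there are N_EPOCH+1 of them per combination; the function name is hypothetical. With the 0.24 TB of ideal sims it gives 4.365 TB, or 4.375 TB if the ideal data is rounded to 0.25 TB, i.e. about 4.4 TB either way.

```python
def long_term_storage_tb(n_epochs=4, n_combinations=5, daily_tb=0.25, ideal_tb=0.24):
    """Long-term storage in TB: all LST-binned data, one night per epoch, and the ideal sims."""
    lstbinned = n_combinations * (n_epochs + 1) * daily_tb / 2   # every LST-binned dataset, all combinations
    kept_days = n_epochs * daily_tb                              # one night per epoch, in one combination
    return lstbinned + kept_days + ideal_tb
```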

Why this test is required

This is the final big test to make sure everything fits together well.

Simulation Details

  • Freq. range: 100-200 MHz
  • Number of channels: 1024 (channel width 100 MHz / 1024 ≈ 97.66 kHz)
  • Baseline/antenna configuration: same as IDR3
  • Number of realisations: 1
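From the frequency range and channel count above, the channel width is 100 MHz / 1024 = 97.65625 kHz. The sketch below computes it and also shows the kind of frequency-axis chunking mentioned under the CPU time estimates for keeping vis_cpu's working memory down. It assumes evenly spaced channels starting at 100 MHz; the helper name `frequency_chunks` and the choice of 8 chunks are hypothetical.

```python
import numpy as np

F_MIN_MHZ = 100.0   # lower band edge ("Freq. range: 100-200 MHz")
F_MAX_MHZ = 200.0   # upper band edge
N_CHAN = 1024       # number of channels

# Assumed: evenly spaced channels starting at the lower band edge
channel_width_khz = (F_MAX_MHZ - F_MIN_MHZ) / N_CHAN * 1e3
freqs_mhz = F_MIN_MHZ + np.arange(N_CHAN) * channel_width_khz / 1e3


def frequency_chunks(freqs, n_chunks):
    """Split the frequency axis into n_chunks contiguous, nearly equal pieces (hypothetical helper)."""
    return np.array_split(freqs, n_chunks)


# Each chunk could be simulated independently, e.g. as separate jobs
chunks = frequency_chunks(freqs_mhz, 8)
```

Each of the 8 chunks holds 128 channels, so per-job working memory scales down by roughly the number of chunks.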

Criteria for Success

  • Recovery of the input EoR P(k) in the EoR window without bias (< 5%)
  • No recovery of EoR P(k) if not injected
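The two success criteria could be checked per k-bin as below. This is a sketch under stated assumptions: the input and recovered P(k) have already been restricted to the same k-bins inside the EoR window, "without bias (< 5%)" is read as a per-bin fractional bias below 5%, and "no recovery" is read as every bin being consistent with zero within a hypothetical n_sigma = 2 error bars. The function names are hypothetical.

```python
def passes_bias_criterion(pk_recovered, pk_input, tol=0.05):
    """True if every k-bin satisfies |P_rec / P_in - 1| < tol (assumed reading of '< 5%')."""
    if len(pk_recovered) != len(pk_input):
        raise ValueError("recovered and input P(k) must cover the same k-bins")
    return all(abs(rec / inp - 1.0) < tol for rec, inp in zip(pk_recovered, pk_input))


def passes_null_criterion(pk_recovered, pk_error, n_sigma=2.0):
    """True if, with no EoR injected, every k-bin is consistent with zero within n_sigma errors."""
    if len(pk_recovered) != len(pk_error):
        raise ValueError("recovered P(k) and errors must cover the same k-bins")
    return all(abs(rec) < n_sigma * err for rec, err in zip(pk_recovered, pk_error))
```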
@steven-murray added the labels listed above Oct 12, 2021
@steven-murray added this to the H1C IDR3 milestone Oct 12, 2021
@jaguirre
Contributor

For reference, the EPOCH defining memo is http://reionization.org/manual_uploads/HERA097_H1C_IDR3_2_Memo.pdf

What is the motivation for saving one non-LST-binned file per epoch?

@steven-murray
Contributor Author

@jaguirre I think the motivation is that we want something at the raw level for each epoch (each epoch having slightly different systematic parameters). This gives us one file to look back to if problems arise that we can't figure out at the LST-binned level.
