Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to generate DESI-Y3 Lya mocks #581

Merged
merged 5 commits into from
Dec 16, 2024
Merged

Conversation

HiramHerrera
Copy link
Contributor

This Pull Request includes updates in py/desisim/survey_release.py and the py/desisim/scripts/gen_qso_catalog.py script in order to be able to produce DESI-Y3 mocks.

Mayor updates:

  • survey_release.py does not require HEALpix Pixel map of the number of tiles covering a region (NPASS) to function anymore. Instead it is internally generated based on a tiles-<release>.fits file within the DESI_SPECTRO_REDUX directory by default. This tile file might be an input if a custom footprint is desired.
  • survey_release.py now includes the option to select mock objects in order to mimicking the characteristics of non-qso targets included in the observed data catalog.

Minor updates:

  • gen_qso_catalog.py: changed --input-data flag to None by default instead of the iron release observed Lya catalog.
  • gen_qso_catalog.py: default release is now "jura".
  • gen_qso_catalog.py: addition of two flags: --include-nonqso-targets to include nonqso targets and --tiles-file to use a custom footprint tiles.

Example to make a jura mock seed catalog:

gen_qso_catalog -i <input_master_catalog> --input-data <input_data_catalog> --seed <seed> --release jura --include-nonqso-targets -o <output_file_path>

In the above example:

  • <input_data_catalog> was set to: /global/cfs/cdirs/desi/users/martini/bal-catalogs/jura/QSO_cat_jura_main_dark_healpix_v1-bal.fits for the Y3 mocks.

@HiramHerrera
Copy link
Contributor Author

@andreicuceu could review this too? I couldn't assign you as reviewer.

@coveralls
Copy link

coveralls commented Jul 26, 2024

Coverage Status

coverage: 44.333% (-0.3%) from 44.679%
when pulling 69ecffa on HiramHerrera:y3mocks
into 038252e on desihub:main.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HiramHerrera - I'm not super familiar with this script, but the changes look good.

In a couple of years from now, when we want to make Y5 mocks, it will be confusing to have release="Y5" to refer to the forecasted final DESI instead of referring to the actual Y5 release, although I guess we could use either DR3 or whatever mountain is used to generate the Y5 release

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with this script either, and the changes are quite long. Is there any particular part of the code that you'd like me to take a look at? Otherwise I'll let others review it.

@andreicuceu
Copy link
Contributor

Hi @HiramHerrera, I've been trying to test this branch, but I ran into an issue: desi_zcatalog doesn't appear to work anymore for mocks because the --prefix option has been removed. I know this has to do with a different repository, but it affects us because it was part of the standard mock-making process. So I think we should address that problem in some way in this pull-request. My proposal would be to modify the QSO catalog generated by the gen_qso_catalog script here so picca can directly use it. This would also allow us to skip the extra step of re-generating the QSO catalog after running QQ. Is there any downside to this?

I had a look at the seed catalog generated with gen_qso_catalog and the one that used to be generated with desi_zcatalog, and as far as I can tell they have exactly the same quasars. Also, I think the only missing column that picca and QSOnic need is TARGETID. Strangely this appears in the Y1 seed catalogs, but not in the Y3 ones. So the solution may be as easy as making a copy of the current MOCKID column that is named TARGETID so picca can directly use this catalog.

It would also be useful to name the hdu ZCATALOG instead of the current CATALOG. Even though picca will accept both, it looks like passing anything different from ZCATALOG will raise an error in the future, and in QSOnic this already raises an error.

@HiramHerrera
Copy link
Contributor Author

Hi @andreicuceu

In principle what you request should be doable. However, I'm concerned about the redshift in the catalog.

The catalog generated by desi_zcatalog and the seed catalog generated prior running QQ would slightly differ in redshift since QQ adds some random shifts: FoG effect (400 km/s by default), systematic (0 by default) and redshift errors (0 by default) which would be contained in the redshift column (Z) generated by desi_zcatalog. On the other hand, the seed catalog generated prior running QQ would have the redshift without any shift added (Z_QSO_RSD in the Raw mocks master catalog).

This would slightly alter the resulting correlations. Apart from that I can't think of other issue for picca. Maybe @paulmartini and Ting can comment on the necessary columns/HDUs for the BAL and DLA finders to work.

@andreicuceu
Copy link
Contributor

andreicuceu commented Aug 5, 2024

That's a great point @HiramHerrera. So we should not use the seed catalog for downstream analysis. In that case do you have some suggestions on how to solve the issue? It looks like the problem is that desi_zcatalog only works with redrock files now, and we have zbest files. So I guess we could either write our own script that opens the zbest files and makes the post-QQ quasar catalog (I'd be happy to give this a go), or alternatively try to modify desi_zcatalog so it works with zbest files again. What do you think?

@HiramHerrera
Copy link
Contributor Author

@andreicuceu I think the best option would be to either make our own script (my personal preference) that we could add to lyatools. In this regard I think that @p-slash already has a working code (see desihub/desispec#2140)

Or simply use an older version of the desicode for making the redshift catalog only (I use 23.1).

What do you think?

@alxogm
Copy link
Contributor

alxogm commented Aug 5, 2024

@HiramHerrera @andreicuceu adding to the question of whether making desi_zcatalog compatible again or having our own function, I wonder what people is currently using to generate truth DLA and BAL full catalogs? I remember having a script for that, which I shared with various people, including Hiram quite likely. So, I would say that the same script should generate the mock redshift catalog, the DLA and BAL catalog at the same time, and we should host it either in lyatools, or in desisim.

On the other, I think if someone have the time to bring back the desi_zcatalog functionality would be good and include a test to make sure is not broken again in the future.

@andreicuceu
Copy link
Contributor

Thanks for pointing me to that issue @HiramHerrera. So far I have indeed been using an older version of desi_zcatalog, but that is a hack, not a solution. So I think having our own script would make the most sense. @alxogm for DLA and BAL catalogs, I wrote separate scripts in lyatools (see this and this). So I would be happy to host a QSO catalog script in lyatools as well. I'll have a look at it over the next few days (maybe just using Naim's code).

@p-slash
Copy link
Contributor

p-slash commented Aug 6, 2024

Apologies I didn't have the time to go through this yet. Regarding the catalog creation. Yes, I have a script:
https://github.com/p-slash/desi-y1-p1d/blob/main/src/desi_y1_p1d/qq_zcatalog.py
This generates the true QSO, DLA, and BAL catalogs. If BALs are present, the output zcat.fits will have BALs appended similarly to the data. It sounds like we have a few options. In my opinion, there's not a lot of reason to fix the desi_zcatalog script other than backward compatibility.

Comment on lines 290 to 300
if 'TSNR2_LRG' in self.data.colnames:
log.info('Getting effective exposure time in data catalog by 12.15*TSNR2_LRG.')
exptime_data = 12.15*self.data['TSNR2_LRG']
elif 'TSNR2_LYA' in self.data.colnames:
log.info('Getting effective exposure time in data catalog by 11.8*TSNR2_LYA.')
exptime_data = 11.8*self.data['TSNR2_LYA']
elif 'TSNR2_QSO' in self.data.colnames:
log.info('Getting effective exposure time in data catalog by 33.61*TSNR2_QSO.')
exptime_data = 33.61*self.data['TSNR2_QSO']
else:
raise ValueError("Can't compute effective exposure time. Data catalog columns should include TSNR2_LRG, TSNR2_LYA or TSNR2_QSO.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the assumption for self.data here? TSNR2_LRG has the highest priority, so should you raise a big warning if that is not found? Or are you imagining that the absence of TSNR2_LRG column is deliberate in order to run some tests using other columns, so a warning is not required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a warning if TSNR2_LRG is not in the data catalog.

We use TSNR2_LRG as this is what the spectroscopic pipeline uses to define the effective exposure time. However I've found some catalogs without this column so I added the option to use other target templates SNR.

Maybe @julienguy can provide more insight here.

Comment on lines +18 to +19
from desimodel.io import load_tiles
from desimodel.footprint import tiles2pix, is_point_in_desi, radec2pix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small readability comment: You can import load_tiles as desimodel_load_tiles so that it is easier to understand in the code below. Not a big deal.

@p-slash p-slash self-requested a review August 9, 2024 16:53
Copy link
Contributor

@p-slash p-slash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left small comments for readability and best practices. I did not review the expected behavior of the code and its accuracy etc, so I left it to pipeline tests. Better to merge after the code has been tested.
I approve it as it is if you want to go ahead and merge.

@sbailey
Copy link
Contributor

sbailey commented Aug 13, 2024

Coming to the conversation late, but a few notes:

  • support for desi_zcatalog --prefix was dropped last year in #2117 . This wasn't directly purposeful, but more that I wasn't aware that it was still needed and the bookkeeping was a pain so I dropped it to keep the code less messy. If it would still be quite useful, I'd welcome a desispec update to readd that.
  • In general, it would be better for new mocks to follow the current data model structure of real data processing. We haven't used "zbest" for many years (dropped sometime before fuji, our first public release). That would make it easier to keep our tools in sync for mocks vs. data.

@HiramHerrera
Copy link
Contributor Author

@sbailey what would be the recommended name for the redrock like files?

We used zbest as it was the baseline back then but we could take advantage of this PR to update it so our mocks follow the same format as observed data as much as posible.

I wonder if changing this name would fix the desi_zcatalog compatibility issue.

What do you think @alxogm, @julienguy?

@alxogm
Copy link
Contributor

alxogm commented Aug 14, 2024

I wonder if changing this name would fix the desi_zcatalog compatibility issue.

Hi @HiramHerrera, it will not fix all the compatibility issues but it will remove the need of a specific flag and, will make it easier to fix the remaining ones.

I think is a good idea to use the current name for the catalogs, I believe the current name is redrock-{survey}-{program}-{healpix}.fits, so we'll need to figure out if it could be enough with something like redrock-{healpix}.fits for the mocks, or how we'll have to adapt the name...

@sbailey
Copy link
Contributor

sbailey commented Dec 3, 2024

Yes, for healpix based fits the individual Redrock fits are redrock-SURVEY-PROGRAM-HEALPIX.fits, e.g. /global/cfs/cdirs/desi/spectro/redux/loa/healpix/main/dark/100/10000/redrock-main-dark-10000.fits. Consider inventing your own PROGRAM name for sims, e.g. redrock-main-qsosim.fits.

I previously advocated for trying to make the sims match the real data model more closely, but that could be a mess. The latest desi_zcatalog might also be picky about directory structure, and a future version will be even more restrictive because it will start to bring in extra info from qso_qn, emline, etc. files that it expects to find in the same directory as the redrock file. It might be best to do something like @p-slash qq_zcatalog.py to assemble your own redshift catalog from sims without getting bogged down in details that don't apply to your case.

@andreicuceu
Copy link
Contributor

Just a quick note that we already have our own version of Naim's script to create the QSO catalog here. So I don't think we need to change anything in desisim on that side. Also, given this branch was already used to create the Y3 mocks, I think we should merge and tag it for future reference as is, and do any other desired changes only after it's been tagged.

@corentinravoux
Copy link

corentinravoux commented Dec 9, 2024

Hi all, if everybody agrees, we can merge this branch and tag it for Y3.
Do you know if the CI unit test fails can be solved, or need to be solved here ?

@HiramHerrera
Copy link
Contributor Author

I am merging this since it seems to be approved and that we don't have any problems :)

@HiramHerrera HiramHerrera merged commit 0f8e903 into desihub:main Dec 16, 2024
2 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants