Optimising configuration for high-z cluster runs #61

Open
jacobic opened this issue Nov 2, 2020 · 5 comments

Comments

@jacobic

jacobic commented Nov 2, 2020

Hi Eli,

I was just wondering if it would be possible to provide some advice about adjusting the config for optimising completeness at high-z > 0.8. I am currently using DR8 of the Legacy Imaging Surveys (grz) with WISE (w1) and running redmapper in scanning mode (maximising lnlike as a function of redshift) for X-ray selected clusters.

I have had excellent results when performing optical-only calibrations with grz up to redshifts of 0.85 (with multiple iterations over the full footprint), but now I am trying to push to higher redshifts by adding the w1 band.

The configuration described below extends up to a redshift of 1.2 and, although sub-optimal, I have seen some promising cluster candidates at z > 0.9. At the moment the redshift distribution of the resulting cluster sample is not smooth for z > 0.9, and some redshift bins at high-z look very incomplete compared to others.

I suspect that the problem is most likely related to my colour mode settings (especially *_maxnodes). Please let me know if you spot any settings which could be tweaked to improve performance.

Basic set up

refmag: z
zrange: [0.05, 1.20]
bands: ['g', 'r', 'z', 'w1']
mstar_survey: des
mstar_band: z03

Training set

  • I am currently calibrating on ~20% of the healpixels (nside=32) in the Legacy Survey region which have the highest fraction of spectroscopically confirmed cluster members at high-z.
  • However, it appears there are still not enough spectroscopic galaxies at z > 1.0.
  • Spectroscopic galaxies that make up the training set are confirmed members from the literature but not necessarily the BCG.
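
For reference, the pixel selection in the first bullet could be sketched as below. This is only one reading of "highest fraction of spectroscopically confirmed cluster members at high-z": it ranks pixels by the fraction of their spectroscopic members above a redshift threshold. The function name, the `z_min` threshold and the input arrays are hypothetical, and the HEALPix index per galaxy is assumed to have been computed already.

```python
import numpy as np

def select_training_pixels(pix, z_spec, z_min=0.8, top_fraction=0.2):
    """Return the pixel ids with the highest fraction of high-z spec members.

    pix    : HEALPix index of each spectroscopic member (assumed precomputed)
    z_spec : spectroscopic redshift of each member
    z_min  : hypothetical threshold defining "high-z"
    """
    pix = np.asarray(pix)
    z_spec = np.asarray(z_spec)
    unique_pix, inverse = np.unique(pix, return_inverse=True)
    n_total = np.bincount(inverse, minlength=unique_pix.size)
    n_highz = np.bincount(inverse, weights=(z_spec > z_min).astype(float),
                          minlength=unique_pix.size)
    frac = n_highz / n_total
    # Keep the top fraction of pixels, always at least one.
    n_keep = max(1, int(np.ceil(top_fraction * unique_pix.size)))
    order = np.argsort(-frac, kind="stable")
    return unique_pix[order[:n_keep]]

# Tiny made-up example: four pixels, ten galaxies.
example_pix = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
example_z = [0.9, 0.95, 0.3, 0.2, 0.4, 0.85, 0.1, 0.2, 0.5, 0.6]
selected = select_training_pixels(example_pix, example_z, top_fraction=0.5)
```

Ranking by fraction rather than by raw count favours pixels dominated by high-z members, which may or may not be what is wanted if some pixels contain only a handful of spectra.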

Healsparse maps

  • The depth map (created from the Rykoff+15 model) is nside=2048 and uses refmag. I have just started a calibration with nside=4096 to see if it improves the bkg accuracy at high-z.
  • The depth map is always created using observed fluxes/magnitudes which have not been corrected for extinction (as in Rykoff+15), however, the magnitudes in the galaxy catalogues are corrected for extinction.
  • The mask contains fracgood and is nside=4096 and considers bad fields + bright stars etc in z and w1 bands.
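
One way to put the depth map and the galaxy catalogue on the same magnitude system is to subtract the band extinction from the observed limiting magnitude in each pixel. The sketch below assumes a per-pixel E(B-V) array is available. The z-band coefficient is an assumed value that should be checked against the survey documentation, and the function name is hypothetical.

```python
import numpy as np

# Assumed extinction coefficient A_z / E(B-V) for the z band; verify the value
# appropriate for your data release before relying on it.
R_Z_ASSUMED = 1.211

def deredden_depth(limmag_observed, ebv, r_band=R_Z_ASSUMED):
    """Shift an observed limiting-magnitude map onto the dereddened system.

    A galaxy observed at magnitude m has dereddened magnitude m - A, so the
    limiting magnitude expressed in dereddened magnitudes is limmag - A.
    """
    limmag_observed = np.asarray(limmag_observed, dtype=float)
    ebv = np.asarray(ebv, dtype=float)
    return limmag_observed - r_band * ebv

# Made-up pixel values for illustration.
corrected = deredden_depth([23.5, 23.0], [0.0, 0.1])
```

In low-extinction regions the shift is a few hundredths of a magnitude, so this is unlikely to be the dominant effect at high z.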

Colour modes

  • Use all colours.
    calib_colormem_colormodes: [0, 1, 2]

  • Cut spectroscopic training at zbounds just before the colour models start to become flat.
    calib_colormem_zbounds: [0.35, 0.75]

  • Set maxnodes to the redshifts where the colour scatter starts to blow up due to being shallow/blue.
    calib_color_maxnodes: [0.6, 0.85, -1]
    calib_covmat_maxnodes: [0.6, 0.85, -1]

I think I need to tweak the settings above as the sigma for r-z and z-w1 colours reduces to zero before the redshift limit of the calibration in the plots below. In particular the z-w1 sigma is not smooth. This results in a strange redshift distribution in the cluster sample for z > 0.9. I suspect this is partially due to the above settings as well as the lack of spectroscopic galaxies and the accuracy of the initial red sequence models at very high redshifts.

Given these diagnostic plots, what values of calib_color_maxnodes and calib_covmat_maxnodes do you recommend?

calib_colormem_sigint

calib_colormem_sigint: [0.05, 0.03, 0.06]. This does not actually appear to be used in the code, is that correct?

Zreds

The iter1 zreds look OK for z < 0.9, but they are far from perfect. The distribution is asymmetric about ztrue=zred, and the apparent gap at 1.0 < z < 1.1 is slightly worrying.

wcen_cal_zrange

  • I set the upper limit to be the maximum of z_range where scaleval=1.
  • wcen_cal_zrange: [0.05, 0.60].
  • Do I need to increase the upper limit to > 0.6 to have accurate centring at high-z?

limmag_catalog

  • limmag_catalog: 24.0.
  • 0.2L* = m*(z=1.2) + 1.75 = 23.75. Do you think I should go deeper?
  • I override limmag_hard in the master catalogue table with limmag_catalog in the config file. I have confirmed this works at each step of the calibration.
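
The arithmetic in the second bullet follows from the standard magnitude-luminosity relation: a luminosity of f L* is fainter than m* by -2.5 log10(f) magnitudes. A minimal sketch, where the function names are hypothetical and m*(z=1.2) = 22.0 is simply the value implied by the numbers quoted above (23.75 - 1.75):

```python
import math

def mag_offset(lum_fraction):
    """Magnitude offset from m* for a luminosity of lum_fraction * L*."""
    return -2.5 * math.log10(lum_fraction)

def limiting_mag(mstar, lum_fraction=0.2):
    """Apparent magnitude of a galaxy with lum_fraction * L* at a given m*."""
    return mstar + mag_offset(lum_fraction)

# m* at z = 1.2 as implied by the bullet above (an inferred value).
mstar_z12 = 22.0
needed_depth = limiting_mag(mstar_z12, 0.2)
margin = 24.0 - needed_depth  # headroom left by limmag_catalog: 24.0
```

With these inputs the margin between limmag_catalog and the 0.2L* magnitude at z = 1.2 is about a quarter of a magnitude.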

Initial red sequence models

  • My initial ezgal red-sequence models look very similar to the default DES redmapper files for g-r and r-z; however, they start to deviate at very high-z. This comparison is shown in the figure below.

  • I created a z-w1 model. Do you think the level of accuracy is OK at high-z? I assume that it should not matter given the *_maxnodes settings above. The w1-w2 model that I also created (not shown here) is even closer to yours (created in WISE (W1-W2) initial red sequence models #58) at all redshifts.

Background setting

  • calib_make_full_bkg: False
  • For speed this has been turned off (for now).

Minimum richness for computing z_lambda correction.

  • calib_zlambda_minlambda: 7.0
  • The default is 20.0, but redmapper complains that some bins do not have enough spectra.
  • This is likely to be caused by an underlying problem.

As soon as I have the right settings I will increase the size of the training footprint and increase the number of calibration iterations.

I apologise for such a long report but thought it would be better to be verbose in order to speed up troubleshooting.
If you have any advice for me whatsoever I would be extremely grateful.

Thanks again for all your hard work. I really appreciate it!

Cheers,
Jacob

@erykoff
Owner

erykoff commented Nov 5, 2020

Thank you for your detailed report. This is interesting, and very helpful. Hopefully some of these suggestions will prove to be useful. These answers follow your questions, and are not in order of importance.

  1. For the depth map, the raw depth map is using "reddened" magnitudes, but the depth map should be corrected for reddening to match the galaxy catalog. I doubt that higher resolution will make much difference. And fixing the reddening of the depth maps won't make a huge difference at high z because we're looking in the NIR where there is less reddening.
  2. Colormodes looks good. zbounds looks fine. Setting calib_color_maxnodes might not be necessary; this says that the calibration should fix the mean g-r color for all higher redshifts at the value at 0.6. This obviously isn't correct, you still have signal on the mean g-r color at z>0.6. However, I am confused about your diagnostic plots, since they extend to z=1.2? Or was this run without those settings? On the other hand, setting calib_covmat_maxnodes to something slightly below the redshift where it collapses would make sense. This will fix the scatter to these values at higher redshift. But they might not be too important, because the photometric errors are getting large enough for the intrinsic scatter not to matter much.
    At the same time, given the wiggliness of the sigma plots, I would recommend setting calib_covmat_nodesize to something larger than the default 0.15. This will hurt a little at the filter transitions, but will smooth things out. You could also consider increasing calib_slope_nodesizes because the slope is looking a bit jumpy, especially in z-W1. The default calib_color_nodesizes looks fine, but you could also try increasing this, but I don't think it's a problem. The calib_colormem_sigint I thought was being used as the first guess for the intrinsic width, but apparently I stopped using that. Huh!
  3. The features/outliers for zred are normal for the first iteration, this is a selection based on a guess of the color of the clusters. This will smooth itself out with further iterations. However, the lack of any galaxies at higher z means that it really did fail to grab any high z galaxies, which is bad. The red training galaxy plots look fine as a function of redshift, it may just be that the node sizes need to be adjusted to keep the fit from getting wonky at higher z. Another parameter to look at is calib_corr_pcut which is the membership probability cut of galaxies going into these diagnostics and the correction plots. If, due to photometric noise or some other reason, the probabilities are peaked at something below 0.9 then they're going to be missing from this plot and the rest of the calibration can go wonky.
  4. The initial red sequence model is probably fine, especially z-W1 which is tracking the spec galaxies fine.
  5. I'm not surprised you have to change calib_zlambda_minlambda. You can probably go down to 5 okay. And further iterations will really cut the outliers and the wiggles.

So what I would recommend is to play with the settings above until you're satisfied with (a) the zred plots (that they have what looks like a reasonable number of galaxies selected), and (b) the z_lambda plots (that they have a reasonable number of clusters to high z). Probably increasing the node size will help. And you shouldn't worry about outliers at this point, it's a problem if they persist into the second iteration which makes a big difference in the selection/modeling.
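
To make the effect of the node settings concrete, here is a rough sketch of how a node size and a maxnode value might translate into node positions, and how a quantity is held fixed beyond the last node. This is an illustration only: `make_nodes_sketch` and `evaluate` are hypothetical helpers, a piecewise-linear interpolation stands in for the spline, and redmapper's own node construction may differ in detail.

```python
import numpy as np

def make_nodes_sketch(zrange, nodesize, maxnode=-1.0):
    """Evenly spaced nodes across zrange, optionally stopping at maxnode.

    A negative maxnode means "no limit", mirroring the -1 convention used in
    the config above.
    """
    zlo, zhi = zrange
    upper = zhi if maxnode < 0 else min(zhi, maxnode)
    n_intervals = max(1, int(np.ceil((upper - zlo) / nodesize - 1e-9)))
    return np.linspace(zlo, upper, n_intervals + 1)

def evaluate(nodes, values, z):
    """Piecewise-linear stand-in for the spline; constant beyond the end nodes."""
    return np.interp(z, nodes, values)

nodes_default = make_nodes_sketch((0.05, 1.20), 0.15)            # default size
nodes_coarse = make_nodes_sketch((0.05, 1.20), 0.25)             # smoother fit
nodes_capped = make_nodes_sketch((0.05, 1.20), 0.15, maxnode=0.6)

# Made-up node values: beyond z = 0.6 the capped model returns the last value.
capped_values = np.linspace(0.5, 1.5, nodes_capped.size)
held = evaluate(nodes_capped, capped_values, 1.0)
```

Fewer nodes means fewer free parameters across the same redshift range, which is why a larger node size smooths the fitted curves at the cost of resolution near the filter transitions.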

@jacobic
Author

jacobic commented Nov 12, 2020

Hi Eli,

Thanks so much for your detailed response. I learned a lot while implementing your recommendations and think I am definitely a few steps closer to achieving an accurate high-z calibration thanks to your help :)

I tried almost everything that you suggested (detailed at the end of this message), but I think the cause of the problem is the following:

Proposed solution

These are the default cuts

calib_pcut = ConfigField(default=0.3)
calib_color_pcut = ConfigField(default=0.7)

calib_use_pcol = ConfigField(default=True)

By default the pcut is applied to pcol to define use, and the pcol cut is used to define coluse:

gals = GalaxyCatalog.from_galfile(self._galfile)
if self.config.calib_use_pcol:
    use, = np.where((gals.z > self.config.zrange[0]) &
                    (gals.z < self.config.zrange[1]) &
                    (gals.pcol > self.config.calib_pcut))
else:
    use, = np.where((gals.z > self.config.zrange[0]) &
                    (gals.z < self.config.zrange[1]) &
                    (gals.p > self.config.calib_pcut))
if use.size == 0:
    raise RuntimeError("No good galaxies in %s!" % (self._galfile))
gals = gals[use]

This means that the galaxies used to correct zreds are only required to have pcol > 0.3, as gals depends on use.

# Compute correction (mode2)
self._calc_corrections(gals, mode2=True)

because in the function above there is only minor outlier clipping and no further usage of any of the pcol / p cuts

# This is an arbitrary 2sigma cut...
guse, = np.where((gals.lkhd > thresh) &
                 (np.abs(gals.z - gals.zred) < 2. * gals.zred_e))

corrfitter = CorrectionFitter(self.pars.corr_z,
                              z[guse],
                              gals.z[guse] - gals.zred[guse],
                              gals.zred_e[guse],
                              slope_nodes=self.pars.corr_slope_z,
                              probs=np.clip(probs[guse], None, 0.99),
                              dmags=gals.refmag[guse] - pivotmags[guse],
                              ws=w)

The calib_corr_pcut value is never actually used in the code (apart from in one redmagic script)

calib_corr_pcut = ConfigField(default=0.9)

but in the plotting routine there is a 0.9 hardcoded

Explanation

Using a pcut of 0.3 rather than 0.7 or 0.9 will make a big difference at high-z because it is where the training set is most contaminated.

Since I opened this issue I have switched to a spectroscopic training set which samples the top 2000 brightest cluster galaxies from the literature in each redshift bin of width 0.05 across the whole redshift range.

This means there is more than enough signal and redmapper does not complain at all.
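
The per-bin sampling described above could look roughly like the following. It assumes "top 2000 brightest" means ranking by apparent magnitude within each bin (smaller magnitude is brighter); the function and argument names are hypothetical.

```python
import numpy as np

def brightest_per_bin(z, mag, z_min, z_max, dz=0.05, n_max=2000):
    """Indices of up to n_max brightest galaxies in each redshift bin."""
    z = np.asarray(z, dtype=float)
    mag = np.asarray(mag, dtype=float)
    n_bins = int(round((z_max - z_min) / dz))
    bin_index = np.floor((z - z_min) / dz).astype(int)
    keep = []
    for b in range(n_bins):
        in_bin = np.flatnonzero(bin_index == b)
        # Smaller magnitude means brighter, so sort ascending.
        order = np.argsort(mag[in_bin], kind="stable")
        keep.append(in_bin[order[:n_max]])
    if not keep:
        return np.array([], dtype=int)
    return np.sort(np.concatenate(keep))

# Made-up example: three galaxies in the first bin, one in the second.
idx = brightest_per_bin([0.06, 0.07, 0.08, 0.12], [20.0, 18.0, 19.0, 21.0],
                        z_min=0.05, z_max=0.15, dz=0.05, n_max=2)
```

Bins with fewer than n_max galaxies simply contribute everything they have, which is the regime where the histogram drops below 2000 per bin.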

As you can see from the histogram of the training set, at around z~0.8 completeness starts to drop off and we have < 2000 cluster galaxies per dz=0.05. This is where contamination starts to become a problem. This is what I believe causes the bias at high-z since the contaminating galaxies have low pcol yet are still included in the zred correction because of the potential bug described above.

The figure below shows what the iter_0 color mem file looks like for lambda > 10 for z-w1 at high redshift.

  • red points show the cuts which are currently used in the code
  • blue points show what I think is supposed to be used by default if the pcol cut were working
  • green points show what I think the distribution should look like if calib_corr_pcut were being used in the code.

You can see that although there are many galaxies with high pcol, there are also a lot with pcol < 0.7 (and it actually turns over at 0.65).

To avoid these low probability galaxies being used in the zred correction and redsequence fitting I propose to force the pcol cut / p cut / p calib cut to something higher than 0.3 in the code. Hopefully this will solve the bias in zred and zlambda at high-z.
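
Before changing the code, it may help to count how many galaxies survive each candidate threshold. The sketch below uses a synthetic probability column purely as a stand-in; in practice the array would be the pcol (or p) column of the colour membership file. The function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic stand-in for a membership-probability column, not real data.
pcol = rng.beta(2.0, 1.0, size=10_000)

def count_selected(prob, cuts=(0.3, 0.7, 0.9)):
    """Number of galaxies passing each probability threshold."""
    prob = np.asarray(prob)
    return {cut: int(np.count_nonzero(prob > cut)) for cut in cuts}

counts = count_selected(pcol)
for cut, n in counts.items():
    print(f"pcol > {cut}: {n} galaxies ({n / pcol.size:.1%})")
```

Running the same count in bins of redshift would show whether a stricter cut removes mostly high-z galaxies, which is the trade-off between contamination and signal discussed above.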

Please let me know what you think about my proposed solution and thanks again for such useful feedback!

Cheers,
Jacob

P.S. Other things that I updated while trying to solve the problem:

  • calibrating on the full footprint with all training galaxies (and calculating the background for 10% of the footprint)
  • modifying the zbounds to push r-z closer to the transition (0.8 rather than 0.75)
  • trying many different combinations of the various maxnode size settings for colour, slope and most importantly covmat.
  • leaving the default maxnodes to -1 for each colour
  • trying with multiple iterations (it does improve after 2 but zred and zlambda were still biased at z > 0.8).
  • using limmag_catalog: 24.5 instead of 24.0
  • manually modifying the z-w1 model to increase the number of training clusters (I think this was a bad idea because it means it is more likely to be affected by contamination)
  • I still need to fix the depth map. Thanks for clarifying about the extinction.

@jacobic
Author

jacobic commented Nov 12, 2020

Hi Eli,

Here is a quick update with further debugging.

At first glance, modifying the cuts does not seem to improve things on the first iteration, and more conservative cuts do not result in better zred at high-z. Perhaps it will make a bigger difference during the second iteration (or later on in the first calibration)... or perhaps I am fundamentally misunderstanding something in my previous comment.

Experimenting with cuts

calib_color_pcut: 0.7, calib_pcut: 0.3 (default settings)

calib_color_pcut: 0.7, calib_pcut: 0.7

calib_color_pcut: 0.9, calib_pcut: 0.9

New solution?

I still suspect that the bias at high-z is related to the zred / zlambda corrections. The plot below uses your newly pushed zscan code and my "best" calibration so far (1 iter, maxnodes=-1, covmat_nodesize=0.2, improved training clusters, etc.) to check the calibration performance. As you can see, there is a build-up of high-z clusters in a zscan for the SPT clusters from the 2500 deg^2 Bocquet+19 sample below. This is relatively unbiased as a function of z_lambda but very biased as a function of zlit (the literature redshift in this case).

As the build-up of clusters is at around z=0.85, I thought it could be because the zbounds value between r-z and z-w1 is too high (currently 0.8). This would produce a huge peak in the distribution of clusters at this transition point, because the colour models for r-z and z-w1 are both relatively flat at z=0.8.
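
The reasoning about flat colour models can be illustrated with first-order error propagation: for a fixed colour error, the redshift uncertainty scales as 1 / |d(colour)/dz|, so it grows quickly where the model flattens. The sketch below uses a made-up tanh-shaped colour-redshift relation and a made-up colour error; it is not a fit to any real red sequence and is not redmapper's likelihood.

```python
import numpy as np

def redshift_uncertainty(z_grid, colour_model, colour_err):
    """First-order propagation: sigma_z ~ sigma_colour / |d(colour)/dz|."""
    slope = np.gradient(colour_model, z_grid)
    with np.errstate(divide="ignore"):
        return colour_err / np.abs(slope)

# Toy colour-redshift relation that flattens at high z (illustrative only).
z_grid = np.linspace(0.05, 1.20, 116)
toy_colour = 1.5 * np.tanh((z_grid - 0.4) / 0.2)
sigma_z = redshift_uncertainty(z_grid, toy_colour, colour_err=0.05)

i_steep = int(np.argmin(np.abs(z_grid - 0.4)))
i_flat = int(np.argmin(np.abs(z_grid - 0.8)))
print(f"z={z_grid[i_steep]:.2f}  sigma_z={sigma_z[i_steep]:.3f}")
print(f"z={z_grid[i_flat]:.2f}  sigma_z={sigma_z[i_flat]:.3f}")
```

Where both colours are flat, a range of redshifts fits almost equally well, so small systematic offsets in the model can funnel many clusters to the same best-fit redshift.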

It also looks strongly correlated to scaleval which made me think that the extinction bug in the depth map (which is defined in the z-band) could be causing this problem but since scaleval is related to redshift it is difficult to say.

Please let me know if you have any ideas and thanks for pushing the zscan code, it works like a charm!

Cheers,
Jacob

@jacobic
Author

jacobic commented Nov 13, 2020

Hi Eli,

One last update from me before the weekend (sorry for so many comments).

  • I have also tried using a minimal training set (as I was worried about overfitting).
  • I tried calib_redspec_nsig values of 2, 1, 0.5 and 0.25 because I was worried about contamination from blue galaxies, but it did not help either.
  • I varied the redshift of the transition to z-W1 between 0.6, 0.65, 0.7 and 0.75.

None of these changes improved things much...

It could simply be that the colours and uncertainties from the 4-year WISE forced photometry are not sufficiently accurate or deep when processed in the Legacy Imaging Surveys, so when the transition to z-W1 is made, things start to go pear-shaped.

To test this theory out I am matching CATWISE 2020 galaxies to the grz galaxies in Legacy DR8 which will hopefully result in more reliable colour information at high-z due to the increase in depth and the fact the photometry is not forced. This should make it easier to interpret the zred plots.
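
A positional cross-match of this kind can be sketched with plain NumPy for small inputs. The 1 arcsec radius and all names here are assumptions for illustration; for full catalogues a tree-based matcher (for example astropy's `SkyCoord.match_to_catalog_sky`) would be far more practical than this brute-force version.

```python
import numpy as np

def unit_vectors(ra_deg, dec_deg):
    """Convert sky coordinates in degrees to 3D unit vectors."""
    ra = np.radians(ra_deg)
    dec = np.radians(dec_deg)
    return np.column_stack([np.cos(dec) * np.cos(ra),
                            np.cos(dec) * np.sin(ra),
                            np.sin(dec)])

def nearest_match(ra1, dec1, ra2, dec2, radius_arcsec=1.0):
    """For each source in catalogue 1, find the nearest source in catalogue 2.

    Brute force: builds an N1 x N2 distance matrix, so small inputs only.
    """
    v1 = unit_vectors(ra1, dec1)
    v2 = unit_vectors(ra2, dec2)
    chord = np.linalg.norm(v1[:, None, :] - v2[None, :, :], axis=2)
    idx = np.argmin(chord, axis=1)
    nearest = chord[np.arange(idx.size), idx]
    # Convert chord length between unit vectors to an angle in arcseconds.
    sep_arcsec = np.degrees(2.0 * np.arcsin(np.clip(nearest / 2.0, 0.0, 1.0))) * 3600.0
    return idx, sep_arcsec, sep_arcsec < radius_arcsec

# Made-up coordinates: two true matches and one source with no counterpart.
idx, sep, matched = nearest_match(
    [10.0, 20.0, 30.0], [0.0, 0.0, 0.0],
    [20.0 + 0.5 / 3600.0, 10.0, 50.0], [0.0, 0.0, 0.0])
```

The match radius matters here because the WISE point-spread function is much broader than the optical one, so blends are a likely source of bad colours.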

Have a great weekend!

Cheers,
Jacob

@erykoff
Owner

erykoff commented Nov 13, 2020

So I don't have any quick answers, and I'll have to look at the use of the different p cuts to make sure that things are doing what they're supposed to be doing and documented at least adequately. But I wanted to point out that the zscan mode is not magic; it's only as good as the red sequence model that's put in. So if the red sequence model is not converged properly, then zscan will end up with a pileup as you see.

One thing to look at is not the extinction variation (which isn't going to be that large) but the overall depth. Are you using z or W1 as the reference band? And how deep is the catalog in these bands, and how deep is it in terms of L* at z=0.8, 1.0, 1.2? Because if you're just reaching the tip of the luminosity function, things won't work as well. One thing that you can try, though, to normalize things is to change the reference luminosity cut from the default 0.2L* to something brighter (0.4 or 0.5L*) and see if that reduces problems where I think you might be hitting the filter transition and the z-band depth limit at the same time.
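
The question of how deep the catalogue is in terms of L* can be answered per pixel from the depth map: the faintest luminosity reached is 10^(-0.4 (limmag - m*)) in units of L*. A small sketch, where the function name is hypothetical, the local depths are made up, and m* = 22.0 at z = 1.2 is the value implied by the 0.2L* arithmetic earlier in the thread:

```python
import numpy as np

def depth_in_lstar(limmag, mstar):
    """Faintest luminosity reached, in units of L*, for a limiting magnitude."""
    return 10.0 ** (-0.4 * (np.asarray(limmag, dtype=float) - mstar))

mstar_z12 = 22.0                             # implied value at z = 1.2
local_depths = np.array([24.0, 23.5, 23.0])  # made-up per-pixel depths
fractions = depth_in_lstar(local_depths, mstar_z12)

# Pixels where the default 0.2 L* cut is fainter than the local depth.
too_shallow = fractions > 0.2
```

Pixels flagged as too shallow for 0.2L* may still be complete at 0.4 or 0.5L*, which is the motivation for trying a brighter reference luminosity cut.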

Another thing to look at is the actual errors in the photometric catalog. If these are largely over- or underestimated, that can lead to problems.
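
One simple check on the reported errors is the width of the "pull" distribution, (measured - expected) / reported error, which should be close to 1 if the errors are realistic. The sketch below uses synthetic numbers in which the true scatter is deliberately twice the reported error; the function name is hypothetical, and with real red-sequence colours the intrinsic scatter also broadens the pulls and would need to be accounted for.

```python
import numpy as np

def pull_width(measured, expected, reported_err):
    """Robust width (MAD-based sigma) of the normalised residuals."""
    pull = (np.asarray(measured) - np.asarray(expected)) / np.asarray(reported_err)
    mad = np.median(np.abs(pull - np.median(pull)))
    return 1.4826 * mad

# Synthetic test case: true scatter 0.10 mag, reported errors 0.05 mag.
rng = np.random.default_rng(0)
reported = np.full(20_000, 0.05)
measured = rng.normal(0.0, 0.10, size=reported.size)
width = pull_width(measured, np.zeros_like(measured), reported)
```

A width well above 1 suggests underestimated errors and a width well below 1 suggests overestimated ones. Repeating the check in bins of magnitude can show where the problem sets in.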


2 participants