-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
quickquasars simulated spectra can not be reproduced #386
Comments
I'm taking a look at this, using this notebook alxogm/notebooks/global/homes/a/alxogm/Lyalpha/spectra_sim/desitutorials/quick_quasar.ipynb |
From what Ive seen quasars are reproduced if no features present. No BAL quasar are reproduced when run with BAL probability different from zero, i.e. only BALs look different. Adding the seed to --dla random helps to reproduce the added DLAs. But reset the random number generator, doesn't seem to help much. For instance when adding the DLAs from transmission file, there is no change in in the random number generator and still shows some difference. BALs seems to work fine, but BAL is added in tmp_qso_flux, before apply_lya_transmission. The notebook can be seen here https://github.com/desihub/desisim/blob/fix_dlabugs/etc/quick_quasar.ipynb |
So I've seen that spectra are reproduced if no noise is added. in line desisim/py/desisim/scripts/quickspectra.py Line 197 in 9df332d
As well as if I increase the exposure time, the differences gets much smaller as can be seen in the current version of the notebook https://github.com/desihub/desisim/blob/fix_dlabugs/etc/quick_quasar.ipynb Also checked that the status of the random generator status is the same for all runs up to line desisim/py/desisim/scripts/quickspectra.py Line 181 in 9df332d
However funny things happen in desisim/py/desisim/scripts/quickspectra.py Line 182 in 9df332d
like if I pass a seed number explicitly here, the difference between the dla random/no_dla was smaller, but not for any number I pass :S @julienguy any idea on how to fix it? or how can I load a local version of the specsim from the notebook? (or from command line) I haven't been able to do it as I do with desisim or redrock, therefore I can't continue dig in into it... |
@alxogm - What is the status of this? This is now the main limitation preventing us from doing systematic tests, so I would try to prioritize this task. Thanks! |
@andreufont So I'm not quite satisfied with my solution. But this is what I think: I'm sure the problem is when adding the random_noise_electron, but this is in specsim not desisim. The noise is given according to the detected electrons, so when DLAs is present the flux normalization slightly varies and this makes the adding noise function to be a bit sensitive. The current solution I tested involves modifying specsim/simulator.py . For some reason it likes it more to do an explicit cycle instead of standard array sintaxis, when adding the noise. But I also found that to convert the mean_electron variable to integer makes the agreement better. So I'm not sure all this is the way to go, but I haven't found another solution. The problem is that this slightly changes to specsim/simulator.py might affect other non-Lya codes.... So maybe @moustakas @julienguy and/or @sbailey know better if this issue is affecting others, or they might have another solution. You can see here my tests, I hope it is understandable https://github.com/desihub/desisim/blob/fix_dlabugs/etc/quick_quasar.ipynb It is better than before for DLAs cases, although it got a little bit worst for BAL (weird, but I think is because it is not affecting the overall flux normalization). |
Hi @alxogm , @LuzGarciaP - It's great to see these plots. There are several issues that could affect these tests, and it would be great to isolate them and understand them well. A) The first issue is the noise in the spectra. We would like to have the same noise realization every time we run the same code. Is this the case? You can test this on the same quasar, or you can even test this on a flat spectrum. B) Can we generate noiseless spectra? While we are not sure about point A) above, it would be good to have an option to generate simulated spectra with no noise. I guess we can get this by running with a very low level of noise, by using --exptime 1e6 (a thousand exposures) or something like this. C) In noiseless spectra (until we fix A), can we reproduce a simulated quasar spectra? Do we get the same quasar continuum, if we run the code twice? D) In noiseless spectra, and assuming we pass C above, do we get the same spectra (modulo a change in normalization) when we add DLAs to the spectra? Both random and from file? E) In noiseless spectra, and assuming we pass C above, do we get the same spectra (modulo a change in normalization) when we add BALs to the spectra? |
@LuzGarciaP - could you share notebooks (or plain python code) with your tests? It might help us understand what could be going on. |
@andreufont We cannot reproduce the noise. Most probably we will go for noiseless spectra using a large --exptime. |
@forero Yes, this is a good short term solution. But we should really be able to reproduce the noise! There is an option to specify the seed in desisim.simexp.simulate_spectra , I would have thought that this would be enough! After a quick search around, I noticed that this issue was already raised in specsim: desihub/specsim#83 and it looks like @dkirkby and @julienguy are already aware of this. @forero , @LuzGarciaP , could you take a look at it and see if you can fix that reproducibility? |
Hi @LuzGarciaP @forero, I've run Luz script as follows quickbals -i $ifile -o outfile --bal BAL --balprob 0.12 --overwrite --nmax 20 --seed 123 and the noise is more or least reproduced. It is at the same level as I was mentioning in a comment above using the bal option in quickquasars. I'm not sure of the differences of quickbals vs quickquasar but that you will tell us about. Can you please try to pass the --seed argument and check it is ok? maybe that way is easier to spot the BALs features I attach two plots here, with and without noise. |
@andreufont I think I've searched precisely along all the points you mention, mostly for the DLAs case, in the notebook. 1)If two runs are made with a seed they matches. In summary, I have a branch, fix_dlasbugs that has a significant improvement of reproducibility for DLAs, but need a modification to specsim that might affect other things. So I also saw desihub/specsim#83 but wasn't sure it was as related. |
Do they match exactly? If so, doesn't that mean the problem is fixed?
You can test if that issue is related by changing 'fastsim' to 'table' in this line: Line 476 in 9df332d
I can give more priority to fixing this issue if that change solves your problem. |
desihub/specsim#83 is now fixed in desihub/specsim#91 which should be merged in the next few days |
I just tested that in the master branch (the one you get in Cori if you load the DESI modules), and in the simplest quickquasar run you do get exactly the same noise, down to machine precision. @forero - when you say that you can not reproduce the noise, do you mean when you compare two spectra one with BAL and one without BAL? Or do you mean you don't reproduce the noise when running exactly the same command? Are you passing a seed? |
@andreufont I should be more precise. I meant that comparing two spectra (one with BAL and one without BAL) the noise is not the same. Running the same command produces the same noise. But, as @alxogm shows, our problem was not including the seed. The BAL's features should be indeed easier to spot in the kind of run you made. |
I just tested that in the master (the one you get in Cori if you load the DESI modules), if you generate twice the same quasar with --dla random, you get a different DLA even when using the same seed. See for instance: ifile='/project/projectdirs/desi/mocks/lya_forest/london/v2.0/0/0/transmission-16-0.fits' out_v0='spectra_v0.fits' out_v1='spectra_v1.fits' @alxogm - this looks like an isolated issue that should be easy to fix and commit, without worrying about changes in specsim that are not clear to me that are needed. |
Great, so we can consider issue A) solved. We can reproduce the noise. Now we need to be able to reproduce random DLAs, and random BALs. |
I also tested that if we generate twice a quasars with BAL and noise, we get the same thing, what is good. However, if we generate the same quasar twice, one with BAL and one without BAL, we get different noise vector. So the problem with BALs is that depending on whether we add BALs or not, we change the random numbers in the noise simulation. The solution should be the one I suggested in the first post of this issue: I belive we can easily accomplish this by resetting the random number generator, using the seed that has been passed, every time we need to generate numbers at a different part of the code. We don't care if the random numbers in continuum and in noise are correlated, as lonf as they are different in different quasars. Right? |
When adding DLAs from files, the noise added is also different, what to me makes no sense. I think this is that Alma had already spot. May be we should open individual issues for each of this ill cases, even if they might be related. |
I am on it: trying to keep the same random numbers in the instrumental noise simulation with or without BALs ( it is not the case now, and this is quite natural, because we are using random numbers in the sequence to choose BALs properties before adding instrumental noise). |
This is due to a quite amazing feature of numpy.random.poisson(mean) which "consumes" a different set of random numbers depending on the mean value. This as for consequence to totally perturb the random realization of the noise in a spectrum if a DLA is present or not.
returns on my computer
(The last two values have changed in the last row when they shouldn't).
returns (on my computer)
|
(thanks Stephen for noting this) |
And it behaves a bit different if in is inside a cycle, and making the mean an integer helps, but, is not perfect. @julienguy I hope you find how to fix it...
Enviado desde mi iPhone
El 20/07/2018, a la(s) 17:31, julienguy <[email protected]> escribió:
… This is due to a quite amazing feature of numpy.random.poisson(mean) which "consumes" a different set of random numbers depending on the mean value. This as for consequence to totally perturb the random realization of the noise in a spectrum if a DLA is present or not.
Example:
np.random.seed(1234)
print(np.random.poisson([10,10,10]))
np.random.seed(1234)
print(np.random.poisson([10,10,10]))
np.random.seed(1234)
print(np.random.poisson([1,10,10]))
returns on my computer
[ 7 9 13]
[ 7 9 13]
[ 0 11 8]
(The last two values have changed in the last row when they shouldn't).
This is not the case for random.normal.
mu=np.array([10,10,10])
np.random.seed(1234)
print(np.random.normal(loc=mu,scale=np.sqrt(mu)))
np.random.seed(1234)
print(np.random.normal(loc=mu,scale=np.sqrt(mu)))
mu=np.array([1,10,10])
np.random.seed(1234)
print(np.random.normal(loc=mu,scale=np.sqrt(mu)))
returns (on my computer)
[ 11.49080889 6.23380417 14.53061724]
[ 11.49080889 6.23380417 14.53061724]
[ 1.47143516 6.23380417 14.53061724]
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
This is fixed in branch no-poisson of desisim and specsim, where I have added an option use_poisson to quickspectra and specsim.simulator.Simulator.generate_random_noise that I set to False in quickquasars. I will let @dkirkby review the associated PRs. |
Thanks @julienguy for modifying specsim to fix this. |
@alxogm - I have checked that with the current master branch, we can reproduce all sorts of mocks. I think we can close this issue. If there are (unrelated) problems with the metal implementation, we can open a new issue. |
I can not reproduce the simulated spectra, even when passing a random seed. For instance, when I run the simple code below.
``
#!/bin/bash
downsampling=0.40
outdir=/project/projectdirs/desi/mocks/lya_forest/london/v2.0/test_quick/v0
infile=/project/projectdirs/desi/mocks/lya_forest/london/v2.0/0/0/transmission-16-0.fits
quickquasars --exptime 4000. -i $infile --zmin 1.8 --outdir $outdir/spectra-16 --downsampling
$downsampling --zbest --mags --desi-footprint --seed 123
``
Moreover, we need to make sure that we are able to run two simulations that are identical in any way, except adding a new artefact. This means that we should be able to add DLAs or not, without this changing the random numers for the quasar continua or for the instrumental noise.
I belive we can easily accomplish this by resetting the random number generator, using the seed that has been passed, every time we need to generate numbers at a different part of the code. We don't care if the random numbers in continuum and in noise are correlated, as lonf as they are different in different quasars. Right?
The text was updated successfully, but these errors were encountered: