Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

quickquasars simulated spectra can not be reproduced #386

Closed
andreufont opened this issue Jul 4, 2018 · 28 comments
Closed

quickquasars simulated spectra can not be reproduced #386

andreufont opened this issue Jul 4, 2018 · 28 comments
Assignees

Comments

@andreufont
Copy link
Contributor

I can not reproduce the simulated spectra, even when passing a random seed. For instance, when I run the simple code below.

``
#!/bin/bash

downsampling=0.40
outdir=/project/projectdirs/desi/mocks/lya_forest/london/v2.0/test_quick/v0
infile=/project/projectdirs/desi/mocks/lya_forest/london/v2.0/0/0/transmission-16-0.fits

quickquasars --exptime 4000. -i $infile --zmin 1.8 --outdir $outdir/spectra-16 --downsampling
$downsampling --zbest --mags --desi-footprint --seed 123
``

Moreover, we need to make sure that we are able to run two simulations that are identical in any way, except adding a new artefact. This means that we should be able to add DLAs or not, without this changing the random numers for the quasar continua or for the instrumental noise.

I belive we can easily accomplish this by resetting the random number generator, using the seed that has been passed, every time we need to generate numbers at a different part of the code. We don't care if the random numbers in continuum and in noise are correlated, as lonf as they are different in different quasars. Right?

@andreufont
Copy link
Contributor Author

@alxogm
Copy link
Contributor

alxogm commented Jul 4, 2018

I'm taking a look at this, using this notebook alxogm/notebooks/global/homes/a/alxogm/Lyalpha/spectra_sim/desitutorials/quick_quasar.ipynb
were could I put this so that is more convenient than my home directory ?

@alxogm
Copy link
Contributor

alxogm commented Jul 5, 2018

From what Ive seen quasars are reproduced if no features present. No BAL quasar are reproduced when run with BAL probability different from zero, i.e. only BALs look different.
Differences still happening with DLAs, dig in into it...

Adding the seed to --dla random helps to reproduce the added DLAs. But reset the random number generator, doesn't seem to help much. For instance when adding the DLAs from transmission file, there is no change in in the random number generator and still shows some difference. BALs seems to work fine, but BAL is added in tmp_qso_flux, before apply_lya_transmission.

The notebook can be seen here

https://github.com/desihub/desisim/blob/fix_dlabugs/etc/quick_quasar.ipynb

@alxogm
Copy link
Contributor

alxogm commented Jul 6, 2018

So I've seen that spectra are reproduced if no noise is added. in line

flux = (table['observed_flux']+table['random_noise_electrons']*table['flux_calibration']).T.astype(float)

As well as if I increase the exposure time, the differences gets much smaller as can be seen in the current version of the notebook https://github.com/desihub/desisim/blob/fix_dlabugs/etc/quick_quasar.ipynb

Also checked that the status of the random generator status is the same for all runs up to line

random_state = np.random.RandomState(seed)

However funny things happen in

sim.generate_random_noise(random_state)

like if I pass a seed number explicitly here, the difference between the dla random/no_dla was smaller, but not for any number I pass :S

@julienguy any idea on how to fix it? or how can I load a local version of the specsim from the notebook? (or from command line) I haven't been able to do it as I do with desisim or redrock, therefore I can't continue dig in into it...

@andreufont
Copy link
Contributor Author

@alxogm - What is the status of this? This is now the main limitation preventing us from doing systematic tests, so I would try to prioritize this task. Thanks!

@alxogm
Copy link
Contributor

alxogm commented Jul 18, 2018

@andreufont So I'm not quite satisfied with my solution. But this is what I think:

I'm sure the problem is when adding the random_noise_electron, but this is in specsim not desisim. The noise is given according to the detected electrons, so when DLAs is present the flux normalization slightly varies and this makes the adding noise function to be a bit sensitive. The current solution I tested involves modifying specsim/simulator.py . For some reason it likes it more to do an explicit cycle instead of standard array sintaxis, when adding the noise. But I also found that to convert the mean_electron variable to integer makes the agreement better.
This is in lines

https://github.com/desihub/specsim/blob/a51974bdcd4aaed53f0b3ce4759a33a229059d8b/specsim/simulator.py#L599

So I'm not sure all this is the way to go, but I haven't found another solution. The problem is that this slightly changes to specsim/simulator.py might affect other non-Lya codes.... So maybe @moustakas @julienguy and/or @sbailey know better if this issue is affecting others, or they might have another solution.

You can see here my tests, I hope it is understandable

https://github.com/desihub/desisim/blob/fix_dlabugs/etc/quick_quasar.ipynb

It is better than before for DLAs cases, although it got a little bit worst for BAL (weird, but I think is because it is not affecting the overall flux normalization).

@LuzGarciaP
Copy link

Hi everyone. We have made some progress on the BAL' side, working with the branch trial_bal (that mimics quickquasars). However, we are not able to visualize the BAL features from the noise that is automatically introduced to the spectra.
In addition, we encountered an issue when trying to compare spectra with and without BALs, because each run introduces some random noise that makes the spectra not reproducible. Both points can be visualize in the plot I attach:
The blue spectrum includes a BAL, whereas the orange one doesn't... No features are visible from the noise, and the overall fluxes are clearly different, due to the noise added in each case that changes the normalization.
number_317_trial_today

Have you made any progress by fixing the random seed? Do you have any suggestion to fix the issue above?

@andreufont
Copy link
Contributor Author

Hi @alxogm , @LuzGarciaP - It's great to see these plots.

There are several issues that could affect these tests, and it would be great to isolate them and understand them well.

A) The first issue is the noise in the spectra. We would like to have the same noise realization every time we run the same code. Is this the case? You can test this on the same quasar, or you can even test this on a flat spectrum.

B) Can we generate noiseless spectra? While we are not sure about point A) above, it would be good to have an option to generate simulated spectra with no noise. I guess we can get this by running with a very low level of noise, by using --exptime 1e6 (a thousand exposures) or something like this.

C) In noiseless spectra (until we fix A), can we reproduce a simulated quasar spectra? Do we get the same quasar continuum, if we run the code twice?

D) In noiseless spectra, and assuming we pass C above, do we get the same spectra (modulo a change in normalization) when we add DLAs to the spectra? Both random and from file?

E) In noiseless spectra, and assuming we pass C above, do we get the same spectra (modulo a change in normalization) when we add BALs to the spectra?

@andreufont
Copy link
Contributor Author

@LuzGarciaP - could you share notebooks (or plain python code) with your tests? It might help us understand what could be going on.

@forero
Copy link
Member

forero commented Jul 18, 2018

@andreufont We cannot reproduce the noise. Most probably we will go for noiseless spectra using a large --exptime.

@andreufont
Copy link
Contributor Author

@forero Yes, this is a good short term solution.

But we should really be able to reproduce the noise! There is an option to specify the seed in desisim.simexp.simulate_spectra , I would have thought that this would be enough!

After a quick search around, I noticed that this issue was already raised in specsim: desihub/specsim#83 and it looks like @dkirkby and @julienguy are already aware of this.

@forero , @LuzGarciaP , could you take a look at it and see if you can fix that reproducibility?

@alxogm
Copy link
Contributor

alxogm commented Jul 19, 2018

Hi @LuzGarciaP @forero, I've run Luz script as follows

quickbals -i $ifile -o outfile --bal BAL --balprob 0.12 --overwrite --nmax 20 --seed 123

and the noise is more or least reproduced. It is at the same level as I was mentioning in a comment above using the bal option in quickquasars. I'm not sure of the differences of quickbals vs quickquasar but that you will tell us about. Can you please try to pass the --seed argument and check it is ok? maybe that way is easier to spot the BALs features

I attach two plots here, with and without noise.

bals_wseed
bals_noseed

@alxogm
Copy link
Contributor

alxogm commented Jul 19, 2018

@andreufont I think I've searched precisely along all the points you mention, mostly for the DLAs case, in the notebook.
**I've edited the notebook so that this more clear (I think).

1)If two runs are made with a seed they matches.
2) The noiseless spectra, also matches. This was done by commenting out the adding noise lines in the code, but I also tried as you said by increasing the exposure time and that also reproduces the spectra.

In summary, I have a branch, fix_dlasbugs that has a significant improvement of reproducibility for DLAs, but need a modification to specsim that might affect other things.

So I also saw desihub/specsim#83 but wasn't sure it was as related.
Now that you mention it, I've made a brach of specsim so that the proposed solution can be tested...

@dkirkby
Copy link
Member

dkirkby commented Jul 19, 2018

1)If two runs are made with a seed they matches.

Do they match exactly? If so, doesn't that mean the problem is fixed?

So I also saw desihub/specsim#83 but wasn't sure it was as related.

You can test if that issue is related by changing 'fastsim' to 'table' in this line:

desi.instrument.fiberloss_method = 'fastsim'

I can give more priority to fixing this issue if that change solves your problem.

@dkirkby
Copy link
Member

dkirkby commented Jul 19, 2018

desihub/specsim#83 is now fixed in desihub/specsim#91 which should be merged in the next few days

@andreufont
Copy link
Contributor Author

I just tested that in the master branch (the one you get in Cori if you load the DESI modules), and in the simplest quickquasar run you do get exactly the same noise, down to machine precision.

@forero - when you say that you can not reproduce the noise, do you mean when you compare two spectra one with BAL and one without BAL? Or do you mean you don't reproduce the noise when running exactly the same command? Are you passing a seed?

@forero
Copy link
Member

forero commented Jul 19, 2018

@andreufont I should be more precise. I meant that comparing two spectra (one with BAL and one without BAL) the noise is not the same. Running the same command produces the same noise.

But, as @alxogm shows, our problem was not including the seed. The BAL's features should be indeed easier to spot in the kind of run you made.

@andreufont
Copy link
Contributor Author

I just tested that in the master (the one you get in Cori if you load the DESI modules), if you generate twice the same quasar with --dla random, you get a different DLA even when using the same seed.

See for instance:

ifile='/project/projectdirs/desi/mocks/lya_forest/london/v2.0/0/0/transmission-16-0.fits'

out_v0='spectra_v0.fits'
cmd="quickquasars -i $ifile -o $out_v0 --nmax 1 --seed 123 --overwrite --dla random"
$cmd

out_v1='spectra_v1.fits'
cmd="quickquasars -i $ifile -o $out_v1 --nmax 1 --seed 123 --overwrite --dla random"
$cmd

@alxogm - this looks like an isolated issue that should be easy to fix and commit, without worrying about changes in specsim that are not clear to me that are needed.

@andreufont
Copy link
Contributor Author

Great, so we can consider issue A) solved. We can reproduce the noise.

Now we need to be able to reproduce random DLAs, and random BALs.

@andreufont
Copy link
Contributor Author

I also tested that if we generate twice a quasars with BAL and noise, we get the same thing, what is good.

However, if we generate the same quasar twice, one with BAL and one without BAL, we get different noise vector. So the problem with BALs is that depending on whether we add BALs or not, we change the random numbers in the noise simulation. The solution should be the one I suggested in the first post of this issue:

I belive we can easily accomplish this by resetting the random number generator, using the seed that has been passed, every time we need to generate numbers at a different part of the code. We don't care if the random numbers in continuum and in noise are correlated, as lonf as they are different in different quasars. Right?

@andreufont
Copy link
Contributor Author

When adding DLAs from files, the noise added is also different, what to me makes no sense. I think this is that Alma had already spot.

May be we should open individual issues for each of this ill cases, even if they might be related.

@julienguy
Copy link
Contributor

I am on it: trying to keep the same random numbers in the instrumental noise simulation with or without BALs ( it is not the case now, and this is quite natural, because we are using random numbers in the sequence to choose BALs properties before adding instrumental noise).

@julienguy
Copy link
Contributor

This is due to a quite amazing feature of numpy.random.poisson(mean) which "consumes" a different set of random numbers depending on the mean value. This as for consequence to totally perturb the random realization of the noise in a spectrum if a DLA is present or not.
Example:

np.random.seed(1234)
print(np.random.poisson([10,10,10]))
np.random.seed(1234)
print(np.random.poisson([10,10,10]))
np.random.seed(1234)
print(np.random.poisson([1,10,10]))

returns on my computer

[ 7  9 13]
[ 7  9 13]
[ 0 11  8]

(The last two values have changed in the last row when they shouldn't).
This is not the case for random.normal.

mu=np.array([10,10,10])
np.random.seed(1234)
print(np.random.normal(loc=mu,scale=np.sqrt(mu)))
np.random.seed(1234)
print(np.random.normal(loc=mu,scale=np.sqrt(mu)))
mu=np.array([1,10,10])
np.random.seed(1234)
print(np.random.normal(loc=mu,scale=np.sqrt(mu)))

returns (on my computer)

[ 11.49080889   6.23380417  14.53061724]
[ 11.49080889   6.23380417  14.53061724]
[  1.47143516   6.23380417  14.53061724]

@julienguy
Copy link
Contributor

(thanks Stephen for noting this)

@alxogm
Copy link
Contributor

alxogm commented Jul 20, 2018 via email

@julienguy
Copy link
Contributor

This is fixed in branch no-poisson of desisim and specsim, where I have added an option use_poisson to quickspectra and specsim.simulator.Simulator.generate_random_noise that I set to False in quickquasars. I will let @dkirkby review the associated PRs.

spec

@julienguy julienguy mentioned this issue Jul 20, 2018
@alxogm
Copy link
Contributor

alxogm commented Jul 21, 2018

Thanks @julienguy for modifying specsim to fix this.
I used the no-poisson branch and the DLAs and BALs cases looks very good. Here is some examples. I have asked to @londumas to help me check the metals case.

dla_file
dla_rand
bals

@andreufont
Copy link
Contributor Author

@alxogm - I have checked that with the current master branch, we can reproduce all sorts of mocks. I think we can close this issue.

If there are (unrelated) problems with the metal implementation, we can open a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants