quickquasars simulated spectra can not be reproduced #386

andreufont · 2018-07-04T11:28:44Z

I can not reproduce the simulated spectra, even when passing a random seed. For instance, when I run the simple code below.

``
#!/bin/bash

downsampling=0.40
outdir=/project/projectdirs/desi/mocks/lya_forest/london/v2.0/test_quick/v0
infile=/project/projectdirs/desi/mocks/lya_forest/london/v2.0/0/0/transmission-16-0.fits

quickquasars --exptime 4000. -i $infile --zmin 1.8 --outdir $outdir/spectra-16 --downsampling
$downsampling --zbest --mags --desi-footprint --seed 123
``

Moreover, we need to make sure that we are able to run two simulations that are identical in any way, except adding a new artefact. This means that we should be able to add DLAs or not, without this changing the random numers for the quasar continua or for the instrumental noise.

I belive we can easily accomplish this by resetting the random number generator, using the seed that has been passed, every time we need to generate numbers at a different part of the code. We don't care if the random numbers in continuum and in noise are correlated, as lonf as they are different in different quasars. Right?

andreufont · 2018-07-04T11:29:14Z

@alxogm @julienguy @londumas @forero

alxogm · 2018-07-04T21:24:03Z

I'm taking a look at this, using this notebook alxogm/notebooks/global/homes/a/alxogm/Lyalpha/spectra_sim/desitutorials/quick_quasar.ipynb
were could I put this so that is more convenient than my home directory ?

alxogm · 2018-07-05T00:06:15Z

From what Ive seen quasars are reproduced if no features present. No BAL quasar are reproduced when run with BAL probability different from zero, i.e. only BALs look different.
Differences still happening with DLAs, dig in into it...

Adding the seed to --dla random helps to reproduce the added DLAs. But reset the random number generator, doesn't seem to help much. For instance when adding the DLAs from transmission file, there is no change in in the random number generator and still shows some difference. BALs seems to work fine, but BAL is added in tmp_qso_flux, before apply_lya_transmission.

The notebook can be seen here

https://github.com/desihub/desisim/blob/fix_dlabugs/etc/quick_quasar.ipynb

alxogm · 2018-07-06T23:07:57Z

So I've seen that spectra are reproduced if no noise is added. in line

desisim/py/desisim/scripts/quickspectra.py

Line 197 in 9df332d

    
           flux = (table['observed_flux']+table['random_noise_electrons']*table['flux_calibration']).T.astype(float)

As well as if I increase the exposure time, the differences gets much smaller as can be seen in the current version of the notebook https://github.com/desihub/desisim/blob/fix_dlabugs/etc/quick_quasar.ipynb

Also checked that the status of the random generator status is the same for all runs up to line

desisim/py/desisim/scripts/quickspectra.py

Line 181 in 9df332d

random_state = np.random.RandomState(seed)

However funny things happen in

desisim/py/desisim/scripts/quickspectra.py

Line 182 in 9df332d

sim.generate_random_noise(random_state)

like if I pass a seed number explicitly here, the difference between the dla random/no_dla was smaller, but not for any number I pass :S

@julienguy any idea on how to fix it? or how can I load a local version of the specsim from the notebook? (or from command line) I haven't been able to do it as I do with desisim or redrock, therefore I can't continue dig in into it...

andreufont · 2018-07-17T17:43:30Z

@alxogm - What is the status of this? This is now the main limitation preventing us from doing systematic tests, so I would try to prioritize this task. Thanks!

alxogm · 2018-07-18T06:03:24Z

@andreufont So I'm not quite satisfied with my solution. But this is what I think:

I'm sure the problem is when adding the random_noise_electron, but this is in specsim not desisim. The noise is given according to the detected electrons, so when DLAs is present the flux normalization slightly varies and this makes the adding noise function to be a bit sensitive. The current solution I tested involves modifying specsim/simulator.py . For some reason it likes it more to do an explicit cycle instead of standard array sintaxis, when adding the noise. But I also found that to convert the mean_electron variable to integer makes the agreement better.
This is in lines

https://github.com/desihub/specsim/blob/a51974bdcd4aaed53f0b3ce4759a33a229059d8b/specsim/simulator.py#L599

So I'm not sure all this is the way to go, but I haven't found another solution. The problem is that this slightly changes to specsim/simulator.py might affect other non-Lya codes.... So maybe @moustakas @julienguy and/or @sbailey know better if this issue is affecting others, or they might have another solution.

You can see here my tests, I hope it is understandable

https://github.com/desihub/desisim/blob/fix_dlabugs/etc/quick_quasar.ipynb

It is better than before for DLAs cases, although it got a little bit worst for BAL (weird, but I think is because it is not affecting the overall flux normalization).

LuzGarciaP · 2018-07-18T14:31:34Z

Hi everyone. We have made some progress on the BAL' side, working with the branch trial_bal (that mimics quickquasars). However, we are not able to visualize the BAL features from the noise that is automatically introduced to the spectra.
In addition, we encountered an issue when trying to compare spectra with and without BALs, because each run introduces some random noise that makes the spectra not reproducible. Both points can be visualize in the plot I attach:
The blue spectrum includes a BAL, whereas the orange one doesn't... No features are visible from the noise, and the overall fluxes are clearly different, due to the noise added in each case that changes the normalization.

Have you made any progress by fixing the random seed? Do you have any suggestion to fix the issue above?

andreufont · 2018-07-18T17:39:22Z

Hi @alxogm , @LuzGarciaP - It's great to see these plots.

There are several issues that could affect these tests, and it would be great to isolate them and understand them well.

A) The first issue is the noise in the spectra. We would like to have the same noise realization every time we run the same code. Is this the case? You can test this on the same quasar, or you can even test this on a flat spectrum.

B) Can we generate noiseless spectra? While we are not sure about point A) above, it would be good to have an option to generate simulated spectra with no noise. I guess we can get this by running with a very low level of noise, by using --exptime 1e6 (a thousand exposures) or something like this.

C) In noiseless spectra (until we fix A), can we reproduce a simulated quasar spectra? Do we get the same quasar continuum, if we run the code twice?

D) In noiseless spectra, and assuming we pass C above, do we get the same spectra (modulo a change in normalization) when we add DLAs to the spectra? Both random and from file?

E) In noiseless spectra, and assuming we pass C above, do we get the same spectra (modulo a change in normalization) when we add BALs to the spectra?

andreufont · 2018-07-18T17:40:12Z

@LuzGarciaP - could you share notebooks (or plain python code) with your tests? It might help us understand what could be going on.

forero · 2018-07-18T17:46:25Z

@andreufont We cannot reproduce the noise. Most probably we will go for noiseless spectra using a large --exptime.

andreufont · 2018-07-18T18:04:56Z

@forero Yes, this is a good short term solution.

But we should really be able to reproduce the noise! There is an option to specify the seed in desisim.simexp.simulate_spectra , I would have thought that this would be enough!

After a quick search around, I noticed that this issue was already raised in specsim: desihub/specsim#83 and it looks like @dkirkby and @julienguy are already aware of this.

@forero , @LuzGarciaP , could you take a look at it and see if you can fix that reproducibility?

alxogm · 2018-07-19T03:16:49Z

Hi @LuzGarciaP @forero, I've run Luz script as follows

quickbals -i $ifile -o outfile --bal BAL --balprob 0.12 --overwrite --nmax 20 --seed 123

and the noise is more or least reproduced. It is at the same level as I was mentioning in a comment above using the bal option in quickquasars. I'm not sure of the differences of quickbals vs quickquasar but that you will tell us about. Can you please try to pass the --seed argument and check it is ok? maybe that way is easier to spot the BALs features

I attach two plots here, with and without noise.

alxogm · 2018-07-19T03:58:05Z

@andreufont I think I've searched precisely along all the points you mention, mostly for the DLAs case, in the notebook.
**I've edited the notebook so that this more clear (I think).

1)If two runs are made with a seed they matches.
2) The noiseless spectra, also matches. This was done by commenting out the adding noise lines in the code, but I also tried as you said by increasing the exposure time and that also reproduces the spectra.

In summary, I have a branch, fix_dlasbugs that has a significant improvement of reproducibility for DLAs, but need a modification to specsim that might affect other things.

So I also saw desihub/specsim#83 but wasn't sure it was as related.
Now that you mention it, I've made a brach of specsim so that the proposed solution can be tested...

dkirkby · 2018-07-19T07:10:30Z

1)If two runs are made with a seed they matches.

Do they match exactly? If so, doesn't that mean the problem is fixed?

So I also saw desihub/specsim#83 but wasn't sure it was as related.

You can test if that issue is related by changing 'fastsim' to 'table' in this line:

desisim/py/desisim/simexp.py

Line 476 in 9df332d

desi.instrument.fiberloss_method = 'fastsim'

I can give more priority to fixing this issue if that change solves your problem.

dkirkby · 2018-07-19T10:21:51Z

desihub/specsim#83 is now fixed in desihub/specsim#91 which should be merged in the next few days

andreufont · 2018-07-19T15:59:26Z

I just tested that in the master branch (the one you get in Cori if you load the DESI modules), and in the simplest quickquasar run you do get exactly the same noise, down to machine precision.

@forero - when you say that you can not reproduce the noise, do you mean when you compare two spectra one with BAL and one without BAL? Or do you mean you don't reproduce the noise when running exactly the same command? Are you passing a seed?

forero · 2018-07-19T16:18:40Z

@andreufont I should be more precise. I meant that comparing two spectra (one with BAL and one without BAL) the noise is not the same. Running the same command produces the same noise.

But, as @alxogm shows, our problem was not including the seed. The BAL's features should be indeed easier to spot in the kind of run you made.

andreufont · 2018-07-19T16:22:13Z

I just tested that in the master (the one you get in Cori if you load the DESI modules), if you generate twice the same quasar with --dla random, you get a different DLA even when using the same seed.

See for instance:

ifile='/project/projectdirs/desi/mocks/lya_forest/london/v2.0/0/0/transmission-16-0.fits'

out_v0='spectra_v0.fits'
cmd="quickquasars -i $ifile -o $out_v0 --nmax 1 --seed 123 --overwrite --dla random"
$cmd

out_v1='spectra_v1.fits'
cmd="quickquasars -i $ifile -o $out_v1 --nmax 1 --seed 123 --overwrite --dla random"
$cmd

@alxogm - this looks like an isolated issue that should be easy to fix and commit, without worrying about changes in specsim that are not clear to me that are needed.

andreufont · 2018-07-19T16:23:43Z

Great, so we can consider issue A) solved. We can reproduce the noise.

Now we need to be able to reproduce random DLAs, and random BALs.

andreufont · 2018-07-19T18:10:00Z

I also tested that if we generate twice a quasars with BAL and noise, we get the same thing, what is good.

However, if we generate the same quasar twice, one with BAL and one without BAL, we get different noise vector. So the problem with BALs is that depending on whether we add BALs or not, we change the random numbers in the noise simulation. The solution should be the one I suggested in the first post of this issue:

I belive we can easily accomplish this by resetting the random number generator, using the seed that has been passed, every time we need to generate numbers at a different part of the code. We don't care if the random numbers in continuum and in noise are correlated, as lonf as they are different in different quasars. Right?

andreufont · 2018-07-19T18:21:15Z

When adding DLAs from files, the noise added is also different, what to me makes no sense. I think this is that Alma had already spot.

May be we should open individual issues for each of this ill cases, even if they might be related.

julienguy · 2018-07-20T18:08:09Z

I am on it: trying to keep the same random numbers in the instrumental noise simulation with or without BALs ( it is not the case now, and this is quite natural, because we are using random numbers in the sequence to choose BALs properties before adding instrumental noise).

julienguy · 2018-07-20T22:31:49Z

This is due to a quite amazing feature of numpy.random.poisson(mean) which "consumes" a different set of random numbers depending on the mean value. This as for consequence to totally perturb the random realization of the noise in a spectrum if a DLA is present or not.
Example:

np.random.seed(1234)
print(np.random.poisson([10,10,10]))
np.random.seed(1234)
print(np.random.poisson([10,10,10]))
np.random.seed(1234)
print(np.random.poisson([1,10,10]))

returns on my computer

[ 7  9 13]
[ 7  9 13]
[ 0 11  8]

(The last two values have changed in the last row when they shouldn't).
This is not the case for random.normal.

mu=np.array([10,10,10])
np.random.seed(1234)
print(np.random.normal(loc=mu,scale=np.sqrt(mu)))
np.random.seed(1234)
print(np.random.normal(loc=mu,scale=np.sqrt(mu)))
mu=np.array([1,10,10])
np.random.seed(1234)
print(np.random.normal(loc=mu,scale=np.sqrt(mu)))

returns (on my computer)

[ 11.49080889   6.23380417  14.53061724]
[ 11.49080889   6.23380417  14.53061724]
[  1.47143516   6.23380417  14.53061724]

julienguy · 2018-07-20T22:41:53Z

(thanks Stephen for noting this)

alxogm · 2018-07-20T23:20:18Z

And it behaves a bit different if in is inside a cycle, and making the mean an integer helps, but, is not perfect. @julienguy I hope you find how to fix it... Enviado desde mi iPhone El 20/07/2018, a la(s) 17:31, julienguy <[email protected]> escribió:

…

This is due to a quite amazing feature of numpy.random.poisson(mean) which "consumes" a different set of random numbers depending on the mean value. This as for consequence to totally perturb the random realization of the noise in a spectrum if a DLA is present or not. Example: np.random.seed(1234) print(np.random.poisson([10,10,10])) np.random.seed(1234) print(np.random.poisson([10,10,10])) np.random.seed(1234) print(np.random.poisson([1,10,10])) returns on my computer [ 7 9 13] [ 7 9 13] [ 0 11 8] (The last two values have changed in the last row when they shouldn't). This is not the case for random.normal. mu=np.array([10,10,10]) np.random.seed(1234) print(np.random.normal(loc=mu,scale=np.sqrt(mu))) np.random.seed(1234) print(np.random.normal(loc=mu,scale=np.sqrt(mu))) mu=np.array([1,10,10]) np.random.seed(1234) print(np.random.normal(loc=mu,scale=np.sqrt(mu))) returns (on my computer) [ 11.49080889 6.23380417 14.53061724] [ 11.49080889 6.23380417 14.53061724] [ 1.47143516 6.23380417 14.53061724] — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

julienguy · 2018-07-20T23:21:36Z

This is fixed in branch no-poisson of desisim and specsim, where I have added an option use_poisson to quickspectra and specsim.simulator.Simulator.generate_random_noise that I set to False in quickquasars. I will let @dkirkby review the associated PRs.

alxogm · 2018-07-21T07:52:48Z

Thanks @julienguy for modifying specsim to fix this.
I used the no-poisson branch and the DLAs and BALs cases looks very good. Here is some examples. I have asked to @londumas to help me check the metals case.

andreufont · 2018-07-30T15:23:31Z

@alxogm - I have checked that with the current master branch, we can reproduce all sorts of mocks. I think we can close this issue.

If there are (unrelated) problems with the metal implementation, we can open a new issue.

forero assigned LuzGarciaP Jul 4, 2018

alxogm mentioned this issue Jul 19, 2018

Script results are not reproducible desihub/specsim#83

Closed

julienguy mentioned this issue Jul 20, 2018

No poisson #393

Merged

julienguy closed this as completed Jul 30, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

quickquasars simulated spectra can not be reproduced #386

quickquasars simulated spectra can not be reproduced #386

andreufont commented Jul 4, 2018

andreufont commented Jul 4, 2018

alxogm commented Jul 4, 2018

alxogm commented Jul 5, 2018 •

edited

Loading

alxogm commented Jul 6, 2018

andreufont commented Jul 17, 2018

alxogm commented Jul 18, 2018

LuzGarciaP commented Jul 18, 2018

andreufont commented Jul 18, 2018

andreufont commented Jul 18, 2018

forero commented Jul 18, 2018

andreufont commented Jul 18, 2018

alxogm commented Jul 19, 2018

alxogm commented Jul 19, 2018 •

edited

Loading

dkirkby commented Jul 19, 2018

dkirkby commented Jul 19, 2018

andreufont commented Jul 19, 2018

forero commented Jul 19, 2018 •

edited

Loading

andreufont commented Jul 19, 2018

andreufont commented Jul 19, 2018

andreufont commented Jul 19, 2018

andreufont commented Jul 19, 2018

julienguy commented Jul 20, 2018

julienguy commented Jul 20, 2018

julienguy commented Jul 20, 2018

alxogm commented Jul 20, 2018 via email

julienguy commented Jul 20, 2018

alxogm commented Jul 21, 2018

andreufont commented Jul 30, 2018

quickquasars simulated spectra can not be reproduced #386

quickquasars simulated spectra can not be reproduced #386

Comments

andreufont commented Jul 4, 2018

andreufont commented Jul 4, 2018

alxogm commented Jul 4, 2018

alxogm commented Jul 5, 2018 • edited Loading

alxogm commented Jul 6, 2018

andreufont commented Jul 17, 2018

alxogm commented Jul 18, 2018

LuzGarciaP commented Jul 18, 2018

andreufont commented Jul 18, 2018

andreufont commented Jul 18, 2018

forero commented Jul 18, 2018

andreufont commented Jul 18, 2018

alxogm commented Jul 19, 2018

alxogm commented Jul 19, 2018 • edited Loading

dkirkby commented Jul 19, 2018

dkirkby commented Jul 19, 2018

andreufont commented Jul 19, 2018

forero commented Jul 19, 2018 • edited Loading

andreufont commented Jul 19, 2018

andreufont commented Jul 19, 2018

andreufont commented Jul 19, 2018

andreufont commented Jul 19, 2018

julienguy commented Jul 20, 2018

julienguy commented Jul 20, 2018

julienguy commented Jul 20, 2018

alxogm commented Jul 20, 2018 via email

julienguy commented Jul 20, 2018

alxogm commented Jul 21, 2018

andreufont commented Jul 30, 2018

alxogm commented Jul 5, 2018 •

edited

Loading

alxogm commented Jul 19, 2018 •

edited

Loading

forero commented Jul 19, 2018 •

edited

Loading