
[WIP]: theory shift covmat comparison - updated #314

Closed · wants to merge 16 commits
Conversation

@RosalynLP (Contributor)

Cleaner version carrying on from #309

https://vp.nnpdf.science/bQCke9RZT4-eC4A7szF80g==/figures/plot_thcorrmat_heatmap_custom2.png
This looks a bit weird to me. First, it is missing FT DY relative to https://vp.nnpdf.science/NlltmlyWRRqCtSeJbi1xIQ==/. Second, the correlations between different experiments look different from what we were previously seeing: I don't see why the upper-left block of experiments should be so strongly correlated, since they are just listed alphabetically. It seems as if the labelling might have been done wrong. Any ideas about the cause of this? I am trying to debug but haven't spotted the issue yet.

@RosalynLP requested review from Zaharid and voisey on October 22, 2018 at 11:33
return combine_by_type(process_lookup, all_matched_results, dataset_names)

datapsecs_theoryids = collect('theoryid', ['dataspecs'])

Contributor:

dataspecs_theoryids rather than datapsecs_theoryids?

all_matched_results = collect('matched_dataspecs_results',
['matched_datasets_from_dataspecs'])

def combine_by_type2(process_lookup, all_matched_results, dataset_names):
Contributor:

If you have the time, could you please add descriptions to the functions? In particular, could you explain what the difference is between <function_name> and <function_name>2?

Contributor:

These things shouldn't be called _2 in the first place. They should be _dataspecs or something similar.

@Zaharid (Contributor) commented Oct 22, 2018

@RosalynLP Can we see the same covmat using the original functions? I agree this looks suspicious.

@RosalynLP (Contributor, Author)

@RosalynLP (Contributor, Author)

I'm going to try running it with just BCDMS and HERACOMB to see if that makes things any clearer

@Zaharid (Contributor) commented Oct 22, 2018

Also note that a potentially more elegant way to implement this is with NNPDF/reportengine#63. But let's get something working first, even if it's ugly.

return covs_pt_prescrip(combine_by_type_dataspecs, process_starting_points_dataspecs,
datapsecs_theoryids, fivetheories)

def covmap2(combine_by_type_dataspecs, dataset_names):
Contributor:

Should this '2' be replaced with 'dataspecs' too?

Contributor (Author):

Yeah it should, I haven't pushed that change yet :)

@RosalynLP (Contributor, Author)

matched_cuts_debug.pdf
This is a comparison of the "old" (point prescriptions as previously implemented) and "new" (implemented using matched cuts and all the dataspecs functions) theory correlation matrices, just for BCDMS versus HERACOMB. You can see they're basically quite similar, but if you look closely there are some subtle differences: it appears as if HERACOMB has less stringent cuts in the new case. For both I am using

fit: 180421-lr-nlo-central_global
use_cuts: "internal"
q2min: 3
w2min: 5

I might try adding in a few more datasets to see if it makes it clearer what's going on

@RosalynLP (Contributor, Author)

matched_cuts_debug.pdf
This might provide some more insight: you can see that something funny is going on. Look at the correlations between HERACOMB and BCDMS, which are now very weak. The correlations within HERACOMB itself have also taken on a strange block-diagonal form. You can also see what appears to be a strong correlation of LHCb with parts of HERACOMB. However, the rough sizes of each dataset seem OK, which makes me think that perhaps the labelling is "OK" but something is going wrong at the matrix construction stage. Any thoughts?

@RosalynLP (Contributor, Author)

It seems like maybe the "same process type" matrix construction is being applied in some random regions which don't correspond to full datasets?

@Zaharid (Contributor) commented Oct 22, 2018

Can you please upload these things with the relevant runcards?

@RosalynLP (Contributor, Author)

@Zaharid (Contributor) commented Oct 22, 2018

I think I made a stupid mistake with dataset_names. It should also use the per-dataspec version, which is different. I hadn't realized that at some point we have:

    for dataset, name in zip(each_dataset_results_bytheory, dataset_names):

in combine_by_type, so these things need to match properly. Should work better by using matched_dataspecs_dataset_name instead.
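As a toy illustration of the mismatch (plain Python, not validphys code): zip pairs purely by position, so if the results follow the matched-dataspecs ordering while the names follow the original experiment ordering, the labels land on the wrong results without any error being raised.

    # Toy values: results ordered by matched dataspecs, names ordered as in the
    # original experiments. zip pairs them by position, so labels are silently wrong.
    results_matched_order = ["HERA results", "BCDMS results", "LHCb results"]
    names_original_order = ["BCDMS", "HERA", "LHCb"]
    for res, name in zip(results_matched_order, names_original_order):
        print(name, "->", res)  # e.g. "BCDMS -> HERA results"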

This was out of sync because we were zipping the results with the
datasets, as ordered in the original experiments, rather than the
datasets as ordered by matched_datasets_from_dataspecs.
@Zaharid (Contributor) commented Oct 22, 2018

It looks more reasonable now:

https://vp.nnpdf.science/OarGiaUTQ2yNMxKxnFDRMQ==

@Zaharid (Contributor) commented Oct 22, 2018 via email

@RosalynLP (Contributor, Author)

Aha excellent, good point; this looks far better.

@RosalynLP (Contributor, Author)

OK, I will change the cuts back; I altered them to match yours just for debugging purposes, because that also introduced a difference.

@RosalynLP (Contributor, Author)

@RosalynLP (Contributor, Author)

Compared to https://vp.nnpdf.science/NlltmlyWRRqCtSeJbi1xIQ==/figures/plot_matched_datasets_shift_marix_correlations.png we can see that there is at least some similarity in the patterns. The next step is to create functions which take in both of these and compare the mutual parts, right? Presumably this means using another function, a bit like matched cuts, to find the mutual datasets with matching cuts in each one and only use those elements? Or would it make more sense if the matched cuts were computed separately using information from the 3 different fits, and then applied to both functions?

@Zaharid (Contributor) commented Oct 23, 2018

I would do a production rule that takes 'shiftconfig' and 'theoryconfig', where each has the initial dataspecs, and returns a mapping with 'shiftconfig' and 'theoryconfig' again, but with matched dataspecs, with the same data and the same matched cuts. Probably that production rule can use the existing ones (matched_datasets_from_dataspecs and dataspecs_with_matched_cuts).

@RosalynLP (Contributor, Author)

OK so what I have been trying is the runcard below, and now I am writing a production rule in config.py. I am a little unsure of what to take in. Can I do something like:

def produce_dataspecs_matched_between_namespaces(self, shiftconfig, theoryconfig):

or should I be passing in something other than shiftconfig and theoryconfig here? My plan was to load in the dataspecs from both shiftconfig and theoryconfig and then call matched_datasets_from_dataspecs and dataspecs_with_matched_cuts on these combined dataspecs. However, I am not sure how to get the dataspecs from shiftconfig and theoryconfig, as these namespaces both have a number of sub-elements and are not a simple list of things (i.e. they also have information on the fit and experiments, not just lists of dataspecs).


meta:
    title: Testing combining namespaces for shift matrices
    author: Rosalyn Pearson
    keywords: [test, theory uncertainties]

experiments:
  - experiment: BCDMS
    datasets:
      - dataset: BCDMSP
      - dataset: BCDMSD
  - experiment: HERACOMB
    datasets:
      - dataset: HERACOMBNCEM 
      - dataset: HERACOMBNCEP460
      - dataset: HERACOMBNCEP575
      - dataset: HERACOMBNCEP820
      - dataset: HERACOMBNCEP920
      - dataset: HERACOMBCCEM 
      - dataset: HERACOMBCCEP 
  - experiment: CMS
    datasets:
      - dataset: CMSWEASY840PB
      - dataset: CMSWMASY47FB
 #     - dataset: CMSWCHARMRAT
      - dataset: CMSDY2D11
      - dataset: CMSWMU8TEV
      - dataset: CMSJETS11
      - dataset: CMSTTBARTOT
      - dataset: CMSTOPDIFF8TEVTTRAPNORM
  - experiment: LHCb
    datasets:
      - dataset: LHCBZ940PB
      - dataset: LHCBZEE2FB

theoryconfig:

   theoryids:
      - 163
      - 180
      - 173

   theoryid: 163

   use_cuts: 'fromfit'

   fit: 180421-lr-nlo-central_global

   pdf: 
      from_: fit

   dataspecs:
           - theoryid: 163
             speclabel: $(\xi_F,\xi_R)=(1,1)$
             experiments:
               - experiment: BCDMS
                 datasets:
                    - dataset: BCDMSP
                    - dataset: BCDMSD
               - experiment: HERACOMB
                 datasets:
                    - dataset: HERACOMBNCEM 
                    - dataset: HERACOMBNCEP460
                    - dataset: HERACOMBNCEP575
                    - dataset: HERACOMBNCEP820
                    - dataset: HERACOMBNCEP920
                    - dataset: HERACOMBCCEM 
                    - dataset: HERACOMBCCEP 
               - experiment: CMS
                 datasets:
                    - dataset: CMSWEASY840PB
                    - dataset: CMSWMASY47FB
       #             - dataset: CMSWCHARMRAT
                    - dataset: CMSDY2D11
                    - dataset: CMSWMU8TEV
                    - dataset: CMSJETS11
                    - dataset: CMSTTBARTOT
                    - dataset: CMSTOPDIFF8TEVTTRAPNORM
               - experiment: LHCb
                 datasets:
                    - dataset: LHCBZ940PB
                    - dataset: LHCBZEE2FB
           - theoryid: 180
             speclabel: $(\xi_F,\xi_R)=(2,2)$
             experiments:
               - experiment: BCDMS
                 datasets:
                    - dataset: BCDMSP
                    - dataset: BCDMSD
               - experiment: HERACOMB
                 datasets:
                    - dataset: HERACOMBNCEM 
                    - dataset: HERACOMBNCEP460
                    - dataset: HERACOMBNCEP575
                    - dataset: HERACOMBNCEP820
                    - dataset: HERACOMBNCEP920
                    - dataset: HERACOMBCCEM 
                    - dataset: HERACOMBCCEP 
               - experiment: CMS
                 datasets:
                    - dataset: CMSWEASY840PB
                    - dataset: CMSWMASY47FB
         #           - dataset: CMSWCHARMRAT
                    - dataset: CMSDY2D11
                    - dataset: CMSWMU8TEV
                    - dataset: CMSJETS11
                    - dataset: CMSTTBARTOT
                    - dataset: CMSTOPDIFF8TEVTTRAPNORM
               - experiment: LHCb
                 datasets:
                    - dataset: LHCBZ940PB
                    - dataset: LHCBZEE2FB
           - theoryid: 173
             speclabel: $(\xi_F,\xi_R)=(0.5,0.5)$
             experiments:
               - experiment: BCDMS
                 datasets:
                    - dataset: BCDMSP
                    - dataset: BCDMSD
               - experiment: HERACOMB
                 datasets:
                    - dataset: HERACOMBNCEM 
                    - dataset: HERACOMBNCEP460
                    - dataset: HERACOMBNCEP575
                    - dataset: HERACOMBNCEP820
                    - dataset: HERACOMBNCEP920
                    - dataset: HERACOMBCCEM 
                    - dataset: HERACOMBCCEP 
               - experiment: CMS
                 datasets:
                    - dataset: CMSWEASY840PB
                    - dataset: CMSWMASY47FB
       #             - dataset: CMSWCHARMRAT
                    - dataset: CMSDY2D11
                    - dataset: CMSWMU8TEV
                    - dataset: CMSJETS11
                    - dataset: CMSTTBARTOT
                    - dataset: CMSTOPDIFF8TEVTTRAPNORM
               - experiment: LHCb
                 datasets:
                    - dataset: LHCBZ940PB
                    - dataset: LHCBZEE2FB

shiftconfig:

   theoryid: 52

   fit: NNPDF31_nlo_as_0118_1000
   use_cuts: 'fromfit'

   experiments:
       from_: fit

   dataspecs:
       - theoryid: 52
         pdf: NNPDF31_nlo_as_0118_hessian
         speclabel: "NLO"
         fit: NNPDF31_nlo_as_0118_1000

       - theoryid: 53
         pdf: NNPDF31_nnlo_as_0118_hessian
         speclabel: "NNLO"
         fit: NNPDF31_nnlo_as_0118_1000

actions_:
    - shiftconfig plot_matched_datasets_shift_matrix_correlations
    - theoryconfig plot_thcorrmat_heatmap_custom_dataspecs

    



@Zaharid (Contributor) commented Oct 23, 2018 via email

@RosalynLP (Contributor, Author)

@Zaharid (Contributor) commented Oct 23, 2018 via email

@RosalynLP (Contributor, Author)

We have the functions produce_matched_datasets_from_dataspecs and produce_dataspecs_with_matched_cuts but I can't see anywhere where matched_datasets_from_dataspecs and dataspecs_with_matched_cuts themselves are defined. Presumably they are the output of the above functions but how does validphys know this?

So far I am attempting to do


   def produce_dataspecs_matched_between_namespaces(self, 
                                                    shiftconfig, 
                                                    theoryconfig, 
                                                    produce_matched_datasets_from_dataspecs,
                                                    produce_dataspecs_with_matched_cuts):
       total_dataspecs = theoryconfig['dataspecs'] + shiftconfig['dataspecs']
       matched_datasets = produce_matched_datasets_from_dataspecs(total_dataspecs)
       matched_datasets_with_cuts = produce_dataspecs_with_matched_cuts(matched_datasets)
       theoryconfig['dataspecs'] = matched_datasets_with_cuts
       shiftconfig['dataspecs'] = matched_datasets_with_cuts
       return theoryconfig, shiftconfig

(will this '+' operation work?) Does this look OK? I am not really sure how to reassign theoryconfig and shiftconfig; presumably just returning them like this doesn't really work? I then also don't know how to make validphys run this production rule in order to try and test it... do I have to somehow use it in theorycovariance.py?

@Zaharid (Contributor) commented Oct 23, 2018

Have a look here

https://data.nnpdf.science/validphys-docs/guide.html#configuration

The quality of the writing is not optimal, but it should be OK to see how these things work. Indeed, things like dataspecs_with_matched_cuts are defined by the corresponding production rule.

As a general comment, start from something small that works and is simple and evolve it into something that works and also does what you want. Always divide the work into pieces that you can test and reason about individually (and print things and fire up a terminal). An example of something small that works is a function that takes some parameters and does nothing with them, which can be useful to see how they get computed by vp.
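As a toy sketch of such a do-nothing provider (hypothetical name, not part of validphys; the argument names are simply resources that reportengine would resolve from the runcard):

    # Hypothetical provider for illustration only: it does nothing useful,
    # but printing the resolved arguments shows what vp would pass in.
    def inspect_inputs(shiftconfig, theoryconfig):
        print("shiftconfig keys:", list(shiftconfig))
        print("theoryconfig keys:", list(theoryconfig))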

For example, vp wouldn't give you parameters called produce_XXX but rather the corresponding XXX.
But we don't want to do any of that here. You take just the two mappings that you start with, and then call the production rules on the total dataspecs (the + operator is fine; you can start by printing the result of it).
Now the production rules are normal methods of the Config class, which return a value that can be useful by itself, e.g. inside some other production rule (in addition, reportengine calls production rules when asked to resolve specific names). See e.g.

full_ds = self.produce_dataset(

In this case the result we want is a list of namespaces that are the result of calling the two rules (one after the other) on the total_dataspecs list you have. The final result we want is a namespace (i.e. a dictionary) with the corresponding reshuffled lists of dataspecs, e.g. return {'shiftconfig': ChainMap({'dataspecs': new_shift_specs}, shiftconfig), 'theoryconfig': ChainMap({'dataspecs': new_th_specs}, theoryconfig)}
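As a small illustration of the ChainMap layering being suggested (toy values, not the actual validphys namespaces): the first mapping shadows 'dataspecs' while every other key still resolves from the original configuration.

    from collections import ChainMap

    # Toy values for illustration: the override layer replaces 'dataspecs',
    # everything else falls through to the original mapping.
    original = {"dataspecs": ["old spec"], "use_cuts": "fromfit"}
    updated = ChainMap({"dataspecs": ["matched spec"]}, original)
    print(updated["dataspecs"])  # ['matched spec']
    print(updated["use_cuts"])   # 'fromfit'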

@RosalynLP (Contributor, Author)

Hi @Zaharid, @wilsonmr and I have been working on this today. It's currently a bit messy, but the stage we are at is that we think we have replaced the shiftconfig and theoryconfig datasets with matched datasets. We're now looking into testing this on the functions in theorycovariance.py, but we have got a bit confused over the sequence of collects which is necessary. For example, before we had things like


matched_dataspecs_experiment_name = collect(
    'experiment_name', ['matched_datasets_from_dataspecs'])
matched_dataspecs_dataset_name = collect(
    'dataset_name', ['matched_datasets_from_dataspecs'])

matched_cuts_datasets = collect('dataset', ['dataspecs_with_matched_cuts'])
all_matched_datasets = collect('matched_cuts_datasets',
                               ['matched_datasets_from_dataspecs'])

but we no longer need to collect over matched_cuts etc. like this because the cuts should all be matched from the outset. Ideally we want to collect over dataspecs and then try something like

   {@with combined_shift_and_theory_dataspecs@}

   {@shiftconfig plot_matched_datasets_shift_matrix_correlations@}

   {@theoryconfig plot_thcorrmat_heatmap_custom_dataspecs@}

   {@endwith@}

but this doesn't seem to run (it crashes), perhaps because of multiple uses of dataspecs_with_matched_cuts appearing due to the lines above?

@Zaharid (Contributor) commented Oct 24, 2018

@RosalynLP ISTM that the production rule you have now mixes the indexes of the datasets and of the input dataspecs. I think it should look more like this:

    def produce_combined_shift_and_theory_dataspecs(self, theoryconfig, shiftconfig):
        total_dataspecs = theoryconfig["dataspecs"] + shiftconfig["dataspecs"]
        matched_datasets = self.produce_matched_datasets_from_dataspecs(total_dataspecs)
        final_res = [
            self.produce_dataspecs_with_matched_cuts(ds["dataspecs"])
            for ds in matched_datasets
        ]
        new_theoryconfig = []
        new_shiftconfig = []
        len_th = len(theoryconfig['dataspecs'])
        for s in final_res:
            new_theoryconfig.append({"dataspecs": s[:len_th]})
            new_shiftconfig.append({"dataspecs": s[len_th:]})
        return {
            "shiftconfig": ChainMap({"dataspecs": new_shiftconfig}, shiftconfig),
            "theoryconfig": ChainMap({"dataspecs": new_theoryconfig}, theoryconfig),
        }

Then, given the runcard:

experiments:
    from_: fit

theory:
    from_: fit

theoryid:
    from_: theory

use_cuts: "fromfit"

shiftconfig:
    dataspecs:
        - fit: NNPDF31_nlo_as_0118_1000
        - fit: NNPDF31_nnlo_as_0118_1000


theoryconfig:
    dataspecs:
        - fit: 180421-lr-nlo-xF2dot0xR2dot0_global
        - fit: NNPDF31_nnlo_as_0118_uncorr_s3
        - fit: NNPDF31_nnlo_as_0118_1000


actions_:
    - combined_shift_and_theory_dataspecs

We have

In [1]: len(new_shiftconfig)                                                                                          
Out[1]: 45

In [2]: len(new_theoryconfig)                                                                                         
Out[2]: 45

In [3]: set(len(x['dataspecs']) for x in new_theoryconfig)                                                            
Out[3]: {3}

In [4]: set(len(x['dataspecs']) for x in new_shiftconfig)                                                             
Out[4]: {2}

@Zaharid (Contributor) commented Oct 24, 2018

Probably we can make shiftconfig and theoryconfig lists of namespaces themselves and remove the inner dataspecs in each of the inputs.

@Zaharid (Contributor) commented Oct 25, 2018

@RosalynLP @wilsonmr Have you had a chance to test this?

@RosalynLP (Contributor, Author)

@Zaharid just making the changes now

@RosalynLP (Contributor, Author)

@Zaharid I'm not sure whether I interpreted what you said correctly, but I have modified the action and it runs fine for


actions_:
   - combined_shift_and_theory_dataspecs

in the runcard, but when I change this to

template_text: |
   {@with combined_shift_and_theory_dataspecs@}

   {@shiftconfig@}

   {@theoryconfig@}

   {@endwith@}

actions_:
   - report(main=True)

it seems to freeze the computer and continue to run for some indefinite length of time.

I can try leaving it to see what happens/if an error message crops up?

@wilsonmr (Contributor)

I'm also confused: why would we want

In [1]: len(new_shiftconfig)
Out[1]: 45

The new_shiftconfig should be a list of 2 dataspecs, each of which in turn is a dict of whatever length, no?

@Zaharid (Contributor) commented Oct 25, 2018

@wilsonmr I might be missing something here, but I'd say we want one namespace for each dataset, where each of these namespaces contains the various ways to compute that dataset (e.g. with settings from different fits). This allows you to build an analogue of all_matched_results.
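As a rough sketch of that shape (invented values, purely to illustrate the structure): each entry corresponds to one matched dataset, and its 'dataspecs' list holds the different ways of computing that same dataset.

    # Invented values, for illustration of the shape only.
    new_shiftconfig = [
        {"dataspecs": [{"fit": "NNPDF31_nlo_as_0118_1000", "dataset": "BCDMSP"},
                       {"fit": "NNPDF31_nnlo_as_0118_1000", "dataset": "BCDMSP"}]},
        {"dataspecs": [{"fit": "NNPDF31_nlo_as_0118_1000", "dataset": "BCDMSD"},
                       {"fit": "NNPDF31_nnlo_as_0118_1000", "dataset": "BCDMSD"}]},
        # ... one entry per matched dataset (45 in the example above)
    ]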

@Zaharid (Contributor) commented Oct 25, 2018

@RosalynLP The problem is that you are trying to print a deeply recursive, complicated object. If you run with --debug and cancel with Control + C you can see that it is getting confused printing the thing (which surely you also saw when trying to print with IPython?)

Interrupted by user. Exiting.
[DEBUG]: The traceback of the exception below is:
Traceback (most recent call last):
  File "/home/zah/phd/reportengine/src/reportengine/app.py", line 374, in main
    self.run()
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/app.py", line 140, in run
    super().run()
  File "/home/zah/phd/reportengine/src/reportengine/app.py", line 359, in run
    rb.execute_sequential()
  File "/home/zah/phd/reportengine/src/reportengine/resourcebuilder.py", line 163, in execute_sequential
    result = callspec.function(self.rootns, callspec.nsspec)
  File "/home/zah/phd/reportengine/src/reportengine/report.py", line 426, in __call__
    expand_fuzzyspec=namespaces.expand_fuzzyspec,
  File "/home/zah/anaconda3/lib/python3.7/site-packages/jinja2/asyncsupport.py", line 76, in render
    return original_render(self, *args, **kwargs)
  File "/home/zah/anaconda3/lib/python3.7/site-packages/jinja2/environment.py", line 1005, in render
    return concat(self.root_render_func(self.new_context(vars)))
  File "<template>", line 17, in root
  File "/home/zah/anaconda3/lib/python3.7/site-packages/jinja2/runtime.py", line 262, in call
    return __obj(*args, **kwargs)
  File "/home/zah/phd/reportengine/src/reportengine/report.py", line 422, in format_collect_fuzzyspec
    return as_markdown(res)
  File "/home/zah/phd/reportengine/src/reportengine/report.py", line 398, in as_markdown
    return '\n'.join(as_markdown(elem) for elem in obj)
  File "/home/zah/phd/reportengine/src/reportengine/report.py", line 398, in <genexpr>
    return '\n'.join(as_markdown(elem) for elem in obj)
  File "/home/zah/phd/reportengine/src/reportengine/report.py", line 406, in as_markdown
    return str(obj)
  File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
    result = user_function(self)
  File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
    self, ', '.join(map(repr, self.maps)))
  File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
    result = user_function(self)
  File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
    self, ', '.join(map(repr, self.maps)))
  File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
    result = user_function(self)
  File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
    self, ', '.join(map(repr, self.maps)))
  File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
    result = user_function(self)
  File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
    self, ', '.join(map(repr, self.maps)))
  File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
    result = user_function(self)
  File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
    self, ', '.join(map(repr, self.maps)))
  File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
    result = user_function(self)
  File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
    self, ', '.join(map(repr, self.maps)))
  File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
    result = user_function(self)
  File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
    self, ', '.join(map(repr, self.maps)))
  File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 1074, in __repr__
    def __repr__(self): return repr(self.data)
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 68, in __repr__
    self.comp_tuple))
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 67, in <genexpr>
    argvals = ', '.join('%s=%r'%vals for vals in zip(self.argnames(),
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 68, in __repr__
    self.comp_tuple))
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 67, in <genexpr>
    argvals = ', '.join('%s=%r'%vals for vals in zip(self.argnames(),
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 67, in __repr__
    argvals = ', '.join('%s=%r'%vals for vals in zip(self.argnames(),
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 55, in argnames
    return list(inspect.signature(cls.__init__).parameters.keys())[1:]
  File "/home/zah/anaconda3/lib/python3.7/inspect.py", line 3070, in signature
    return Signature.from_callable(obj, follow_wrapped=follow_wrapped)
  File "/home/zah/anaconda3/lib/python3.7/inspect.py", line 2820, in from_callable
    follow_wrapper_chains=follow_wrapped)
  File "/home/zah/anaconda3/lib/python3.7/inspect.py", line 2219, in _signature_from_callable
    obj = unwrap(obj, stop=(lambda f: hasattr(f, "__signature__")))
  File "/home/zah/anaconda3/lib/python3.7/inspect.py", line 510, in unwrap
    recursion_limit = sys.getrecursionlimit()
KeyboardInterrupt

@wilsonmr (Contributor)

That was my bad, I tried to print those things. OK, I misinterpreted what the namespaces should be; I think it won't be too difficult to modify what we have to give the correct thing.

@Zaharid (Contributor) commented Oct 25, 2018

That I certainly agree with!

@RosalynLP (Contributor, Author)

What is it that we were trying to print, sorry? I'm confused

@Zaharid (Contributor) commented Oct 25, 2018

{@shiftconfig@} ends up being a complicated namespace containing various things that contain themselves. So the repr is getting confused.
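As a toy illustration of why printing such an object is so expensive (plain Python, nothing to do with the actual validphys objects): repr() on a ChainMap recurses into every map it chains to, so a namespace built from layers of ChainMaps that also appear inside each other's values becomes enormous to print, even though lookups on it stay cheap.

    from collections import ChainMap

    # Toy nesting: each layer's repr contains two copies of the previous one.
    ns = {"dataspecs": ["leaf"]}
    for _ in range(15):
        ns = ChainMap({"dataspecs": [ns]}, ns)

    print(len(repr(ns)))  # already over a million characters
    _ = ns["dataspecs"]   # ...but a simple lookup is still instant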

@Zaharid (Contributor) commented Oct 25, 2018

Any progress with this? As expected, the first thing I tried didn't work, but the second did. The only nontrivial thing is that there must be a way to recover the old values (what @wilsonmr was asking this morning), so that e.g. the values of dataset_name and experiment_name are preserved where appropriate and we have a way to refer to the original configuration, e.g. for seeing which theories we started with.

diff --git a/validphys2/src/validphys/config.py b/validphys2/src/validphys/config.py
index 1f87f35..ff6ab78 100644
--- a/validphys2/src/validphys/config.py
+++ b/validphys2/src/validphys/config.py
@@ -569,19 +569,18 @@ class CoreConfig(configparser.Config):
     def produce_combined_shift_and_theory_dataspecs(self, theoryconfig, shiftconfig):
         total_dataspecs = theoryconfig["dataspecs"] + shiftconfig["dataspecs"]
         matched_datasets = self.produce_matched_datasets_from_dataspecs(total_dataspecs)
-        final_res = [
-            self.produce_dataspecs_with_matched_cuts(ds["dataspecs"])
-            for ds in matched_datasets
-        ]
+        for ns in matched_datasets:
+            ns["dataspecs"] = self.produce_dataspecs_with_matched_cuts(ns["dataspecs"])
         new_theoryconfig = []
         new_shiftconfig = []
         len_th = len(theoryconfig['dataspecs'])
-        for s in final_res:
-            new_theoryconfig.append({"dataspecs": s[:len_th]})
-            new_shiftconfig.append({"dataspecs": s[len_th:]})
+        for s in matched_datasets:
+            new_theoryconfig.append(ChainMap({"dataspecs": s['dataspecs'][:len_th]}, s))
+            new_shiftconfig.append(ChainMap({"dataspecs": s['dataspecs'][len_th:]},s))
+
         return {
-            "shiftconfig": ChainMap({"dataspecs": new_shiftconfig}, shiftconfig),
-            "theoryconfig": ChainMap({"dataspecs": new_theoryconfig}, theoryconfig),
+            "shiftconfig": {"dataspecs": new_shiftconfig, 'original': shiftconfig},
+            "theoryconfig": {"dataspecs": new_theoryconfig, 'original': theoryconfig},
         }
 
 
diff --git a/validphys2/src/validphys/theorycovariance.py b/validphys2/src/validphys/theorycovariance.py
index a7c91f8..7f814e5 100644
--- a/validphys2/src/validphys/theorycovariance.py
+++ b/validphys2/src/validphys/theorycovariance.py
@@ -1,4 +1,3 @@
-# -*- coding: utf-8 -*-
 """
 theorycovariance.py
 Tools for constructing and studying theory covariance matrices.
@@ -813,7 +812,7 @@ def plot_datasets_chi2_theory(experiments,
     return fig
 
 
-matched_dataspecs_results = collect('results', ['dataspecs_with_matched_cuts'])
+matched_dataspecs_results = collect('results', ['dataspecs'])
 
 LabeledShifts = namedtuple('LabeledShifts',
     ('experiment_name', 'dataset_name', 'shifts'))
@@ -832,7 +831,7 @@ def dataspecs_dataset_prediction_shift(matched_dataspecs_results, experiment_nam
                          experiment_name=experiment_name, shifts=res)
 
 matched_dataspecs_dataset_prediction_shift = collect(
-    'dataspecs_dataset_prediction_shift', ['matched_datasets_from_dataspecs'])
+    'dataspecs_dataset_prediction_shift', ['dataspecs'])
 
 
 #Not sure we want to export this, as it is 231 Mb...
@@ -886,13 +885,17 @@ def plot_matched_datasets_shift_matrix_correlations(
     return plot_corrmat_heatmap(
         corrmat, "Shift outer product normalized (correlation) matrix")
 
+
+
+
 all_matched_results = collect('matched_dataspecs_results',
-                              ['matched_datasets_from_dataspecs'])
+                              ['dataspecs'])
 
 def combine_by_type_dataspecs(process_lookup, all_matched_results, matched_dataspecs_dataset_name):
     return combine_by_type(process_lookup, all_matched_results, matched_dataspecs_dataset_name)
 
-datapsecs_theoryids = collect('theoryid', ['dataspecs'])
+datapsecs_theoryids = collect('theoryid', ['theoryconfig', 'original',
+    'dataspecs'])
 
 def process_starting_points_dataspecs(combine_by_type_dataspecs):
     return process_starting_points(combine_by_type_dataspecs)
@@ -915,12 +918,12 @@ def covmap_dataspecs(combine_by_type_dataspecs, matched_dataspecs_dataset_name):
     return covmap(combine_by_type_dataspecs, matched_dataspecs_dataset_name)
 
 matched_dataspecs_experiment_name = collect(
-    'experiment_name', ['matched_datasets_from_dataspecs'])
+    'experiment_name', ['dataspecs'])
 matched_dataspecs_dataset_name = collect(
-    'dataset_name', ['matched_datasets_from_dataspecs'])
-matched_cuts_datasets = collect('dataset', ['dataspecs_with_matched_cuts'])
+    'dataset_name', ['dataspecs'])
+matched_cuts_datasets = collect('dataset', ['dataspecs'])
 all_matched_datasets = collect('matched_cuts_datasets',
-                               ['matched_datasets_from_dataspecs'])
+                               ['dataspecs'])
 
 
 def all_matched_data_lengths(all_matched_datasets):
@@ -961,7 +964,17 @@ def theory_covmat_custom_dataspecs(covs_pt_prescrip_dataspecs, covmap_dataspecs,
     return theory_covmat_custom(covs_pt_prescrip_dataspecs, covmap_dataspecs,
                                 matched_experiments_index)
 
-@table
+thx_corrmat = collect('theory_corrmat_custom_dataspecs', ['combined_shift_and_theory_dataspecs','theoryconfig'])
+shx_corrmat = collect('matched_datasets_shift_matrix', ['combined_shift_and_theory_dataspecs','shiftconfig'])
+
+@figure
+def foo(thx_corrmat, shx_corrmat):
+    fig,ax = plt.subplots()
+    ax.matshow(thx_corrmat[0]*shx_corrmat[0])
+    return fig
+
+
+#@table
 def theory_corrmat_custom_dataspecs(theory_covmat_custom_dataspecs):
     """Calculates the theory correlation matrix for scale variations
     with variations by process type"""

With that I seem to be able to get a plot with seemingly the right dimensions:
https://vp.nnpdf.science/0-fOf9UoTuWaSnRnCp7Aqw==/

@RosalynLP (Contributor, Author)

Ah I see, this will make things easier. I had been trying to implement various collect functions to calculate the theory matrix, but with limited success. I'll see if I can reproduce some working results using the full dataspecs.

@RosalynLP (Contributor, Author)

I tried plotting the ratio of the shift to theory covmats:

https://vp.nnpdf.science/8LNePCv8T9O5ev8zwV5Mrw==/

The plot looks a bit weird in that a lot of the elements are 0, which shouldn't be the case seeing as both correlation matrices should be composed purely of elements with values of plus or minus 1. I'll try and work out what's going on.

@RosalynLP (Contributor, Author)

This makes things a little clearer: https://vp.nnpdf.science/FYLkXOsbS-SRMsdzho-HOw==/

If you look at the shift correlation matrix here you can see that it looks different from before, with a lot of zero entries, which translate to zero entries in the ratio plot. Something must be going wrong in the generation of the shift matrix. The theory matrix looks OK at a glance, though.

@Zaharid (Contributor) commented Nov 5, 2018

So there are several things I don't like about this. I'll list several, but not all:

  1. The tests do not pass for obvious reasons.
  2. The process_lookup thing shouldn't be a provider but rather a normal function that gets called inside other providers. It should be dataset -> lookup string (see the sketch after this list). As it is, it introduces some ugly dependencies.
  3. New stuff does not have docstrings in general.
  4. The previous interface was better in that it ensured that the dataspecs come from matched cuts and so the things had always the right dimensions. We should see how to have something similar here.
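A minimal sketch of the interface suggested in point 2 (hypothetical names and entries, not the current validphys code): a plain function mapping a dataset name to its process-type string, which providers can simply call.

    # Hypothetical table and helper, illustrating the suggested interface only.
    _PROCESS_BY_DATASET = {
        "BCDMSP": "DIS",
        "HERACOMBNCEP920": "DIS",
        "CMSJETS11": "JETS",
        "LHCBZ940PB": "DY",
    }

    def process_lookup(dataset_name):
        # Plain function (not a provider): dataset name in, lookup string out.
        return _PROCESS_BY_DATASET.get(dataset_name, "UNKNOWN")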
