
[WIP]: theory shift covmat comparison - updated #314

Closed · wants to merge 16 commits
Conversation

@RosalynLP (Contributor)

Cleaner version carrying on from #309

https://vp.nnpdf.science/bQCke9RZT4-eC4A7szF80g==/figures/plot_thcorrmat_heatmap_custom2.png
This looks a bit weird to me. First, it is missing FT DY relative to https://vp.nnpdf.science/NlltmlyWRRqCtSeJbi1xIQ==/. Second, the correlations between different experiments look different from what we were previously seeing: I don't see why the upper-left block of experiments should be so strongly correlated, since they are just listed alphabetically. It seems as if the labelling might have been done wrong. Any ideas about the cause of this? I am trying to debug but haven't spotted the issue yet.

@RosalynLP requested review from Zaharid and voisey on October 22, 2018 at 11:33
return combine_by_type(process_lookup, all_matched_results, dataset_names)

datapsecs_theoryids = collect('theoryid', ['dataspecs'])

Contributor:

dataspecs_theoryids rather than datapsecs_theoryids?

all_matched_results = collect('matched_dataspecs_results',
['matched_datasets_from_dataspecs'])

def combine_by_type2(process_lookup, all_matched_results, dataset_names):
Contributor:

If you have the time, could you please add descriptions to the functions? In particular, could you explain what the difference is between <function_name> and <function_name>2?

Contributor:

These things shouldn't be called _2 in the first place. They should be _dataspecs or something similar.

@Zaharid (Contributor) commented Oct 22, 2018

@RosalynLP Can we see the same covmat using the original functions? I agree this looks suspicious.

@RosalynLP (Contributor, Author)

@RosalynLP (Contributor, Author)

I'm going to try running it with just BCDMS and HERACOMB to see if that makes things any clearer

@Zaharid (Contributor) commented Oct 22, 2018

Also note that a potentially more elegant way to implement this is with NNPDF/reportengine#63. But let's get something working first, even if it's ugly.

return covs_pt_prescrip(combine_by_type_dataspecs, process_starting_points_dataspecs,
datapsecs_theoryids, fivetheories)

def covmap2(combine_by_type_dataspecs, dataset_names):
Contributor:

Should this '2' be replaced with 'dataspecs' too?

Contributor (Author):

Yeah it should, I haven't pushed that change yet :)

@RosalynLP (Contributor, Author)

matched_cuts_debug.pdf
This is a comparison of the "old" (point prescriptions as previously implemented) and "new" (implemented using matched cuts and all the dataspecs functions) theory correlation matrices, just for BCDMS versus HERACOMB. You can see they're basically quite similar, but if you look closely there are some subtle differences: it appears as if HERACOMB has less stringent cuts in the new case. For both I am using

fit: 180421-lr-nlo-central_global
use_cuts: "internal"
q2min: 3
w2min: 5

I might try adding in a few more datasets to see if it makes it clearer what's going on

@RosalynLP (Contributor, Author)

matched_cuts_debug.pdf
This might provide some more insight: you can see that something funny is going on. Look at the correlations between HERACOMB and BCDMS, which are now very weak. The correlations within HERACOMB itself have also taken on a strange block-diagonal form. You can also see what appears to be a strong correlation of LHCb with parts of HERACOMB. However, the rough sizes of each dataset seem OK, which makes me think that perhaps the labelling is "OK" but something is going wrong at the matrix construction stage. Any thoughts?

@RosalynLP (Contributor, Author)

It seems like maybe the "same process type" matrix construction is being applied in some random regions which don't correspond to full datasets?

@Zaharid (Contributor) commented Oct 22, 2018

Can you please upload these things with the relevant runcards?

@RosalynLP (Contributor, Author)

@Zaharid (Contributor) commented Oct 22, 2018

I think I made a stupid mistake with dataset_names. It should also use the per-dataspec version, which is different. I hadn't realized that at some point we have:

    for dataset, name in zip(each_dataset_results_bytheory, dataset_names):

in combine_by_type, so these things need to match properly. Should work better by using matched_dataspecs_dataset_name instead.
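As a toy illustration of the mismatch (plain Python, not validphys code): zip pairs purely by position, so if the results follow the matched-dataspecs ordering while the names follow the original experiment ordering, the labels land on the wrong results without any error being raised.

    # Toy values: results ordered by matched dataspecs, names ordered as in the
    # original experiments. zip pairs them by position, so labels are silently wrong.
    results_matched_order = ["HERA results", "BCDMS results", "LHCb results"]
    names_original_order = ["BCDMS", "HERA", "LHCb"]
    for res, name in zip(results_matched_order, names_original_order):
        print(name, "->", res)  # e.g. "BCDMS -> HERA results"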

This was out of sync because we were zipping the results with the
datasets, as ordered in the original experiments, rather than the
datasets as ordered by matched_datasets_from_dataspecs.
@Zaharid (Contributor) commented Oct 22, 2018

It looks more reasonable now:

https://vp.nnpdf.science/OarGiaUTQ2yNMxKxnFDRMQ==

@Zaharid (Contributor) commented Oct 22, 2018 via email

@RosalynLP (Contributor, Author)

Aha excellent, good point; this looks far better.

@RosalynLP (Contributor, Author)

OK, I will change the cuts back; I altered them to match yours just for debugging purposes, because that also introduced a difference.

@RosalynLP (Contributor, Author)

@RosalynLP (Contributor, Author)

Compared to https://vp.nnpdf.science/NlltmlyWRRqCtSeJbi1xIQ==/figures/plot_matched_datasets_shift_marix_correlations.png we can see that there is at least some similarity in the patterns. The next step is to create functions which take in both of these and compare the mutual parts, right? Presumably this means using another function, a bit like matched cuts, to find the mutual datasets with matching cuts in each one and only use those elements? Or would it make more sense if the matched cuts were computed separately using information from the 3 different fits, and then applied to both functions?

@Zaharid (Contributor) commented Oct 23, 2018

I would do a production rule that takes 'shiftconfig' and 'theoryconfig', where each has the initial dataspecs, and returns a mapping with 'shiftconfig' and 'theoryconfig' again, but with matched dataspecs, with the same data and the same matched cuts. Probably that production rule can use the existing ones (matched_datasets_from_dataspecs and dataspecs_with_matched_cuts).

@RosalynLP (Contributor, Author)

OK so what I have been trying is the runcard below, and now I am writing a production rule in config.py. I am a little unsure of what to take in. Can I do something like:

def produce_dataspecs_matched_between_namespaces(self, shiftconfig, theoryconfig):

or should I be passing in something other than shiftconfig and theoryconfig here? My plan was to load in the dataspecs from both shiftconfig and theoryconfig and then call matched_datasets_from_dataspecs and dataspecs_with_matched_cuts on these combined dataspecs. However, I am not sure how to get the dataspecs from shiftconfig and theoryconfig, as these namespaces both have a number of sub-elements and are not a simple list of things (i.e. they also have information on the fit and experiments, not just lists of dataspecs).


meta:
    title: Testing combining namespaces for shift matrices
    author: Rosalyn Pearson
    keywords: [test, theory uncertainties]

experiments:
  - experiment: BCDMS
    datasets:
      - dataset: BCDMSP
      - dataset: BCDMSD
  - experiment: HERACOMB
    datasets:
      - dataset: HERACOMBNCEM 
      - dataset: HERACOMBNCEP460
      - dataset: HERACOMBNCEP575
      - dataset: HERACOMBNCEP820
      - dataset: HERACOMBNCEP920
      - dataset: HERACOMBCCEM 
      - dataset: HERACOMBCCEP 
  - experiment: CMS
    datasets:
      - dataset: CMSWEASY840PB
      - dataset: CMSWMASY47FB
 #     - dataset: CMSWCHARMRAT
      - dataset: CMSDY2D11
      - dataset: CMSWMU8TEV
      - dataset: CMSJETS11
      - dataset: CMSTTBARTOT
      - dataset: CMSTOPDIFF8TEVTTRAPNORM
  - experiment: LHCb
    datasets:
      - dataset: LHCBZ940PB
      - dataset: LHCBZEE2FB

theoryconfig:

   theoryids:
      - 163
      - 180
      - 173

   theoryid: 163

   use_cuts: 'fromfit'

   fit: 180421-lr-nlo-central_global

   pdf: 
      from_: fit

   dataspecs:
           - theoryid: 163
             speclabel: $(\xi_F,\xi_R)=(1,1)$
             experiments:
               - experiment: BCDMS
                 datasets:
                    - dataset: BCDMSP
                    - dataset: BCDMSD
               - experiment: HERACOMB
                 datasets:
                    - dataset: HERACOMBNCEM 
                    - dataset: HERACOMBNCEP460
                    - dataset: HERACOMBNCEP575
                    - dataset: HERACOMBNCEP820
                    - dataset: HERACOMBNCEP920
                    - dataset: HERACOMBCCEM 
                    - dataset: HERACOMBCCEP 
               - experiment: CMS
                 datasets:
                    - dataset: CMSWEASY840PB
                    - dataset: CMSWMASY47FB
       #             - dataset: CMSWCHARMRAT
                    - dataset: CMSDY2D11
                    - dataset: CMSWMU8TEV
                    - dataset: CMSJETS11
                    - dataset: CMSTTBARTOT
                    - dataset: CMSTOPDIFF8TEVTTRAPNORM
               - experiment: LHCb
                 datasets:
                    - dataset: LHCBZ940PB
                    - dataset: LHCBZEE2FB
           - theoryid: 180
             speclabel: $(\xi_F,\xi_R)=(2,2)$
             experiments:
               - experiment: BCDMS
                 datasets:
                    - dataset: BCDMSP
                    - dataset: BCDMSD
               - experiment: HERACOMB
                 datasets:
                    - dataset: HERACOMBNCEM 
                    - dataset: HERACOMBNCEP460
                    - dataset: HERACOMBNCEP575
                    - dataset: HERACOMBNCEP820
                    - dataset: HERACOMBNCEP920
                    - dataset: HERACOMBCCEM 
                    - dataset: HERACOMBCCEP 
               - experiment: CMS
                 datasets:
                    - dataset: CMSWEASY840PB
                    - dataset: CMSWMASY47FB
         #           - dataset: CMSWCHARMRAT
                    - dataset: CMSDY2D11
                    - dataset: CMSWMU8TEV
                    - dataset: CMSJETS11
                    - dataset: CMSTTBARTOT
                    - dataset: CMSTOPDIFF8TEVTTRAPNORM
               - experiment: LHCb
                 datasets:
                    - dataset: LHCBZ940PB
                    - dataset: LHCBZEE2FB
           - theoryid: 173
             speclabel: $(\xi_F,\xi_R)=(0.5,0.5)$
             experiments:
               - experiment: BCDMS
                 datasets:
                    - dataset: BCDMSP
                    - dataset: BCDMSD
               - experiment: HERACOMB
                 datasets:
                    - dataset: HERACOMBNCEM 
                    - dataset: HERACOMBNCEP460
                    - dataset: HERACOMBNCEP575
                    - dataset: HERACOMBNCEP820
                    - dataset: HERACOMBNCEP920
                    - dataset: HERACOMBCCEM 
                    - dataset: HERACOMBCCEP 
               - experiment: CMS
                 datasets:
                    - dataset: CMSWEASY840PB
                    - dataset: CMSWMASY47FB
       #             - dataset: CMSWCHARMRAT
                    - dataset: CMSDY2D11
                    - dataset: CMSWMU8TEV
                    - dataset: CMSJETS11
                    - dataset: CMSTTBARTOT
                    - dataset: CMSTOPDIFF8TEVTTRAPNORM
               - experiment: LHCb
                 datasets:
                    - dataset: LHCBZ940PB
                    - dataset: LHCBZEE2FB

shiftconfig:

   theoryid: 52

   fit: NNPDF31_nlo_as_0118_1000
   use_cuts: 'fromfit'

   experiments:
       from_: fit

   dataspecs:
       - theoryid: 52
         pdf: NNPDF31_nlo_as_0118_hessian
         speclabel: "NLO"
         fit: NNPDF31_nlo_as_0118_1000

       - theoryid: 53
         pdf: NNPDF31_nnlo_as_0118_hessian
         speclabel: "NNLO"
         fit: NNPDF31_nnlo_as_0118_1000

actions_:
    - shiftconfig plot_matched_datasets_shift_matrix_correlations
    - theoryconfig plot_thcorrmat_heatmap_custom_dataspecs

    



@Zaharid (Contributor) commented Oct 23, 2018 via email

@RosalynLP (Contributor, Author)

@Zaharid (Contributor) commented Oct 23, 2018 via email

@RosalynLP (Contributor, Author)

We have the functions produce_matched_datasets_from_dataspecs and produce_dataspecs_with_matched_cuts but I can't see anywhere where matched_datasets_from_dataspecs and dataspecs_with_matched_cuts themselves are defined. Presumably they are the output of the above functions but how does validphys know this?

So far I am attempting to do


   def produce_dataspecs_matched_between_namespaces(self, 
                                                    shiftconfig, 
                                                    theoryconfig, 
                                                    produce_matched_datasets_from_dataspecs,
                                                    produce_dataspecs_with_matched_cuts):
       total_dataspecs = theoryconfig['dataspecs'] + shiftconfig['dataspecs']
       matched_datasets = produce_matched_datasets_from_dataspecs(total_dataspecs)
       matched_datasets_with_cuts = produce_dataspecs_with_matched_cuts(matched_datasets)
       theoryconfig['dataspecs'] = matched_datasets_with_cuts
       shiftconfig['dataspecs'] = matched_datasets_with_cuts
       return theoryconfig, shiftconfig

(will this '+' operation work?) Does this look OK? I am not really sure how to reassign theoryconfig and shiftconfig; presumably just returning them like this doesn't really work? I then also don't know how to make validphys run this production rule in order to try and test it... do I have to somehow use it in theorycovariance.py?

@Zaharid (Contributor) commented Oct 23, 2018

Have a look here

https://data.nnpdf.science/validphys-docs/guide.html#configuration

The quality of the writing is not optimal, but it should be OK to see how these things work. Indeed, things like dataspecs_with_matched_cuts are defined by the corresponding production rule.

As a general comment, start from something small that works and is simple and evolve it into something that works and also does what you want. Always divide the work into pieces that you can test and reason about individually (and print things and fire up a terminal). An example of something small that works is a function that takes some parameters and does nothing with them, which can be useful to see how they get computed by vp.
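As a toy sketch of such a do-nothing provider (hypothetical name, not part of validphys; the argument names are simply resources that reportengine would resolve from the runcard):

    # Hypothetical provider for illustration only: it does nothing useful,
    # but printing the resolved arguments shows what vp would pass in.
    def inspect_inputs(shiftconfig, theoryconfig):
        print("shiftconfig keys:", list(shiftconfig))
        print("theoryconfig keys:", list(theoryconfig))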

For example, vp wouldn't give you parameters called produce_XXX but rather the corresponding XXX.
But we don't want to do any of that here. You take just the two mappings that you start with, and then call the production rules on the total dataspecs (the + operator is fine; you can start by printing the result of it).
Now the production rules are normal methods of the Config class, which return a value that can be useful by itself, e.g. inside some other production rule (in addition, reportengine calls production rules when asked to resolve specific names). See e.g.

full_ds = self.produce_dataset(

In this case the result we want is a list of namespaces that are the result of calling the two rules (one after the other) on the total_dataspecs list you have. The final result we want is a namespace (i.e. a dictionary) with the corresponding reshuffled lists of dataspecs, e.g. return {'shiftconfig': ChainMap({'dataspecs': new_shift_specs}, shiftconfig), 'theoryconfig': ChainMap({'dataspecs': new_th_specs}, theoryconfig)}
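As a small illustration of the ChainMap layering being suggested (toy values, not the actual validphys namespaces): the first mapping shadows 'dataspecs' while every other key still resolves from the original configuration.

    from collections import ChainMap

    # Toy values for illustration: the override layer replaces 'dataspecs',
    # everything else falls through to the original mapping.
    original = {"dataspecs": ["old spec"], "use_cuts": "fromfit"}
    updated = ChainMap({"dataspecs": ["matched spec"]}, original)
    print(updated["dataspecs"])  # ['matched spec']
    print(updated["use_cuts"])   # 'fromfit'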

@RosalynLP (Contributor, Author)

Hi @Zaharid, @wilsonmr and I have been working on this today. It's currently a bit messy, but the stage we are at is that we think we have replaced the shiftconfig and theoryconfig datasets with matched datasets. We're now looking into testing this on the functions in theorycovariance.py, but we have got a bit confused over the sequence of collects which is necessary. For example, before we had things like


matched_dataspecs_experiment_name = collect(
    'experiment_name', ['matched_datasets_from_dataspecs'])
matched_dataspecs_dataset_name = collect(
    'dataset_name', ['matched_datasets_from_dataspecs'])

matched_cuts_datasets = collect('dataset', ['dataspecs_with_matched_cuts'])
all_matched_datasets = collect('matched_cuts_datasets',
                               ['matched_datasets_from_dataspecs'])

but we no longer need to collect over matched_cuts etc. like this because the cuts should all be matched from the outset. Ideally we want to collect over dataspecs and then try something like

   {@with combined_shift_and_theory_dataspecs@}

   {@shiftconfig plot_matched_datasets_shift_matrix_correlations@}

   {@theoryconfig plot_thcorrmat_heatmap_custom_dataspecs@}

   {@endwith@}

but this doesn't seem to run (it crashes), perhaps because of multiple uses of dataspecs_with_matched_cuts appearing due to the lines above?

@Zaharid (Contributor) commented Oct 24, 2018

@RosalynLP ISTM that the production rule you have now mixes the indexes of the datasets and of the input dataspecs. I think it should look more like this:

    def produce_combined_shift_and_theory_dataspecs(self, theoryconfig, shiftconfig):
        total_dataspecs = theoryconfig["dataspecs"] + shiftconfig["dataspecs"]
        matched_datasets = self.produce_matched_datasets_from_dataspecs(total_dataspecs)
        final_res = [
            self.produce_dataspecs_with_matched_cuts(ds["dataspecs"])
            for ds in matched_datasets
        ]
        new_theoryconfig = []
        new_shiftconfig = []
        len_th = len(theoryconfig['dataspecs'])
        for s in final_res:
            new_theoryconfig.append({"dataspecs": s[:len_th]})
            new_shiftconfig.append({"dataspecs": s[len_th:]})
        return {
            "shiftconfig": ChainMap({"dataspecs": new_shiftconfig}, shiftconfig),
            "theoryconfig": ChainMap({"dataspecs": new_theoryconfig}, theoryconfig),
        }

Then, given the runcard:

experiments:
    from_: fit

theory:
    from_: fit

theoryid:
    from_: theory

use_cuts: "fromfit"

shiftconfig:
    dataspecs:
        - fit: NNPDF31_nlo_as_0118_1000
        - fit: NNPDF31_nnlo_as_0118_1000


theoryconfig:
    dataspecs:
        - fit: 180421-lr-nlo-xF2dot0xR2dot0_global
        - fit: NNPDF31_nnlo_as_0118_uncorr_s3
        - fit: NNPDF31_nnlo_as_0118_1000


actions_:
    - combined_shift_and_theory_dataspecs

We have

In [1]: len(new_shiftconfig)                                                                                          
Out[1]: 45

In [2]: len(new_theoryconfig)                                                                                         
Out[2]: 45

In [3]: set(len(x['dataspecs']) for x in new_theoryconfig)                                                            
Out[3]: {3}

In [4]: set(len(x['dataspecs']) for x in new_shiftconfig)                                                             
Out[4]: {2}

@Zaharid (Contributor) commented Oct 24, 2018

Probably we can make shiftconfig and theoryconfig lists of namespaces themselves and remove the inner dataspecs in each of the inputs.

@Zaharid (Contributor) commented Oct 25, 2018

@RosalynLP @wilsonmr Have you had a chance to test this?

@RosalynLP (Contributor, Author)

@Zaharid just making the changes now

@RosalynLP (Contributor, Author)

@Zaharid I'm not sure whether I interpreted what you said correctly, but I have modified the action and it runs fine for


actions_:
   - combined_shift_and_theory_dataspecs

in the runcard, but when I change this to

template_text: |
   {@with combined_shift_and_theory_dataspecs@}

   {@shiftconfig@}

   {@theoryconfig@}

   {@endwith@}

actions_:
   - report(main=True)

it seems to freeze the computer and continue to run for some indefinite length of time.

I can try leaving it to see what happens/if an error message crops up?

@wilsonmr (Contributor)

I'm also confused: why would we want

In [1]: len(new_shiftconfig)
Out[1]: 45

The new_shiftconfig should be a list of 2 dataspecs, each of which in turn is a dict of whatever length, no?

@Zaharid (Contributor) commented Oct 25, 2018

@wilsonmr I might be missing something here, but I'd say we want one namespace for each dataset, where each of these namespaces contains the various ways to compute that dataset (e.g. with settings from different fits). This allows you to build an analogue of all_matched_results.
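As a rough sketch of that shape (invented values, purely to illustrate the structure): each entry corresponds to one matched dataset, and its 'dataspecs' list holds the different ways of computing that same dataset.

    # Invented values, for illustration of the shape only.
    new_shiftconfig = [
        {"dataspecs": [{"fit": "NNPDF31_nlo_as_0118_1000", "dataset": "BCDMSP"},
                       {"fit": "NNPDF31_nnlo_as_0118_1000", "dataset": "BCDMSP"}]},
        {"dataspecs": [{"fit": "NNPDF31_nlo_as_0118_1000", "dataset": "BCDMSD"},
                       {"fit": "NNPDF31_nnlo_as_0118_1000", "dataset": "BCDMSD"}]},
        # ... one entry per matched dataset (45 in the example above)
    ]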

@Zaharid (Contributor) commented Oct 25, 2018

@RosalynLP The problem is that you are trying to print a deeply recursive, complicated object. If you run with --debug and cancel with Control + C you can see that it is getting confused printing the thing (which surely you also saw when trying to print with IPython?)

Interrupted by user. Exiting.
[DEBUG]: The traceback of the exception below is:
Traceback (most recent call last):
  File "/home/zah/phd/reportengine/src/reportengine/app.py", line 374, in main
    self.run()
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/app.py", line 140, in run
    super().run()
  File "/home/zah/phd/reportengine/src/reportengine/app.py", line 359, in run
    rb.execute_sequential()
  File "/home/zah/phd/reportengine/src/reportengine/resourcebuilder.py", line 163, in execute_sequential
    result = callspec.function(self.rootns, callspec.nsspec)
  File "/home/zah/phd/reportengine/src/reportengine/report.py", line 426, in __call__
    expand_fuzzyspec=namespaces.expand_fuzzyspec,
  File "/home/zah/anaconda3/lib/python3.7/site-packages/jinja2/asyncsupport.py", line 76, in render
    return original_render(self, *args, **kwargs)
  File "/home/zah/anaconda3/lib/python3.7/site-packages/jinja2/environment.py", line 1005, in render
    return concat(self.root_render_func(self.new_context(vars)))
  File "<template>", line 17, in root
  File "/home/zah/anaconda3/lib/python3.7/site-packages/jinja2/runtime.py", line 262, in call
    return __obj(*args, **kwargs)
  File "/home/zah/phd/reportengine/src/reportengine/report.py", line 422, in format_collect_fuzzyspec
    return as_markdown(res)
  File "/home/zah/phd/reportengine/src/reportengine/report.py", line 398, in as_markdown
    return '\n'.join(as_markdown(elem) for elem in obj)
  File "/home/zah/phd/reportengine/src/reportengine/report.py", line 398, in <genexpr>
    return '\n'.join(as_markdown(elem) for elem in obj)
  File "/home/zah/phd/reportengine/src/reportengine/report.py", line 406, in as_markdown
    return str(obj)
  File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
    result = user_function(self)
  File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
    self, ', '.join(map(repr, self.maps)))
  File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
    result = user_function(self)
  File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
    self, ', '.join(map(repr, self.maps)))
  File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
    result = user_function(self)
  File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
    self, ', '.join(map(repr, self.maps)))
  File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
    result = user_function(self)
  File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
    self, ', '.join(map(repr, self.maps)))
  File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
    result = user_function(self)
  File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
    self, ', '.join(map(repr, self.maps)))
  File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
    result = user_function(self)
  File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
    self, ', '.join(map(repr, self.maps)))
  File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
    result = user_function(self)
  File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
    self, ', '.join(map(repr, self.maps)))
  File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 1074, in __repr__
    def __repr__(self): return repr(self.data)
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 68, in __repr__
    self.comp_tuple))
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 67, in <genexpr>
    argvals = ', '.join('%s=%r'%vals for vals in zip(self.argnames(),
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 68, in __repr__
    self.comp_tuple))
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 67, in <genexpr>
    argvals = ', '.join('%s=%r'%vals for vals in zip(self.argnames(),
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 67, in __repr__
    argvals = ', '.join('%s=%r'%vals for vals in zip(self.argnames(),
  File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 55, in argnames
    return list(inspect.signature(cls.__init__).parameters.keys())[1:]
  File "/home/zah/anaconda3/lib/python3.7/inspect.py", line 3070, in signature
    return Signature.from_callable(obj, follow_wrapped=follow_wrapped)
  File "/home/zah/anaconda3/lib/python3.7/inspect.py", line 2820, in from_callable
    follow_wrapper_chains=follow_wrapped)
  File "/home/zah/anaconda3/lib/python3.7/inspect.py", line 2219, in _signature_from_callable
    obj = unwrap(obj, stop=(lambda f: hasattr(f, "__signature__")))
  File "/home/zah/anaconda3/lib/python3.7/inspect.py", line 510, in unwrap
    recursion_limit = sys.getrecursionlimit()
KeyboardInterrupt

@wilsonmr (Contributor)

That was my bad, I tried to print those things. OK, I misinterpreted what the namespaces should be; I think it won't be too difficult to modify what we have to give the correct thing.

@Zaharid (Contributor) commented Oct 25, 2018

That I certainly agree with!

@RosalynLP (Contributor, Author)

What is it that we were trying to print, sorry? I'm confused

@Zaharid (Contributor) commented Oct 25, 2018

{@shiftconfig@} ends up being a complicated namespace containing various things that contain themselves. So the repr is getting confused.
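As a toy illustration of why printing such an object is so expensive (plain Python, nothing to do with the actual validphys objects): repr() on a ChainMap recurses into every map it chains to, so a namespace built from layers of ChainMaps that also appear inside each other's values becomes enormous to print, even though lookups on it stay cheap.

    from collections import ChainMap

    # Toy nesting: each layer's repr contains two copies of the previous one.
    ns = {"dataspecs": ["leaf"]}
    for _ in range(15):
        ns = ChainMap({"dataspecs": [ns]}, ns)

    print(len(repr(ns)))  # already over a million characters
    _ = ns["dataspecs"]   # ...but a simple lookup is still instant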

@Zaharid (Contributor) commented Oct 25, 2018

Any progress with this? As expected, the first thing I tried didn't work, but the second did. The only nontrivial thing is that there must be a way to recover the old values (what @wilsonmr was asking this morning), so that e.g. the values of dataset_name and experiment_name are preserved where appropriate and we have a way to refer to the original configuration, e.g. for seeing which theories we started with.

diff --git a/validphys2/src/validphys/config.py b/validphys2/src/validphys/config.py
index 1f87f35..ff6ab78 100644
--- a/validphys2/src/validphys/config.py
+++ b/validphys2/src/validphys/config.py
@@ -569,19 +569,18 @@ class CoreConfig(configparser.Config):
     def produce_combined_shift_and_theory_dataspecs(self, theoryconfig, shiftconfig):
         total_dataspecs = theoryconfig["dataspecs"] + shiftconfig["dataspecs"]
         matched_datasets = self.produce_matched_datasets_from_dataspecs(total_dataspecs)
-        final_res = [
-            self.produce_dataspecs_with_matched_cuts(ds["dataspecs"])
-            for ds in matched_datasets
-        ]
+        for ns in matched_datasets:
+            ns["dataspecs"] = self.produce_dataspecs_with_matched_cuts(ns["dataspecs"])
         new_theoryconfig = []
         new_shiftconfig = []
         len_th = len(theoryconfig['dataspecs'])
-        for s in final_res:
-            new_theoryconfig.append({"dataspecs": s[:len_th]})
-            new_shiftconfig.append({"dataspecs": s[len_th:]})
+        for s in matched_datasets:
+            new_theoryconfig.append(ChainMap({"dataspecs": s['dataspecs'][:len_th]}, s))
+            new_shiftconfig.append(ChainMap({"dataspecs": s['dataspecs'][len_th:]},s))
+
         return {
-            "shiftconfig": ChainMap({"dataspecs": new_shiftconfig}, shiftconfig),
-            "theoryconfig": ChainMap({"dataspecs": new_theoryconfig}, theoryconfig),
+            "shiftconfig": {"dataspecs": new_shiftconfig, 'original': shiftconfig},
+            "theoryconfig": {"dataspecs": new_theoryconfig, 'original': theoryconfig},
         }
 
 
diff --git a/validphys2/src/validphys/theorycovariance.py b/validphys2/src/validphys/theorycovariance.py
index a7c91f8..7f814e5 100644
--- a/validphys2/src/validphys/theorycovariance.py
+++ b/validphys2/src/validphys/theorycovariance.py
@@ -1,4 +1,3 @@
-# -*- coding: utf-8 -*-
 """
 theorycovariance.py
 Tools for constructing and studying theory covariance matrices.
@@ -813,7 +812,7 @@ def plot_datasets_chi2_theory(experiments,
     return fig
 
 
-matched_dataspecs_results = collect('results', ['dataspecs_with_matched_cuts'])
+matched_dataspecs_results = collect('results', ['dataspecs'])
 
 LabeledShifts = namedtuple('LabeledShifts',
     ('experiment_name', 'dataset_name', 'shifts'))
@@ -832,7 +831,7 @@ def dataspecs_dataset_prediction_shift(matched_dataspecs_results, experiment_nam
                          experiment_name=experiment_name, shifts=res)
 
 matched_dataspecs_dataset_prediction_shift = collect(
-    'dataspecs_dataset_prediction_shift', ['matched_datasets_from_dataspecs'])
+    'dataspecs_dataset_prediction_shift', ['dataspecs'])
 
 
 #Not sure we want to export this, as it is 231 Mb...
@@ -886,13 +885,17 @@ def plot_matched_datasets_shift_matrix_correlations(
     return plot_corrmat_heatmap(
         corrmat, "Shift outer product normalized (correlation) matrix")
 
+
+
+
 all_matched_results = collect('matched_dataspecs_results',
-                              ['matched_datasets_from_dataspecs'])
+                              ['dataspecs'])
 
 def combine_by_type_dataspecs(process_lookup, all_matched_results, matched_dataspecs_dataset_name):
     return combine_by_type(process_lookup, all_matched_results, matched_dataspecs_dataset_name)
 
-datapsecs_theoryids = collect('theoryid', ['dataspecs'])
+datapsecs_theoryids = collect('theoryid', ['theoryconfig', 'original',
+    'dataspecs'])
 
 def process_starting_points_dataspecs(combine_by_type_dataspecs):
     return process_starting_points(combine_by_type_dataspecs)
@@ -915,12 +918,12 @@ def covmap_dataspecs(combine_by_type_dataspecs, matched_dataspecs_dataset_name):
     return covmap(combine_by_type_dataspecs, matched_dataspecs_dataset_name)
 
 matched_dataspecs_experiment_name = collect(
-    'experiment_name', ['matched_datasets_from_dataspecs'])
+    'experiment_name', ['dataspecs'])
 matched_dataspecs_dataset_name = collect(
-    'dataset_name', ['matched_datasets_from_dataspecs'])
-matched_cuts_datasets = collect('dataset', ['dataspecs_with_matched_cuts'])
+    'dataset_name', ['dataspecs'])
+matched_cuts_datasets = collect('dataset', ['dataspecs'])
 all_matched_datasets = collect('matched_cuts_datasets',
-                               ['matched_datasets_from_dataspecs'])
+                               ['dataspecs'])
 
 
 def all_matched_data_lengths(all_matched_datasets):
@@ -961,7 +964,17 @@ def theory_covmat_custom_dataspecs(covs_pt_prescrip_dataspecs, covmap_dataspecs,
     return theory_covmat_custom(covs_pt_prescrip_dataspecs, covmap_dataspecs,
                                 matched_experiments_index)
 
-@table
+thx_corrmat = collect('theory_corrmat_custom_dataspecs', ['combined_shift_and_theory_dataspecs','theoryconfig'])
+shx_corrmat = collect('matched_datasets_shift_matrix', ['combined_shift_and_theory_dataspecs','shiftconfig'])
+
+@figure
+def foo(thx_corrmat, shx_corrmat):
+    fig,ax = plt.subplots()
+    ax.matshow(thx_corrmat[0]*shx_corrmat[0])
+    return fig
+
+
+#@table
 def theory_corrmat_custom_dataspecs(theory_covmat_custom_dataspecs):
     """Calculates the theory correlation matrix for scale variations
     with variations by process type"""

With that I seem to be able to get a plot with seemingly the right dimensions:
https://vp.nnpdf.science/0-fOf9UoTuWaSnRnCp7Aqw==/

@RosalynLP (Contributor, Author)

Ah I see, this will make things easier. I had been trying to implement various collect functions to calculate the theory matrix, but with limited success. I'll see if I can reproduce some working results using the full dataspecs.

@RosalynLP (Contributor, Author)

I tried plotting the ratio of the shift to theory covmats:

https://vp.nnpdf.science/8LNePCv8T9O5ev8zwV5Mrw==/

The plot looks a bit weird in that a lot of the elements are 0, which shouldn't be the case seeing as both correlation matrices should be composed purely of elements with values of plus or minus 1. I'll try and work out what's going on.

@RosalynLP (Contributor, Author)

This makes things a little clearer: https://vp.nnpdf.science/FYLkXOsbS-SRMsdzho-HOw==/

If you look at the shift correlation matrix here you can see that it looks different from before, with a lot of zero entries, which translate to zero entries in the ratio plot. Something must be going wrong in the generation of the shift matrix. The theory matrix looks OK at a glance, though.

@Zaharid (Contributor) commented Nov 5, 2018

So there are several things I don't like about this. I'll list several, but not all:

  1. The tests do not pass for obvious reasons.
  2. The process_lookup thing shouldn't be a provider but rather a normal function that gets called inside other providers. It should be dataset -> lookup string (see the sketch after this list). As it is, it introduces some ugly dependencies.
  3. New stuff does not have docstrings in general.
  4. The previous interface was better in that it ensured that the dataspecs come from matched cuts and so the things had always the right dimensions. We should see how to have something similar here.
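A minimal sketch of the interface suggested in point 2 (hypothetical names and entries, not the current validphys code): a plain function mapping a dataset name to its process-type string, which providers can simply call.

    # Hypothetical table and helper, illustrating the suggested interface only.
    _PROCESS_BY_DATASET = {
        "BCDMSP": "DIS",
        "HERACOMBNCEP920": "DIS",
        "CMSJETS11": "JETS",
        "LHCBZ940PB": "DY",
    }

    def process_lookup(dataset_name):
        # Plain function (not a provider): dataset name in, lookup string out.
        return _PROCESS_BY_DATASET.get(dataset_name, "UNKNOWN")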
