[WIP]: theory shift covmat comparison - updated #314
return combine_by_type(process_lookup, all_matched_results, dataset_names)

datapsecs_theoryids = collect('theoryid', ['dataspecs'])
dataspecs_theoryids rather than datapsecs_theoryids?
all_matched_results = collect('matched_dataspecs_results',
                              ['matched_datasets_from_dataspecs'])

def combine_by_type2(process_lookup, all_matched_results, dataset_names):
If you have the time could you please add descriptions to the functions? In particular, could you explain what the difference is between <function_name> and <function_name>2?
These things shouldn't be called _2 in the first place. They should be _dataspecs or something similar.
@RosalynLP Can we see the same covmat using the original functions? I agree this looks suspicious.
I'm going to try running it with just BCDMS and HERACOMB to see if that makes things any clearer.
Also note that a potentially more elegant way to implement this is with NNPDF/reportengine#63. But let's get something working first, even if it's ugly.
return covs_pt_prescrip(combine_by_type_dataspecs, process_starting_points_dataspecs,
                        datapsecs_theoryids, fivetheories)

def covmap2(combine_by_type_dataspecs, dataset_names):
Should this '2' be replaced with 'dataspecs' too?
Yeah it should, I haven't pushed that change yet :)
matched_cuts_debug.pdf
I might try adding in a few more datasets to see if it makes it clearer what's going on.
It seems like maybe the "same process type" matrix construction is being applied in some random regions which don't correspond to full datasets?
Can you please upload these things with the relevant runcards?
I think I made a stupid mistake with dataset_names. It should also use the per-dataspec version, which is different. I hadn't realized that at some point the two get out of sync.
This was out of sync because we were zipping the results with the datasets, as ordered in the original experiments, rather than the datasets as ordered by matched_datasets_from_dataspecs.
It looks more reasonable now:
On Mon, Oct 22, 2018 at 4:50 PM RosalynLP ***@***.***> wrote:
matched_cuts_debug.pdf
<https://github.com/NNPDF/nnpdf/files/2502063/matched_cuts_debug.pdf>
This is a comparison of the "old" (point prescriptions as previously
implemented) and "new" (implementing using matched cuts and all the
dataspecs functions) theory correlation matrices, just for BCDMS versus
HERACOMB. You can see they're basically quite similar but if you look
closely there are some subtle differences - it appears as if HERACOMB has
less stringent cuts in the new case but for both I am using
fit: 180421-lr-nlo-central_global
use_cuts: "internal"
q2min: 3
w2min: 5
You shouldn't be using this cut configuration. These were bogus values that
I entered at random in the test runcard. q2min and w2min should not be
there and cuts should probably be "fromfit". Or else, you should find out
the correct q2min and w2min values.
Aha excellent, good point, this looks far better.
OK I will change the cuts back; I altered it to match yours just for debugging purposes because that also introduced a difference.
So this is the full thing: https://vp.nnpdf.science/3wNfZxXdR4SVPuRZEsg4rg==/figures/plot_thcorrmat_heatmap_custom_dataspecs.png
Compared to https://vp.nnpdf.science/NlltmlyWRRqCtSeJbi1xIQ==/figures/plot_matched_datasets_shift_marix_correlations.png we can see that there is some similarity in the patterns at least. The next step is to create functions which take in both of these and compare the mutual parts, right? Presumably this means using another function a bit like matched cuts to find the mutual datasets with matching cuts in each one and only use those elements? Or would it make sense if the matched cuts are computed separately using information from the 3 different fits, and then these are applied to both functions?
I would do a production rule that takes 'shiftconfig' and 'theoryconfig' (where each has the initial dataspecs) and returns a mapping containing 'shiftconfig' and 'theoryconfig' again, but with matched dataspecs, with the same data and the same matched cuts. Probably that production rule can use the existing ones.
OK so what I have been trying is the runcard below, and now I am writing a production rule in config.py. I am a little unsure of what to take in. Can I do something like:

def produce_dataspecs_matched_between_namespaces(self, shiftconfig, theoryconfig):

or should I be passing in something other than shiftconfig and theoryconfig here? My plan was to do something like load in dataspecs from both shiftconfig and theoryconfig and then call matched_datasets_from_dataspecs and dataspecs_with_matched_cuts on these combined dataspecs. However, I am not sure how to get dataspecs from shiftconfig and theoryconfig as these namespaces both have a number of sub-elements so are not a simple list of things (i.e. they also have information on the fit and experiments and things, they aren't just lists of dataspecs).
Have a look at my matched_cuts PR.
On Tue, Oct 23, 2018 at 2:32 PM RosalynLP ***@***.***> wrote:
OK so what I have been trying is the runcard below, and now I am writing a
production rule in config.py. I am a little unsure of what to take in.
Can I do something like:
def produce_dataspecs_matched_between_namespaces(self, shiftconfig,
theoryconfig):
or should I be passing in something other than shiftconfig and theoryconfig
here? My plan was to do something like load in dataspecs from both
shiftconfig and theoryconfig and then call matched_datasets_from_dataspecs
and dataspecs_with_matched_cuts on these combined dataspecs. However, I
am not sure how to get dataspecs from shiftconfig and theoryconfig as
these namespaces both have a number of sub-elements so are not a simple
list of things (i.e. they also have information on the fit and experiments
and things, they aren't just lists of dataspecs).
This is simply a dictionary. To first approximation, it should be possible
to do e.g. theoryconfig['dataspecs'] and so on. I guess roughly the idea is
to merge the two lists of dataspecs, apply the two production rules to that
list and then split them again.
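A minimal sketch of that merge/apply/split idea, using a dummy stand-in for the production rules (the helper and data here are illustrative assumptions, not the real validphys implementations):

```python
def match_cuts(dataspecs):
    # Dummy stand-in for the real production rules: here it just
    # tags each spec; the real rules would compute matched cuts.
    return [{**spec, "matched": True} for spec in dataspecs]

theoryconfig = {"dataspecs": [{"theoryid": 163}, {"theoryid": 180}]}
shiftconfig = {"dataspecs": [{"theoryid": 52}]}

# Merge the two lists and apply the rule to the merged list...
merged = theoryconfig["dataspecs"] + shiftconfig["dataspecs"]
processed = match_cuts(merged)

# ...then split them again by the original lengths.
n_th = len(theoryconfig["dataspecs"])
new_theory, new_shift = processed[:n_th], processed[n_th:]
assert [s["theoryid"] for s in new_theory] == [163, 180]
assert [s["theoryid"] for s in new_shift] == [52]
```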
…
meta:
title: Testing combining namespaces for shift matrices
author: Rosalyn Pearson
keywords: [test, theory uncertainties]
experiments:
- experiment: BCDMS
datasets:
- dataset: BCDMSP
- dataset: BCDMSD
- experiment: HERACOMB
datasets:
- dataset: HERACOMBNCEM
- dataset: HERACOMBNCEP460
- dataset: HERACOMBNCEP575
- dataset: HERACOMBNCEP820
- dataset: HERACOMBNCEP920
- dataset: HERACOMBCCEM
- dataset: HERACOMBCCEP
- experiment: CMS
datasets:
- dataset: CMSWEASY840PB
- dataset: CMSWMASY47FB
# - dataset: CMSWCHARMRAT
- dataset: CMSDY2D11
- dataset: CMSWMU8TEV
- dataset: CMSJETS11
- dataset: CMSTTBARTOT
- dataset: CMSTOPDIFF8TEVTTRAPNORM
- experiment: LHCb
datasets:
- dataset: LHCBZ940PB
- dataset: LHCBZEE2FB
theoryconfig:
theoryids:
- 163
- 180
- 173
theoryid: 163
use_cuts: 'fromfit'
fit: 180421-lr-nlo-central_global
pdf:
from_: fit
dataspecs:
- theoryid: 163
speclabel: $(\xi_F,\xi_R)=(1,1)$
experiments:
- experiment: BCDMS
datasets:
- dataset: BCDMSP
- dataset: BCDMSD
- experiment: HERACOMB
datasets:
- dataset: HERACOMBNCEM
- dataset: HERACOMBNCEP460
- dataset: HERACOMBNCEP575
- dataset: HERACOMBNCEP820
- dataset: HERACOMBNCEP920
- dataset: HERACOMBCCEM
- dataset: HERACOMBCCEP
- experiment: CMS
datasets:
- dataset: CMSWEASY840PB
- dataset: CMSWMASY47FB
# - dataset: CMSWCHARMRAT
- dataset: CMSDY2D11
- dataset: CMSWMU8TEV
- dataset: CMSJETS11
- dataset: CMSTTBARTOT
- dataset: CMSTOPDIFF8TEVTTRAPNORM
- experiment: LHCb
datasets:
- dataset: LHCBZ940PB
- dataset: LHCBZEE2FB
- theoryid: 180
speclabel: $(\xi_F,\xi_R)=(2,2)$
experiments:
- experiment: BCDMS
datasets:
- dataset: BCDMSP
- dataset: BCDMSD
- experiment: HERACOMB
datasets:
- dataset: HERACOMBNCEM
- dataset: HERACOMBNCEP460
- dataset: HERACOMBNCEP575
- dataset: HERACOMBNCEP820
- dataset: HERACOMBNCEP920
- dataset: HERACOMBCCEM
- dataset: HERACOMBCCEP
- experiment: CMS
datasets:
- dataset: CMSWEASY840PB
- dataset: CMSWMASY47FB
# - dataset: CMSWCHARMRAT
- dataset: CMSDY2D11
- dataset: CMSWMU8TEV
- dataset: CMSJETS11
- dataset: CMSTTBARTOT
- dataset: CMSTOPDIFF8TEVTTRAPNORM
- experiment: LHCb
datasets:
- dataset: LHCBZ940PB
- dataset: LHCBZEE2FB
- theoryid: 173
speclabel: $(\xi_F,\xi_R)=(0.5,0.5)$
experiments:
- experiment: BCDMS
datasets:
- dataset: BCDMSP
- dataset: BCDMSD
- experiment: HERACOMB
datasets:
- dataset: HERACOMBNCEM
- dataset: HERACOMBNCEP460
- dataset: HERACOMBNCEP575
- dataset: HERACOMBNCEP820
- dataset: HERACOMBNCEP920
- dataset: HERACOMBCCEM
- dataset: HERACOMBCCEP
- experiment: CMS
datasets:
- dataset: CMSWEASY840PB
- dataset: CMSWMASY47FB
# - dataset: CMSWCHARMRAT
- dataset: CMSDY2D11
- dataset: CMSWMU8TEV
- dataset: CMSJETS11
- dataset: CMSTTBARTOT
- dataset: CMSTOPDIFF8TEVTTRAPNORM
- experiment: LHCb
datasets:
- dataset: LHCBZ940PB
- dataset: LHCBZEE2FB
shiftconfig:
theoryid: 52
fit: NNPDF31_nlo_as_0118_1000
use_cuts: 'fromfit'
experiments:
from_: fit
dataspecs:
- theoryid: 52
pdf: NNPDF31_nlo_as_0118_hessian
speclabel: "NLO"
fit: NNPDF31_nlo_as_0118_1000
- theoryid: 53
pdf: NNPDF31_nnlo_as_0118_hessian
speclabel: "NNLO"
fit: NNPDF31_nnlo_as_0118_1000
actions_:
- shiftconfig plot_matched_datasets_shift_matrix_correlations
- theoryconfig plot_thcorrmat_heatmap_custom_dataspecs
… On Tue, Oct 23, 2018 at 3:11 PM RosalynLP ***@***.***> wrote:
This PR? https://github.com/NNPDF/nnpdf/pull/303/files
We have the functions now. So far I am attempting to do the merging (will this '+' operation work?). Does this look OK? I am not really sure how to reassign the dataspecs.
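On the '+' question: concatenating two Python lists with '+' does work and preserves order, which is what makes a later split-by-length possible. A minimal sketch with hypothetical dataspecs (theoryids borrowed from the runcard above):

```python
# Hypothetical stand-ins for the two dataspecs lists.
theory_dataspecs = [{"theoryid": 163}, {"theoryid": 180}, {"theoryid": 173}]
shift_dataspecs = [{"theoryid": 52}, {"theoryid": 53}]

# '+' on lists concatenates, preserving order.
total_dataspecs = theory_dataspecs + shift_dataspecs
assert len(total_dataspecs) == 5

# The theory entries come first, so they can be split off again by length.
assert total_dataspecs[:len(theory_dataspecs)] == theory_dataspecs
assert total_dataspecs[len(theory_dataspecs):] == shift_dataspecs
```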
Have a look here: https://data.nnpdf.science/validphys-docs/guide.html#configuration (the quality of the writing is not optimal but it should be OK to see how these things work). Indeed, things like that are possible. As a general comment, start from something small that works and is simple, and evolve it into something that works and also does what you want. Always divide the work into pieces that you can test and reason about individually (and print things and fire up a terminal). An example of something small that works is a function that takes some parameters and does nothing with them, which can be useful to see how they get computed by vp. For example, vp wouldn't give you parameters called like this: nnpdf/validphys2/src/validphys/config.py, line 314 in 9da35f3.
In this case the result we want is a list of namespaces that are the result of calling the two rules (one after the other) on the total_dataspecs list you have. The final result we want is a namespace (i.e. a dictionary) with the corresponding reshuffled lists of dataspecs, e.g.

return {'shiftconfig': ChainMap({'dataspecs': new_shift_specs}, shiftconfig), 'theoryconfig': ChainMap({'dataspecs': new_th_specs}, theoryconfig)}
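That ChainMap pattern can be tried in isolation. A minimal sketch with hypothetical stand-in values (not the real validphys namespaces): the first mapping shadows 'dataspecs' while every other key falls through to the original.

```python
from collections import ChainMap

# Hypothetical stand-ins for a config namespace and its matched dataspecs.
shiftconfig = {"dataspecs": ["old_spec_a", "old_spec_b"], "fit": "some_fit"}
new_shift_specs = ["matched_spec_a", "matched_spec_b"]

# The first mapping wins on lookup, so 'dataspecs' is overridden,
# while 'fit' falls through to the original shiftconfig.
ns = ChainMap({"dataspecs": new_shift_specs}, shiftconfig)
assert ns["dataspecs"] == ["matched_spec_a", "matched_spec_b"]
assert ns["fit"] == "some_fit"
```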
Hi @Zaharid, @wilsonmr and I have been working on this today. It's currently a bit messy, but the stage we are at is that we think we have replaced the shiftconfig and theoryconfig datasets with matched datasets. We're now looking into testing this on the functions in theorycovariance.py, but have got a bit confused over the sequence of collects which is necessary. For example, before we had things like
but we no longer need to collect over matched_cuts etc. like this, because the cuts should all be matched from the outset. Ideally we want to collect over dataspecs and then try something like
but this doesn't seem to run (it crashes), perhaps because of multiple nested uses of collect.
@RosalynLP ISTM that the production rule you have now mixes the indexes of the datasets and of the input dataspecs. I think it should look more like this:

def produce_combined_shift_and_theory_dataspecs(self, theoryconfig, shiftconfig):
total_dataspecs = theoryconfig["dataspecs"] + shiftconfig["dataspecs"]
matched_datasets = self.produce_matched_datasets_from_dataspecs(total_dataspecs)
final_res = [
self.produce_dataspecs_with_matched_cuts(ds["dataspecs"])
for ds in matched_datasets
]
new_theoryconfig = []
new_shiftconfig = []
len_th = len(theoryconfig['dataspecs'])
for s in final_res:
new_theoryconfig.append({"dataspecs": s[:len_th]})
new_shiftconfig.append({"dataspecs": s[len_th:]})
return {
"shiftconfig": ChainMap({"dataspecs": new_shiftconfig}, shiftconfig),
"theoryconfig": ChainMap({"dataspecs": new_theoryconfig}, theoryconfig),
}

Then, given the runcard:

experiments:
from_: fit
theory:
from_: fit
theoryid:
from_: theory
use_cuts: "fromfit"
shiftconfig:
dataspecs:
- fit: NNPDF31_nlo_as_0118_1000
- fit: NNPDF31_nnlo_as_0118_1000
theoryconfig:
dataspecs:
- fit: 180421-lr-nlo-xF2dot0xR2dot0_global
- fit: NNPDF31_nnlo_as_0118_uncorr_s3
- fit: NNPDF31_nnlo_as_0118_1000
actions_:
- combined_shift_and_theory_dataspecs

We have:

In [1]: len(new_shiftconfig)
Out[1]: 45
In [2]: len(new_theoryconfig)
Out[2]: 45
In [3]: set(len(x['dataspecs']) for x in new_theoryconfig)
Out[3]: {3}
In [4]: set(len(x['dataspecs']) for x in new_shiftconfig)
Out[4]: {2}
Probably we can make
@RosalynLP @wilsonmr Have you had a chance to test this?
@Zaharid just making the changes now.
@Zaharid I'm not sure whether I interpreted what you said right, but I have modified the action and it runs fine for
in the runcard, but when I change this to
it seems to freeze the computer and continue to run for some indefinite length of time. I can try leaving it to see what happens/if an error message crops up?
I'm also confused, why would we want
The new_shiftconfig should be a list of 2 dataspecs, each of which in turn is a dict of whatever length, no?
@wilsonmr I might be missing something here, but I'd say we want one namespace for each dataset where each of these namespaces contains the various ways to compute the dataset (e.g. with settings from different fits). This allows you to build an analog to
@RosalynLP The problem is that you are trying to print a deeply recursive complicated object. If you run with debugging enabled, you get:

Interrupted by user. Exiting.
[DEBUG]: The traceback of the exception below is:
Traceback (most recent call last):
File "/home/zah/phd/reportengine/src/reportengine/app.py", line 374, in main
self.run()
File "/home/zah/nngit/nnpdf/validphys2/src/validphys/app.py", line 140, in run
super().run()
File "/home/zah/phd/reportengine/src/reportengine/app.py", line 359, in run
rb.execute_sequential()
File "/home/zah/phd/reportengine/src/reportengine/resourcebuilder.py", line 163, in execute_sequential
result = callspec.function(self.rootns, callspec.nsspec)
File "/home/zah/phd/reportengine/src/reportengine/report.py", line 426, in __call__
expand_fuzzyspec=namespaces.expand_fuzzyspec,
File "/home/zah/anaconda3/lib/python3.7/site-packages/jinja2/asyncsupport.py", line 76, in render
return original_render(self, *args, **kwargs)
File "/home/zah/anaconda3/lib/python3.7/site-packages/jinja2/environment.py", line 1005, in render
return concat(self.root_render_func(self.new_context(vars)))
File "<template>", line 17, in root
File "/home/zah/anaconda3/lib/python3.7/site-packages/jinja2/runtime.py", line 262, in call
return __obj(*args, **kwargs)
File "/home/zah/phd/reportengine/src/reportengine/report.py", line 422, in format_collect_fuzzyspec
return as_markdown(res)
File "/home/zah/phd/reportengine/src/reportengine/report.py", line 398, in as_markdown
return '\n'.join(as_markdown(elem) for elem in obj)
File "/home/zah/phd/reportengine/src/reportengine/report.py", line 398, in <genexpr>
return '\n'.join(as_markdown(elem) for elem in obj)
File "/home/zah/phd/reportengine/src/reportengine/report.py", line 406, in as_markdown
return str(obj)
File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
result = user_function(self)
File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
self, ', '.join(map(repr, self.maps)))
File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
result = user_function(self)
File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
self, ', '.join(map(repr, self.maps)))
File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
result = user_function(self)
File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
self, ', '.join(map(repr, self.maps)))
File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
result = user_function(self)
File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
self, ', '.join(map(repr, self.maps)))
File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
result = user_function(self)
File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
self, ', '.join(map(repr, self.maps)))
File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
result = user_function(self)
File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
self, ', '.join(map(repr, self.maps)))
File "/home/zah/anaconda3/lib/python3.7/reprlib.py", line 21, in wrapper
result = user_function(self)
File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 937, in __repr__
self, ', '.join(map(repr, self.maps)))
File "/home/zah/anaconda3/lib/python3.7/collections/__init__.py", line 1074, in __repr__
def __repr__(self): return repr(self.data)
File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 68, in __repr__
self.comp_tuple))
File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 67, in <genexpr>
argvals = ', '.join('%s=%r'%vals for vals in zip(self.argnames(),
File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 68, in __repr__
self.comp_tuple))
File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 67, in <genexpr>
argvals = ', '.join('%s=%r'%vals for vals in zip(self.argnames(),
File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 67, in __repr__
argvals = ', '.join('%s=%r'%vals for vals in zip(self.argnames(),
File "/home/zah/nngit/nnpdf/validphys2/src/validphys/core.py", line 55, in argnames
return list(inspect.signature(cls.__init__).parameters.keys())[1:]
File "/home/zah/anaconda3/lib/python3.7/inspect.py", line 3070, in signature
return Signature.from_callable(obj, follow_wrapped=follow_wrapped)
File "/home/zah/anaconda3/lib/python3.7/inspect.py", line 2820, in from_callable
follow_wrapper_chains=follow_wrapped)
File "/home/zah/anaconda3/lib/python3.7/inspect.py", line 2219, in _signature_from_callable
obj = unwrap(obj, stop=(lambda f: hasattr(f, "__signature__")))
File "/home/zah/anaconda3/lib/python3.7/inspect.py", line 510, in unwrap
recursion_limit = sys.getrecursionlimit()
KeyboardInterrupt
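As an aside on this failure mode: repr of a deeply nested mapping can take effectively forever. The standard library's reprlib truncates such output at a fixed depth; a minimal sketch with a hypothetical nested dict (not the actual validphys objects):

```python
import reprlib

# Build a deeply nested dict, standing in for the nested ChainMap
# namespaces (hypothetical data, not the actual validphys objects).
deep = {}
d = deep
for _ in range(50):
    d["dataspecs"] = {}
    d = d["dataspecs"]

# repr(deep) would render the full 50-level structure;
# reprlib.repr truncates after a few levels instead.
print(reprlib.repr(deep))
```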
That was my bad, I tried to print those things.
OK, I misinterpreted what the namespaces should be. I think it won't be too difficult to modify what we have to give the correct thing.
That I certainly agree with!
What is it that we were trying to print, sorry? I'm confused.
Any progress with this?

As expected, the first thing I tried didn't work, but the second did. The only nontrivial thing is that there must be ways to recover the old values (what @wilsonmr was asking this morning), so that e.g. the original theoryids are still available:

diff --git a/validphys2/src/validphys/config.py b/validphys2/src/validphys/config.py
index 1f87f35..ff6ab78 100644
--- a/validphys2/src/validphys/config.py
+++ b/validphys2/src/validphys/config.py
@@ -569,19 +569,18 @@ class CoreConfig(configparser.Config):
def produce_combined_shift_and_theory_dataspecs(self, theoryconfig, shiftconfig):
total_dataspecs = theoryconfig["dataspecs"] + shiftconfig["dataspecs"]
matched_datasets = self.produce_matched_datasets_from_dataspecs(total_dataspecs)
- final_res = [
- self.produce_dataspecs_with_matched_cuts(ds["dataspecs"])
- for ds in matched_datasets
- ]
+ for ns in matched_datasets:
+ ns["dataspecs"] = self.produce_dataspecs_with_matched_cuts(ns["dataspecs"])
new_theoryconfig = []
new_shiftconfig = []
len_th = len(theoryconfig['dataspecs'])
- for s in final_res:
- new_theoryconfig.append({"dataspecs": s[:len_th]})
- new_shiftconfig.append({"dataspecs": s[len_th:]})
+ for s in matched_datasets:
+ new_theoryconfig.append(ChainMap({"dataspecs": s['dataspecs'][:len_th]}, s))
+ new_shiftconfig.append(ChainMap({"dataspecs": s['dataspecs'][len_th:]},s))
+
return {
- "shiftconfig": ChainMap({"dataspecs": new_shiftconfig}, shiftconfig),
- "theoryconfig": ChainMap({"dataspecs": new_theoryconfig}, theoryconfig),
+ "shiftconfig": {"dataspecs": new_shiftconfig, 'original': shiftconfig},
+ "theoryconfig": {"dataspecs": new_theoryconfig, 'original': theoryconfig},
}
diff --git a/validphys2/src/validphys/theorycovariance.py b/validphys2/src/validphys/theorycovariance.py
index a7c91f8..7f814e5 100644
--- a/validphys2/src/validphys/theorycovariance.py
+++ b/validphys2/src/validphys/theorycovariance.py
@@ -1,4 +1,3 @@
-# -*- coding: utf-8 -*-
"""
theorycovariance.py
Tools for constructing and studying theory covariance matrices.
@@ -813,7 +812,7 @@ def plot_datasets_chi2_theory(experiments,
return fig
-matched_dataspecs_results = collect('results', ['dataspecs_with_matched_cuts'])
+matched_dataspecs_results = collect('results', ['dataspecs'])
LabeledShifts = namedtuple('LabeledShifts',
('experiment_name', 'dataset_name', 'shifts'))
@@ -832,7 +831,7 @@ def dataspecs_dataset_prediction_shift(matched_dataspecs_results, experiment_nam
experiment_name=experiment_name, shifts=res)
matched_dataspecs_dataset_prediction_shift = collect(
- 'dataspecs_dataset_prediction_shift', ['matched_datasets_from_dataspecs'])
+ 'dataspecs_dataset_prediction_shift', ['dataspecs'])
#Not sure we want to export this, as it is 231 Mb...
@@ -886,13 +885,17 @@ def plot_matched_datasets_shift_matrix_correlations(
return plot_corrmat_heatmap(
corrmat, "Shift outer product normalized (correlation) matrix")
+
+
+
all_matched_results = collect('matched_dataspecs_results',
- ['matched_datasets_from_dataspecs'])
+ ['dataspecs'])
def combine_by_type_dataspecs(process_lookup, all_matched_results, matched_dataspecs_dataset_name):
return combine_by_type(process_lookup, all_matched_results, matched_dataspecs_dataset_name)
-datapsecs_theoryids = collect('theoryid', ['dataspecs'])
+datapsecs_theoryids = collect('theoryid', ['theoryconfig', 'original',
+ 'dataspecs'])
def process_starting_points_dataspecs(combine_by_type_dataspecs):
return process_starting_points(combine_by_type_dataspecs)
@@ -915,12 +918,12 @@ def covmap_dataspecs(combine_by_type_dataspecs, matched_dataspecs_dataset_name):
return covmap(combine_by_type_dataspecs, matched_dataspecs_dataset_name)
matched_dataspecs_experiment_name = collect(
- 'experiment_name', ['matched_datasets_from_dataspecs'])
+ 'experiment_name', ['dataspecs'])
matched_dataspecs_dataset_name = collect(
- 'dataset_name', ['matched_datasets_from_dataspecs'])
-matched_cuts_datasets = collect('dataset', ['dataspecs_with_matched_cuts'])
+ 'dataset_name', ['dataspecs'])
+matched_cuts_datasets = collect('dataset', ['dataspecs'])
all_matched_datasets = collect('matched_cuts_datasets',
- ['matched_datasets_from_dataspecs'])
+ ['dataspecs'])
def all_matched_data_lengths(all_matched_datasets):
@@ -961,7 +964,17 @@ def theory_covmat_custom_dataspecs(covs_pt_prescrip_dataspecs, covmap_dataspecs,
return theory_covmat_custom(covs_pt_prescrip_dataspecs, covmap_dataspecs,
matched_experiments_index)
-@table
+thx_corrmat = collect('theory_corrmat_custom_dataspecs', ['combined_shift_and_theory_dataspecs','theoryconfig'])
+shx_corrmat = collect('matched_datasets_shift_matrix', ['combined_shift_and_theory_dataspecs','shiftconfig'])
+
+@figure
+def foo(thx_corrmat, shx_corrmat):
+ fig,ax = plt.subplots()
+ ax.matshow(thx_corrmat[0]*shx_corrmat[0])
+ return fig
+
+
+#@table
def theory_corrmat_custom_dataspecs(theory_covmat_custom_dataspecs):
"""Calculates the theory correlation matrix for scale variations
with variations by process type"""

With that I seem to be able to get a plot with seemingly right dimensions:
Ah I see, this will make things easier. I had been trying to implement various collect functions to calculate the theory matrix but with limited success. I'll see if I can reproduce some working results using the full dataspecs.
I tried plotting the ratio of the shift to theory covmats: https://vp.nnpdf.science/8LNePCv8T9O5ev8zwV5Mrw==/ The plot looks a bit weird in that a lot of the elements are 0, which shouldn't be the case seeing as both the correlation matrices should be composed purely of elements with values plus or minus 1. I'll try and work out what's going on.
This makes things a little clearer: https://vp.nnpdf.science/FYLkXOsbS-SRMsdzho-HOw==/ If you look at the shift correlation matrix here you can see that it looks different to before, with a lot of 0 entries, translating to zero entries in the ratio plot. Something must be going wrong in the generation of the shift matrix. The theory matrix looks OK at a glance though.
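For debugging ratios like this, it may help to make vanishing denominator entries explicit rather than letting them become silent zeros or infinities. A minimal sketch, assuming numpy and small hypothetical matrices in place of the real correlation matrices:

```python
import numpy as np

# Hypothetical small correlation matrices standing in for the
# theory and shift correlation matrices.
theory_corr = np.array([[1.0, -1.0], [-1.0, 1.0]])
shift_corr = np.array([[1.0, 0.0], [0.0, 1.0]])  # suspicious zero entries

# Elementwise ratio, with NaN wherever the denominator is zero,
# so unexpected zero entries show up explicitly in a plot.
ratio = np.divide(theory_corr, shift_corr,
                  out=np.full_like(theory_corr, np.nan),
                  where=shift_corr != 0)
print(ratio)
```

Plotting NaN cells (which most colormaps leave blank) then points straight at the entries where the shift matrix unexpectedly vanished.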
So there are several things I don't like about this. I'll list several, but not all:
Cleaner version carrying on from #309
https://vp.nnpdf.science/bQCke9RZT4-eC4A7szF80g==/figures/plot_thcorrmat_heatmap_custom2.png
This looks a bit weird to me - first it is missing FT DY relative to https://vp.nnpdf.science/NlltmlyWRRqCtSeJbi1xIQ==/, and second the correlations between different experiments look different to what we were previously seeing - I don't see why the upper left block of experiments should be so strongly correlated as they are just presented alphabetically - it seems as if maybe the labelling has been done wrong - any ideas of the cause of this? I am trying to debug but haven't spotted the issue yet.