Support scenario s3 only #158

kmarinushkin · 2022-10-01T23:54:08Z

Support OECM (S3 separate) benchmark

The changes touch only OECM benchmark for separate S3 scope. The default OECM benchmark for S1S2 scope is not affected.

To see the changes in GUI:

in the drop-down list "Select climate scenario", select "OECM (Scope 3) 1.5 degC"
click anywhere on the page, to make selection confirmed by the GUI
wait for the new temperature score to calculate and load

For this particular bemchmark, only S3 matters. If following commits, we use the absence of S1S2 as an indicator of S3-scope calculations Signed-off-by: Kirill Marinushkin <[email protected]>

kmarinushkin · 2022-10-01T23:59:33Z

@MichaelTiemannOSC i don't know, how to set reviewers for the PR. Maybe i don't have permissions, to do that. In any case, i ask you to become a reviewer.
Please also try this branch on your machine, i am curious if what you will see is what you expect.
I am open for feedback on both low-level and high-level views

P.S. the count of modified lines in this PR is giant, but it's only because of the first commit

MichaelTiemannOSC · 2022-10-02T00:06:58Z

I'll enhance your permissions in the GitHub org!

kmarinushkin · 2022-10-02T07:49:42Z

(Edited) Resolved in the latest commit:

So perhaps a better terminology cleanup would be to describe this as OECM (S3 separate) benchmark. We don’t need to be so verbose everywhere, but don’t want you to be surprised if/when you talk about scenarios when you mean benchmarks.

ITR/data/base_providers.py

MichaelTiemannOSC · 2022-10-02T23:23:13Z

ITR/data/base_providers.py

        """
        get the projected productions for list of companies in ghg_scope12
        :param company_sector_region_info: DataFrame with at least the following columns :
        ColumnsConfig.COMPANY_ID, ColumnsConfig.GHG_SCOPE12, ColumnsConfig.SECTOR and ColumnsConfig.REGION
+        :param scope: benchmark scope for projections


Ditto here: scope projections for production make no sense.

A long time ago, the code had significant confusion around naming where things that were emissions were labeled as production, and vice versa. I fixed most of that, but believe this is a vestigial case. Emissions and Emissions intensity have scopes. Production, by itself, has no scope.

Yes, agree. We will track this discussion in a different comment

I have looked for, but not found, where we finalized the discussion about using a scope parameter for get_company_projected_production. In any case, unless you have a really great example of where production output is somehow scope-dependent, I'm going to revert all these. Note that in your test case (test_base_providers.py:test_get_projected_production where you pass EScope.S1S2 to one calculation and EScope.S3 to a second calculation), both result in the same answer (which you also remark in the comments). Because there will never be any difference, it's just much cleaner and simpler to remove that parameter from that functional chain, which I will do in my subsequent merge.

ITR/data/base_providers.py

ITR/data/data_warehouse.py

test/test_base_providers.py

Scope to calculate is usually S1S2, unless benchmark doesn't specify it. The second candidate is S3 If scope to calculate is S3, `DataWarehouse` shouldn't merge S3 into S1S2. Signed-off-by: Kirill Marinushkin <[email protected]>

ITR/temperature_score.py

MichaelTiemannOSC · 2022-10-13T06:56:57Z

Please don't merge yet. The approval was intended to apply only to a given commit, not the whole PR.

The review was issued prematurely (GitHub UX not understood)

ITR/data/data_warehouse.py

examples/quick_template_score_calc.ipynb

ITR/temperature_score.py

MichaelTiemannOSC · 2022-10-24T20:05:53Z

examples/ITR_UI.py

@@ -608,12 +619,13 @@ def update_graph(
    fig1.update_layout(legend=dict(orientation="h", yanchor="bottom", y=1, xanchor="center", x=0.5))

    # Covered companies analysis
-    coverage = filt_df[['company_id', 'ghg_s1s2', 'cumulative_target']].copy()
+    scope_column = 'ghg_' + EI_bm.scope_to_calc.name.lower()
+    coverage = filt_df[['company_id', scope_column, 'cumulative_target']].copy()
    zeroE = Q_(0, 't CO2')


Note that in the following lines, the EXISTENCE of GHG_SCOPE12 is tested (which is value) but not the value (other than NaN, zero, or non-zero). Per earlier comments, that's fine--this is a way of telling whether data in a row exists. But any use of GHG_SCOPE12 as a value for actual weighting or computations would be highly suspect.

so, what's wrong in these lines?

We clarified that this use of GHG_SCOPE12 is valid, but any use other than initializing projections is not really value (such as for a weighting).

@MichaelTiemannOSC
i found this article (https://sciencebasedtargets.org/resources/legacy/2020/09/Temperature-Rating-Methodology-V1.pdf)
it suggests (page 22), that GHG weights only apply only for S1S2S3.
the table 5 shows, that S1S2 goes w/o weights, and S3 goes w/o weights.

Are you sure we need to apply weights for this PR?
Looks like it's for S1S2S3 only

I additionally looked through the os-climate internal document "BUILDING OPEN SOURCE PORTFOLIO ALIGNMENT CAPABILITIES
METHODOLOGY & PROCESS DOCUMENTATION".

On page 15, it says:

Temperature weighting: The two separately calculated temperature scores can be transparently combined using a (potentially) company-specific probability measure p(c) for the estimated likelihood that the target will be reached:
𝐼𝑇𝑅(𝑐) = 𝑝(𝑐) ∗ 𝐼𝑇𝑅(𝑐, 𝑡𝑎𝑟𝑔𝑒𝑡) + (1 – 𝑝(𝑐)) ∗ 𝐼𝑇𝑅(𝑐, 𝑡𝑟𝑎𝑗𝑒𝑐𝑡𝑜𝑟𝑦)

AFAIU, it confirms, that weights should apply to combined score S1S2S3, but not to individual scored S1S2 and S3.

Based on 2 references i posted earlier, i removed the commit, which earlier introduced weights to S1S2 and S3.
I think, there was a misunderstanding of the methodology, which from my perspective only recommends weights for combined score S1S2S3.
I understand that you are sure, that weight need to apply any way. If you still think that's the case, please reference some source of methodology, which convinces you in necessity of such move.

I left the adapted unit tests, so, we now have a better test coverage of S3 calculations.

Sanity check: for the company CBRE with high S3 emissions, the notebook s1s2_s3_calc.ipynb now calculates high S3 temperature score, as we might expect.

MichaelTiemannOSC · 2022-10-24T20:08:59Z

examples/data/json-units/benchmark_production_OECM.json

@@ -2,7 +2,7 @@
  "benchmark_temperature": "1.5 delta_degC",
  "benchmark_global_budget": "396 Gt CO2",
  "is_AFOLU_included": false,


S1S2S3 is 'AllScopes'. Also the TPI benchmarks are based on fully blended S1S2S3 numbers. TPI has a different global budget, different numbers, and different decarbonization pathways, but an internally consistent logic that we can follow and use. When a company reports S1, S2, and (categorized) S3 numbers, the tool can blend them to get a valid calc. Alas, there's no way to judge whether the S1+S2 components are heavy or light with respect to the S3 component, but it should still be possible to show how the two do stack up against the S1S2S3 number.

what do you suggest me to change?

Suggesting you change 'AllScopes' to 'S1S2S3' both in the code and the benchmark. The OECM data processing in itr-data-pipeline should be able to generate a proper S1S2S3 benchmark (which would save you the trouble of creating it from two separate benchmarks). I'll be AFK for an hour, then can answer back.

We earlier discussed, that production bm are unrelated to scope.
"AllScopes" is an attempt to highlight this fact.
Changing to "S1S2S3" may be confusing, because it's a standalone "scope" within our logic across the tool.
Let's find a more suitable name for this item.

"ScopeIndependent"? "General"? "All"?

I renamed into "AnyScope".
This might be more clear

Modify production benchmark input to `AnyScope` Signed-off-by: Kirill Marinushkin <[email protected]>

Add support for `scope` argument where necessary. Give benchmark scope_to_calc as argument. Signed-off-by: Kirill Marinushkin <[email protected]>

Use benchmark scope_to_calc for: * creation of `TemperatureScore` object * selection of scope content in the table in GUI Signed-off-by: Kirill Marinushkin <[email protected]>

This allows us to calculate temperature score for companies, which provided S3 data. Otherwise, we will get an exception "The value for S3 is missing for the following companies" Signed-off-by: Kirill Marinushkin <[email protected]>

The change only touches GUI, where the term 'scenario' was used for Emissions Intensity benchmark. This could confuse users Signed-off-by: Kirill Marinushkin <[email protected]>

GitHub token was required for this notebook, but never used. At the same time, not all ITR users have a GitHub token, which made this notebook not executable for them. When i removed it, it's absense didn't effect execution of the notebook. Which means, that it can be removed safely, and make the notebook more accessible for ITR users. Signed-off-by: Kirill Marinushkin <[email protected]>

This w/a resolves the exception: ~~~~ KeyError: "[('Europe', 'Construction Buildings', <EScope.S3: 'S3'>)] not in index" ~~~~ TODO: Remove this w/a, when associated EI benchmarks become available Signed-off-by: Kirill Marinushkin <[email protected]>

The notebook separately loads EI benchmarks for S1S2 and S3, and separately calculates temperature scores for selected scopes. The output is 2 separate tables with separately calculated scores Signed-off-by: Kirill Marinushkin <[email protected]>

MichaelTiemannOSC · 2022-11-29T20:48:47Z

test/test_base_providers.py

@@ -57,14 +63,17 @@ def setUp(self) -> None:
        self.base_warehouse = DataWarehouse(self.base_company_data, self.base_production_bm, self.base_EI_bm)
        self.company_ids = ["US0079031078",
                            "US00724F1012",
-                            "FR0000125338"]
+                            "FR0000125338",


A note for the future: this is an example of test data that we should re-sync with our sample public dataset. It's not helpful, as a test case, to have random ISINs and random production, emission, and intensity numbers that don't tie back to anything meaningful, unless the test data has been carefully constructed to trigger a particular code path. This is not an example of that; in the future we should fix this bit of code we inherited. Issue filed: #167

MichaelTiemannOSC · 2022-11-29T21:03:23Z

ITR/data/base_providers.py

@@ -107,6 +107,12 @@ def __init__(self, EI_benchmarks: IEIBenchmarkScopes,
        self._EI_benchmarks = EI_benchmarks
        self.temp_config = tempscore_config
        self.column_config = column_config
+        # Benchmark's scope, for which we calculate


The changes I made in template-v2-scopes change the paradigm from "what is scope you are trying to calculate, I'll give it to you" to "here are all the scope calculations relevant to the benchmark". In this way, the ITR code doesn't have to guess how the benchmark should be interpreted, it just gives all the answers and the caller decides how to aggregate the scopes. In this bit of code there's an internal logic prioritizing scopes S1+S2 over S3 if the first case is available. But that logic may not be correct for all cases (or cases where S1+S2+S3 should be computed). Best to let the caller aggregate (as the benchmark intends).

MichaelTiemannOSC · 2022-11-30T15:40:48Z

test/test_base_providers.py

        self.root = os.path.dirname(os.path.abspath(__file__))
        self.company_json = os.path.join(self.root, "inputs", "json", "fundamental_data.json")
        self.benchmark_prod_json = os.path.join(self.root, "inputs", "json", "benchmark_production_OECM.json")
-        self.benchmark_EI_json = os.path.join(self.root, "inputs", "json", "benchmark_EI_OECM_PC.json")
+        self.benchmark_EI_json = os.path.join(self.root, "inputs", "json", eibm_filename)


Later in this function, at new lines 48-49, the company data is re-parsed. But the fundamental company data (in JSON format) specifies intensity and target projections only for S1S2 data. With no S3 projections, the S3 budget cannot be calculated in the aggregation step. Copying the S1S2 data and declaring it as S3 data breaks other tests (because it changes other cumulative emissions calculations). We can create a separate Scope 3-only company data file with only Scope 3 data...but what, really, should we be testing? I think the answer can be found in the template-v2-scope changes of PR #166 .

json-units: Reduce benchmark_EI_OECM_S3 to S3

e52d393

For this particular bemchmark, only S3 matters. If following commits, we use the absence of S1S2 as an indicator of S3-scope calculations Signed-off-by: Kirill Marinushkin <[email protected]>

kmarinushkin requested a review from MichaelTiemannOSC October 2, 2022 07:17