Implement climate projections #929

ekatef · 2023-11-30T15:44:06Z

Changes proposed in this Pull Request

A preliminary version to add climate projections into cutout.

Current limitations

Projection is calculated as a shift + stretch transformation of a cutout file which is assumed to be a representative weather/climate dataset.
Simulation results for global climate models (GCM) are taken from the IPCC Global Atlas.
Model ranking is not used, a multimodel ensemble contains a maximum possible number of GCM.
A base year used to calculate the parameter change is taken form the period contained in GCMs runs.

Checklist

I consent to the release of this PR's code under the AGPLv3 license and non-code contributions under CC0-1.0 and CC-BY-4.0.
I tested my contribution locally and it seems to work fine.
Code and workflow changes are sufficiently documented.
Newly introduced dependencies are added to envs/environment.yaml and doc/requirements.txt.
Changes in configuration options are added in all of config.default.yaml and config.tutorial.yaml.
Add a test config or line additions to test/ (note tests are changing the config.tutorial.yaml)
Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
A note for the release notes doc/release_notes.rst is amended in the format of previous release notes, including reference to the requested PR.

davide-f

Thanks @ekatef for the great draft!!! :D
So, I think it is a nice draft and also to improve the review some ittle documentation and docstring of the functions may help.
The code seems in interesting shape.
We could think of improving the automatization of the snakemake.

In particular, we could specify that these projections start with name "projection-" or something like that.
Accordingly, we could add to the wildcard_constraints that the wildcard cutout cannot start with "projection-".
And this rule could have:
input: "cutouts/{cutout}"
output: "cutouts/projection-{cutout}"
That should make sure that if we require the workflow to use the projection-{anycutout} may fully automatize everything, but needs some check.

The wildcard constraint. may be something like "^(?!projection-).*"

I really look forward for the full implementation :D

davide-f · 2023-12-02T23:33:34Z

scripts/build_climate_projections.py

+    snapshots = pd.date_range(freq="h", **snakemake.params.snapshots)
+    season_in_focus = snapshots.month.unique().to_list()
+
+    cmip6_xr = xr.open_dataset(cmip6)


How to load this dataset?

As discussed, the dataset is available via Copernicus and licensed as CC-BY 4.0, being IPCC product 🙏🏽

So, it can be downloaded manually or included into a dedicated databundle or loaded via Copernicus API with a request as simple as

c.retrieve( 'projections-climate-atlas', { 'format': 'zip', 'origin': 'CMIP6', 'experiment': 'ssp2_4_5', 'domain': 'global', 'period': '2015-2100', 'variable': 'monthly_mean_of_daily_mean_temperature', }, 'download.zip')

davide-f · 2023-12-02T23:36:45Z

scripts/build_climate_projections.py

+    for i in range(0, len(cutout_xr.y)):
+        for j in range(0, len(cutout_xr.x)):
+            cutout_xr.temperature[k_time, i, j] = np.add(
+                cutout_xr.temperature[k_time, i, j],
+                np.array([dt_xr[i, j].item()] * k_time.sum().item()),
+            )


At first sight, it feels like this code may take long time to compute.
What are you trying to do here?

I need to modify a temperature value at each spatial point according to a projection for an increase in the monthly temperature. You are absolutely right that it may be quite time-consuming. Do you have any ideas on possible approaches to increase performance?

davide-f · 2023-12-02T23:41:37Z

scripts/build_climate_projections.py

+    # TODO read from the cutout file instead of hardcoding
+    dx_new = 0.3
+
+    newlon = np.arange(


May be good to use np.max and alike but unsure performences increas significantly

Thanks a lot for the hint! Definitely worth investigation.

Have done some profiling, and np.min( ), np.max( ) have been 5-10% faster. Agree that it's not breakthrough in performance, but it is consistent improvement which doesn't affect code readability. So, I'd opt for it. Thanks for the suggestion! :)

Thanks for checking, by looking at the atlite code it raised few ideas:

is it possible to load the cmip file as an atlite cutout? this doesn't mean to use the cutout for atlite but simply leverage on the utility functions it has

if not, the cutout.py file gives quite some hints to improve the functionalities if the task is actually time consuming. For example, the minimum value may always be the first of the list while the maximum the last one (to verify though)

Link to the cutout.py file : https://github.com/PyPSA/atlite/blob/a71bca2f6f7221b70dbbc371752aef56d937b1ff/atlite/cutout.py#L314

scripts/build_climate_projections.py

ekatef · 2023-12-04T23:19:12Z

Thanks @ekatef for the great draft!!! :D So, I think it is a nice draft and also to improve the review some ittle documentation and docstring of the functions may help. The code seems in interesting shape. We could think of improving the automatization of the snakemake.

In particular, we could specify that these projections start with name "projection-" or something like that. Accordingly, we could add to the wildcard_constraints that the wildcard cutout cannot start with "projection-". And this rule could have: input: "cutouts/{cutout}" output: "cutouts/projection-{cutout}" That should make sure that if we require the workflow to use the projection-{anycutout} may fully automatize everything, but needs some check.

The wildcard constraint. may be something like "^(?!projection-).*"

I really look forward for the full implementation :D

Thank you so much @davide-f!

Great idea with the projection prefix. Regarding wildcards, I'm also thinking that probably we may want to include also the scenario name into the file name, but not sure if it wouldn't over-complication. An alternative is to keep details of the projection approach as attributes inside the projection file.

Absolutely agree on your suggestions regarding the doc-strings, and will play a bit with performance. Thank you so much for your support ;)

…atef/pypsa-earth into implement_climate_projections

ekatef · 2023-12-28T19:49:01Z

Results for performance testing of interpolate_cmip6_to_cutout_grid: base min( ), max( ) vs np.min( ), np.max( ):

Argentina, 1-month cutout

1 attempt

profiling results - baseline implementation
20.76103920803871
profiling results - np implementation
12.266976625076495

7 attempts

profiling results - baseline implementation
105.9964517080225
profiling results - np implementaion
92.74140579102095

20 attempts

profiling results - baseline implementation
254.6588744580513
profiling results - np implementation
239.80646516697016

Silk-Road region, 1-year cutout

1 attempt

profiling results - baseline implementation
125.40555191691965
profiling results - np implementation
124.65340708300937

5 attempts

profiling results - baseline implementation
598.7846881250152
profiling results - np implementation
579.6119187080767

ekatef · 2023-12-28T20:07:01Z

Have added a stretch approach, improved docstrings, removed hardcoding in the names of parameters to be projected and revised Snakemake interfaces.

From the functional perspective, I think that is good time to stop now and finalise the implementation. In particular:

validity checks should be added to avoid incorrect results due to some mismatch in the dates;
climate inputs should be added to the databundle: we have to discuss it from the usability perspective;
a test should be added to cover this part, as currently CI simply ignores this part being kind of over-optimistic;
notebooks, examples and viz approaches are needed, and best way to develop them is to start work on some meaningful applications.

ekatef · 2024-01-11T18:38:24Z

As discussed during the weekly meeting:

we can keep the CMIP6 databundle as an option additional to the main workflow;
given high quality and moderate size of CMIP6 inputs, we can adding all the datasets which may be relevant to be used in building projections.

ekatef · 2024-01-20T19:31:07Z

A possible approach to improve performance may be making calculations on the whole grid instead of modifying the array point-by-point (this part, in particular). Some nice hints on that can be found here. However, that leads to a performance-memory trade-off, for which chunks magic of dask can help, like suggests this approach.

ekatef · 2024-01-20T19:37:21Z

@davide-f my feeling is that this PR may be ready for the next review round. There is still some technical work to be done: datakit preparation, adding a test and probably performance improvements. But it would be great to check that the general design and interfaces look reasonable. I'd be very grateful for your opinion on that 🙂

davide-f · 2024-01-22T18:18:24Z

scripts/build_climate_projections.py

+
+    cmip6_region = cmip6_xr.sel(
+        lat=slice(
+            min(cutout_xr.coords["y"].values) - d_pad,
+            max(cutout_xr.coords["y"].values) + d_pad,
+        ),
+        lon=slice(
+            min(cutout_xr.coords["x"].values) - d_pad,
+            max(cutout_xr.coords["x"].values) + d_pad,
+        ),
+    )


very nice! I'm wondering, instead of doing min/max, may it be that the first element of the list is also the lowest?
so like cutout_xr.coords["y"].values[0] is the lowest and cutout_xr.coords["y"].values[-1] is the max value?

if so we could avoid the max/mins in the whole document.
Maybe we could use some utility functions for that.

I'm wondering, but is this cmip6 compatible with the object cutout in atlite?
As it is an xarray maybe?

is we load it as a cutout, we can use the bounds function of the cutout object and this facilitates a lot the whole style.
This is a guess though.

Copying here the link to the bounds function and the atlite file, that can give some ideas for the coding style :)
https://github.com/PyPSA/atlite/blob/a71bca2f6f7221b70dbbc371752aef56d937b1ff/atlite/cutout.py#L314

Thank you! A really great idea to align with atlite approaches.

As for the particular bounds calculation, I have investigated this idea, but would say that it doesn't feel like a clear approach to use "magic" index numbers. Agree that it's quite common. But personally, I'm not capable to keep in mind which integer means what, and that is usually quite tricky to deal with the code using such conventions. Not sure it's justifiable but performance gains, as currently performance bottlenecks seem to be in spatial calculations.

What would potentially be interesting to adopt from atlite is work with data chunks. It may be probably a good idea to investigate it for increasing the performance. What do you think?

Mmm maybe it would be good to at least create here an auxiliary function that, having as input an xarray, returns the boundaries like (xmin, xmax, ymin, ymax) or alike.
having that function may help clarifying the readability regardless of their implementation; although, I believe the xmin may be the first value of x and the last one the maximum.
Likewise, an auxiliary function that calculate the dx and dy may be useful although trivial: the advantage may be readability and manutentibility.

What do you think?

davide-f

Hello @ekatef ,

The PR is in great shape! I added few stylish comments, but the document is in great shape 🗡️
There is no automatic download here but I guess this is expected for the moment right?

davide-f · 2024-01-22T18:25:00Z

scripts/build_climate_projections.py

+    # TODO read from the cutout file instead of hardcoding
+    dx_new = 0.3
+
+    newlon = np.arange(


Thanks for checking, by looking at the atlite code it raised few ideas:

is it possible to load the cmip file as an atlite cutout? this doesn't mean to use the cutout for atlite but simply leverage on the utility functions it has

if not, the cutout.py file gives quite some hints to improve the functionalities if the task is actually time consuming. For example, the minimum value may always be the first of the list while the maximum the last one (to verify though)

Link to the cutout.py file : https://github.com/PyPSA/atlite/blob/a71bca2f6f7221b70dbbc371752aef56d937b1ff/atlite/cutout.py#L314

scripts/build_climate_projections.py

davide-f · 2024-01-22T18:28:07Z

scripts/build_climate_projections.py

+        CMIP6 climate projections loaded as xarray
+    month: float
+        A month value to be considered further for the morphing procedure
+    year0: integer


are year 0 and 1 like the base year and the prediction year?
the name could be more explicit

Agree, revised. Are they more clear now?

davide-f · 2024-01-22T18:28:41Z

scripts/build_climate_projections.py

+                month=k_month,
+                year0=year0,
+                year1=year1,
+                years_window=5,


may this be a parameter?

It must be, I have forgotten to use it there. Fixed now. Thanks :)

davide-f · 2024-01-22T18:32:35Z

Snakefile

@@ -363,6 +365,33 @@ if config["enable"].get("build_cutout", False):
            "scripts/build_cutout.py"


+if config["enable"].get("modify_cutout", False):


I think we could call the option climate_projection_cutout or something like that?

I was considering if adding an option different from build_cutout may be necessary, but I believe so to avoid overwriting existing "original" cutouts

Yeah, agree that it would be nice to keep the main workflow as safe as possible from interactions with these additions.

The option name revised. Have I understood you idea correctly? :)

ekatef · 2024-03-09T21:46:15Z

Hello @ekatef ,

The PR is in great shape! I added few stylish comments, but the document is in great shape 🗡️ There is no automatic download here but I guess this is expected for the moment right?

Hello @davide-f! Thanks a lot for the review :D That is fantastic to have your support.

Have revised the code, trying to address you comments. Not sure I have get all the points right (sorry for that!), and happy to discuss further in any form which is convenient for you.

Also, added a number of checks to prepare_cmip6 with an intention is to eliminate the most annoying errors which can happen due to dates mismatch (like when there is not enough years for the requested period, and what is supposed to be a multi-annual average is calculated on data for one-two years only).

Absolutely agree that it would be perfect to apply some approaches from atlite to improve the performance. My feeling is that it may be a good approach to parallelise computations using chunks, but I haven't yet fully understood atlite implementation. Would be very grateful for any suggestions on that :)

ekatef · 2024-03-09T21:59:09Z

Ah, regarding the inputs: they are not addressed yet. My feeling is that it would be great to load to zenodo datasets for the major parameters which may be relevant for energy modeling. If using the simplest shift calculation approach, amount of the data needed is about 3GB for the whole globe. Adding a stretch transformation requires additional 6GB for each parameter (fields of min and max parameter values).

The whole wish-list may look like:

air temperature;
wind speed;
precipitation (mainly needed as a predictor for runoff);
irradiation or cloudiness (accuracy is not really great in general climate models, but there is not much alternatives anyway).

Probably, we can start with temperature only, and adjust the approaches along the way. What do you think?

davide-f

Great, I think the PR is in great shape :) minor comments.
P.S. there are still a lot of plots here that may be removed in the finalized version.

davide-f · 2024-03-12T20:21:14Z

scripts/build_climate_projections.py

+
+    cmip6_region = cmip6_xr.sel(
+        lat=slice(
+            min(cutout_xr.coords["y"].values) - d_pad,
+            max(cutout_xr.coords["y"].values) + d_pad,
+        ),
+        lon=slice(
+            min(cutout_xr.coords["x"].values) - d_pad,
+            max(cutout_xr.coords["x"].values) + d_pad,
+        ),
+    )


Mmm maybe it would be good to at least create here an auxiliary function that, having as input an xarray, returns the boundaries like (xmin, xmax, ymin, ymax) or alike.
having that function may help clarifying the readability regardless of their implementation; although, I believe the xmin may be the first value of x and the last one the maximum.
Likewise, an auxiliary function that calculate the dx and dy may be useful although trivial: the advantage may be readability and manutentibility.

What do you think?

ekatef added 11 commits November 19, 2023 16:58

Initial commit

73601f9

Move a projection script into script directory

c68abbe

Restructure into Snakemake workflow

2beb879

Organize code

1a24bad

Add inputs

174b0b1

Fix Snakemake inputs

4594541

Improve documenting

83c6b0b

Code clean-up

b196e5a

Remove hardcoding and add debug plotting

369bc9b

Add parameters to the tacked configs

d1fd454

Merge branch 'main' into implement_climate_projections

21a1751

davide-f reviewed Dec 2, 2023

View reviewed changes

ekatef added 17 commits December 26, 2023 13:22

Merge branch 'main' into implement_climate_projections

33225c7

Fix wildcards for the projection

72272d5

Merge branch 'pypsa-meets-earth:main' into implement_climate_projections

b288ce7

Merge branch 'implement_climate_projections' of https://github.com/ek…

7d3de0f

…atef/pypsa-earth into implement_climate_projections

Adjust inputs-outputs in the script docstring

bc5575f

Add methodology

d77af95

Add docstrings

51c8a09

Add TODOs

cebf6a3

Fix format

0d7ec1d

Fix input

0e507c1

Remove hardcoding for a parameter name in a projection dataset

47c628a

Remove hardcoding for a parameter name in a cutout dataset

dc735ba

Wrap plots into functions

4500f45

Update docstrings

7d8e17e

Draft stretch implementation

42da0b8

Remove a TODO

e4780a9

Use numpy functions

dbe81ac

ekatef added 2 commits December 28, 2023 22:43

Test stretch

f68bf21

Fix plot parameters

ba3d1a7

ekatef added 2 commits January 20, 2024 21:39

Improve comments to the config parameters

c1bfe0c

Add comments to the projection calculations

9d21fde

davide-f reviewed Jan 22, 2024

View reviewed changes

ekatef added 15 commits March 8, 2024 17:44

Merge branch 'main' into implement_climate_projections

58ec438

Fix formatting

a8f8636

Enhance docstrings

33d90d2

Improve naming

f5cb70e

Fix parameters in a function call

2177bb1

Merge branch 'main' into implement_climate_projections

49117f2

Improve explanation for stretch calculations

19b65fc

Improve naming and add enable parameters to the configs

7ca6216

Fix typo

39fd2d1

Replace hard-coding

517e422

Improve naming

1f3520d

Add a check for time match

dd4b060

Fix typo

f71390f

Update the docstring

14707c8

Remove TODOs

47e7946

ekatef marked this pull request as ready for review March 9, 2024 21:46

davide-f reviewed Mar 12, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement climate projections #929

Implement climate projections #929

ekatef commented Nov 30, 2023 •

edited

Loading

davide-f left a comment •

edited

Loading

davide-f Dec 2, 2023

ekatef Dec 4, 2023

davide-f Dec 2, 2023

ekatef Dec 4, 2023

davide-f Dec 2, 2023

ekatef Dec 4, 2023

ekatef Dec 28, 2023

davide-f Jan 22, 2024

ekatef commented Dec 4, 2023

ekatef commented Dec 28, 2023 •

edited

Loading

ekatef commented Dec 28, 2023 •

edited

Loading

ekatef commented Jan 11, 2024

ekatef commented Jan 20, 2024

ekatef commented Jan 20, 2024

davide-f Jan 22, 2024 •

edited

Loading

ekatef Mar 9, 2024

davide-f Mar 12, 2024

davide-f left a comment

davide-f Jan 22, 2024

davide-f Jan 22, 2024

ekatef Mar 9, 2024

davide-f Jan 22, 2024

ekatef Mar 9, 2024

davide-f Jan 22, 2024

ekatef Mar 9, 2024

ekatef commented Mar 9, 2024

ekatef commented Mar 9, 2024

davide-f left a comment

davide-f Mar 12, 2024

		@@ -363,6 +365,33 @@ if config["enable"].get("build_cutout", False):
		"scripts/build_cutout.py"


		if config["enable"].get("modify_cutout", False):

Implement climate projections #929

Are you sure you want to change the base?

Implement climate projections #929

Conversation

ekatef commented Nov 30, 2023 • edited Loading

Changes proposed in this Pull Request

Current limitations

Checklist

davide-f left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ekatef commented Dec 4, 2023

ekatef commented Dec 28, 2023 • edited Loading

Argentina, 1-month cutout

1 attempt

7 attempts

20 attempts

Silk-Road region, 1-year cutout

1 attempt

5 attempts

ekatef commented Dec 28, 2023 • edited Loading

ekatef commented Jan 11, 2024

ekatef commented Jan 20, 2024

ekatef commented Jan 20, 2024

davide-f Jan 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davide-f left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ekatef commented Mar 9, 2024

ekatef commented Mar 9, 2024

davide-f left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ekatef commented Nov 30, 2023 •

edited

Loading

davide-f left a comment •

edited

Loading

ekatef commented Dec 28, 2023 •

edited

Loading

ekatef commented Dec 28, 2023 •

edited

Loading

davide-f Jan 22, 2024 •

edited

Loading