Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement climate projections #929

Open
wants to merge 56 commits into
base: main
Choose a base branch
from

Conversation

ekatef
Copy link
Member

@ekatef ekatef commented Nov 30, 2023

Changes proposed in this Pull Request

A preliminary version to add climate projections into cutout.

Current limitations

  1. Projection is calculated as a shift + stretch transformation of a cutout file which is assumed to be a representative weather/climate dataset.
  2. Simulation results for global climate models (GCM) are taken from the IPCC Global Atlas.
  3. Model ranking is not used, a multimodel ensemble contains a maximum possible number of GCM.
  4. A base year used to calculate the parameter change is taken form the period contained in GCMs runs.

Checklist

  • I consent to the release of this PR's code under the AGPLv3 license and non-code contributions under CC0-1.0 and CC-BY-4.0.
  • I tested my contribution locally and it seems to work fine.
  • Code and workflow changes are sufficiently documented.
  • Newly introduced dependencies are added to envs/environment.yaml and doc/requirements.txt.
  • Changes in configuration options are added in all of config.default.yaml and config.tutorial.yaml.
  • Add a test config or line additions to test/ (note tests are changing the config.tutorial.yaml)
  • Changes in configuration options are also documented in doc/configtables/*.csv and line references are adjusted in doc/configuration.rst and doc/tutorial.rst.
  • A note for the release notes doc/release_notes.rst is amended in the format of previous release notes, including reference to the requested PR.

Copy link
Member

@davide-f davide-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ekatef for the great draft!!! :D
So, I think it is a nice draft and also to improve the review some ittle documentation and docstring of the functions may help.
The code seems in interesting shape.
We could think of improving the automatization of the snakemake.

In particular, we could specify that these projections start with name "projection-" or something like that.
Accordingly, we could add to the wildcard_constraints that the wildcard cutout cannot start with "projection-".
And this rule could have:
input: "cutouts/{cutout}"
output: "cutouts/projection-{cutout}"
That should make sure that if we require the workflow to use the projection-{anycutout} may fully automatize everything, but needs some check.

The wildcard constraint. may be something like "^(?!projection-).*"

I really look forward for the full implementation :D

snapshots = pd.date_range(freq="h", **snakemake.params.snapshots)
season_in_focus = snapshots.month.unique().to_list()

cmip6_xr = xr.open_dataset(cmip6)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to load this dataset?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, the dataset is available via Copernicus and licensed as CC-BY 4.0, being IPCC product 🙏🏽

So, it can be downloaded manually or included into a dedicated databundle or loaded via Copernicus API with a request as simple as

c.retrieve(
    'projections-climate-atlas',
    {
        'format': 'zip',
        'origin': 'CMIP6',
        'experiment': 'ssp2_4_5',
        'domain': 'global',
        'period': '2015-2100',
        'variable': 'monthly_mean_of_daily_mean_temperature',
    },
    'download.zip')

Comment on lines 123 to 128
for i in range(0, len(cutout_xr.y)):
for j in range(0, len(cutout_xr.x)):
cutout_xr.temperature[k_time, i, j] = np.add(
cutout_xr.temperature[k_time, i, j],
np.array([dt_xr[i, j].item()] * k_time.sum().item()),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first sight, it feels like this code may take long time to compute.
What are you trying to do here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to modify a temperature value at each spatial point according to a projection for an increase in the monthly temperature. You are absolutely right that it may be quite time-consuming. Do you have any ideas on possible approaches to increase performance?

# TODO read from the cutout file instead of hardcoding
dx_new = 0.3

newlon = np.arange(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be good to use np.max and alike but unsure performences increas significantly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the hint! Definitely worth investigation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have done some profiling, and np.min( ), np.max( ) have been 5-10% faster. Agree that it's not breakthrough in performance, but it is consistent improvement which doesn't affect code readability. So, I'd opt for it. Thanks for the suggestion! :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking, by looking at the atlite code it raised few ideas:

  1. is it possible to load the cmip file as an atlite cutout? this doesn't mean to use the cutout for atlite but simply leverage on the utility functions it has
  2. if not, the cutout.py file gives quite some hints to improve the functionalities if the task is actually time consuming. For example, the minimum value may always be the first of the list while the maximum the last one (to verify though)

Link to the cutout.py file : https://github.com/PyPSA/atlite/blob/a71bca2f6f7221b70dbbc371752aef56d937b1ff/atlite/cutout.py#L314

scripts/build_climate_projections.py Show resolved Hide resolved
@ekatef
Copy link
Member Author

ekatef commented Dec 4, 2023

Thanks @ekatef for the great draft!!! :D So, I think it is a nice draft and also to improve the review some ittle documentation and docstring of the functions may help. The code seems in interesting shape. We could think of improving the automatization of the snakemake.

In particular, we could specify that these projections start with name "projection-" or something like that. Accordingly, we could add to the wildcard_constraints that the wildcard cutout cannot start with "projection-". And this rule could have: input: "cutouts/{cutout}" output: "cutouts/projection-{cutout}" That should make sure that if we require the workflow to use the projection-{anycutout} may fully automatize everything, but needs some check.

The wildcard constraint. may be something like "^(?!projection-).*"

I really look forward for the full implementation :D

Thank you so much @davide-f!

Great idea with the projection prefix. Regarding wildcards, I'm also thinking that probably we may want to include also the scenario name into the file name, but not sure if it wouldn't over-complication. An alternative is to keep details of the projection approach as attributes inside the projection file.

Absolutely agree on your suggestions regarding the doc-strings, and will play a bit with performance. Thank you so much for your support ;)

@ekatef
Copy link
Member Author

ekatef commented Dec 28, 2023

Results for performance testing of interpolate_cmip6_to_cutout_grid: base min( ), max( ) vs np.min( ), np.max( ):

Argentina, 1-month cutout

1 attempt

profiling results - baseline implementation
20.76103920803871
profiling results - np implementation
12.266976625076495

7 attempts

profiling results - baseline implementation
105.9964517080225
profiling results - np implementaion
92.74140579102095

20 attempts

profiling results - baseline implementation
254.6588744580513
profiling results - np implementation
239.80646516697016

Silk-Road region, 1-year cutout

1 attempt

profiling results - baseline implementation
125.40555191691965
profiling results - np implementation
124.65340708300937

5 attempts

profiling results - baseline implementation
598.7846881250152
profiling results - np implementation
579.6119187080767

@ekatef
Copy link
Member Author

ekatef commented Dec 28, 2023

Have added a stretch approach, improved docstrings, removed hardcoding in the names of parameters to be projected and revised Snakemake interfaces.

From the functional perspective, I think that is good time to stop now and finalise the implementation. In particular:

  • validity checks should be added to avoid incorrect results due to some mismatch in the dates;
  • climate inputs should be added to the databundle: we have to discuss it from the usability perspective;
  • a test should be added to cover this part, as currently CI simply ignores this part being kind of over-optimistic;
  • notebooks, examples and viz approaches are needed, and best way to develop them is to start work on some meaningful applications.

@ekatef
Copy link
Member Author

ekatef commented Jan 11, 2024

As discussed during the weekly meeting:

  1. we can keep the CMIP6 databundle as an option additional to the main workflow;
  2. given high quality and moderate size of CMIP6 inputs, we can adding all the datasets which may be relevant to be used in building projections.

@ekatef
Copy link
Member Author

ekatef commented Jan 20, 2024

A possible approach to improve performance may be making calculations on the whole grid instead of modifying the array point-by-point (this part, in particular). Some nice hints on that can be found here. However, that leads to a performance-memory trade-off, for which chunks magic of dask can help, like suggests this approach.

@ekatef
Copy link
Member Author

ekatef commented Jan 20, 2024

@davide-f my feeling is that this PR may be ready for the next review round. There is still some technical work to be done: datakit preparation, adding a test and probably performance improvements. But it would be great to check that the general design and interfaces look reasonable. I'd be very grateful for your opinion on that 🙂

Comment on lines +100 to +110

cmip6_region = cmip6_xr.sel(
lat=slice(
min(cutout_xr.coords["y"].values) - d_pad,
max(cutout_xr.coords["y"].values) + d_pad,
),
lon=slice(
min(cutout_xr.coords["x"].values) - d_pad,
max(cutout_xr.coords["x"].values) + d_pad,
),
)
Copy link
Member

@davide-f davide-f Jan 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice! I'm wondering, instead of doing min/max, may it be that the first element of the list is also the lowest?
so like cutout_xr.coords["y"].values[0] is the lowest and cutout_xr.coords["y"].values[-1] is the max value?

if so we could avoid the max/mins in the whole document.
Maybe we could use some utility functions for that.

I'm wondering, but is this cmip6 compatible with the object cutout in atlite?
As it is an xarray maybe?

is we load it as a cutout, we can use the bounds function of the cutout object and this facilitates a lot the whole style.
This is a guess though.

Copying here the link to the bounds function and the atlite file, that can give some ideas for the coding style :)
https://github.com/PyPSA/atlite/blob/a71bca2f6f7221b70dbbc371752aef56d937b1ff/atlite/cutout.py#L314

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! A really great idea to align with atlite approaches.

As for the particular bounds calculation, I have investigated this idea, but would say that it doesn't feel like a clear approach to use "magic" index numbers. Agree that it's quite common. But personally, I'm not capable to keep in mind which integer means what, and that is usually quite tricky to deal with the code using such conventions. Not sure it's justifiable but performance gains, as currently performance bottlenecks seem to be in spatial calculations.

What would potentially be interesting to adopt from atlite is work with data chunks. It may be probably a good idea to investigate it for increasing the performance. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm maybe it would be good to at least create here an auxiliary function that, having as input an xarray, returns the boundaries like (xmin, xmax, ymin, ymax) or alike.
having that function may help clarifying the readability regardless of their implementation; although, I believe the xmin may be the first value of x and the last one the maximum.
Likewise, an auxiliary function that calculate the dx and dy may be useful although trivial: the advantage may be readability and manutentibility.

What do you think?

Copy link
Member

@davide-f davide-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @ekatef ,

The PR is in great shape! I added few stylish comments, but the document is in great shape 🗡️
There is no automatic download here but I guess this is expected for the moment right?

# TODO read from the cutout file instead of hardcoding
dx_new = 0.3

newlon = np.arange(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking, by looking at the atlite code it raised few ideas:

  1. is it possible to load the cmip file as an atlite cutout? this doesn't mean to use the cutout for atlite but simply leverage on the utility functions it has
  2. if not, the cutout.py file gives quite some hints to improve the functionalities if the task is actually time consuming. For example, the minimum value may always be the first of the list while the maximum the last one (to verify though)

Link to the cutout.py file : https://github.com/PyPSA/atlite/blob/a71bca2f6f7221b70dbbc371752aef56d937b1ff/atlite/cutout.py#L314

scripts/build_climate_projections.py Show resolved Hide resolved
CMIP6 climate projections loaded as xarray
month: float
A month value to be considered further for the morphing procedure
year0: integer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are year 0 and 1 like the base year and the prediction year?
the name could be more explicit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, revised. Are they more clear now?

month=k_month,
year0=year0,
year1=year1,
years_window=5,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may this be a parameter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It must be, I have forgotten to use it there. Fixed now. Thanks :)

Snakefile Outdated
@@ -363,6 +365,33 @@ if config["enable"].get("build_cutout", False):
"scripts/build_cutout.py"


if config["enable"].get("modify_cutout", False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could call the option climate_projection_cutout or something like that?

I was considering if adding an option different from build_cutout may be necessary, but I believe so to avoid overwriting existing "original" cutouts

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agree that it would be nice to keep the main workflow as safe as possible from interactions with these additions.

The option name revised. Have I understood you idea correctly? :)

@ekatef
Copy link
Member Author

ekatef commented Mar 9, 2024

Hello @ekatef ,

The PR is in great shape! I added few stylish comments, but the document is in great shape 🗡️ There is no automatic download here but I guess this is expected for the moment right?

Hello @davide-f! Thanks a lot for the review :D That is fantastic to have your support.

Have revised the code, trying to address you comments. Not sure I have get all the points right (sorry for that!), and happy to discuss further in any form which is convenient for you.

Also, added a number of checks to prepare_cmip6 with an intention is to eliminate the most annoying errors which can happen due to dates mismatch (like when there is not enough years for the requested period, and what is supposed to be a multi-annual average is calculated on data for one-two years only).

Absolutely agree that it would be perfect to apply some approaches from atlite to improve the performance. My feeling is that it may be a good approach to parallelise computations using chunks, but I haven't yet fully understood atlite implementation. Would be very grateful for any suggestions on that :)

@ekatef ekatef marked this pull request as ready for review March 9, 2024 21:46
@ekatef
Copy link
Member Author

ekatef commented Mar 9, 2024

Ah, regarding the inputs: they are not addressed yet. My feeling is that it would be great to load to zenodo datasets for the major parameters which may be relevant for energy modeling. If using the simplest shift calculation approach, amount of the data needed is about 3GB for the whole globe. Adding a stretch transformation requires additional 6GB for each parameter (fields of min and max parameter values).

The whole wish-list may look like:

  • air temperature;
  • wind speed;
  • precipitation (mainly needed as a predictor for runoff);
  • irradiation or cloudiness (accuracy is not really great in general climate models, but there is not much alternatives anyway).

Probably, we can start with temperature only, and adjust the approaches along the way. What do you think?

Copy link
Member

@davide-f davide-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, I think the PR is in great shape :) minor comments.
P.S. there are still a lot of plots here that may be removed in the finalized version.

Comment on lines +100 to +110

cmip6_region = cmip6_xr.sel(
lat=slice(
min(cutout_xr.coords["y"].values) - d_pad,
max(cutout_xr.coords["y"].values) + d_pad,
),
lon=slice(
min(cutout_xr.coords["x"].values) - d_pad,
max(cutout_xr.coords["x"].values) + d_pad,
),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm maybe it would be good to at least create here an auxiliary function that, having as input an xarray, returns the boundaries like (xmin, xmax, ymin, ymax) or alike.
having that function may help clarifying the readability regardless of their implementation; although, I believe the xmin may be the first value of x and the last one the maximum.
Likewise, an auxiliary function that calculate the dx and dy may be useful although trivial: the advantage may be readability and manutentibility.

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants