Challenge 11 - Atmospheric Composition Dataset Explorer #2

Open
EsperanzaCuartero opened this issue Feb 24, 2023 · 8 comments
Labels
Stream 1 Software Development for Earth Sciences

Comments

@EsperanzaCuartero
Contributor

EsperanzaCuartero commented Feb 24, 2023

Challenge 11 - Atmospheric Composition Dataset Explorer

Stream 1 - Software Development for Earth Sciences

Goal

Develop an application capable of creating atmospheric composition diagnostic plots on demand. The minimum outcome is an application that can generate some of the plots in the table below.
A more ambitious target is to develop a generic framework that allows rapid prototyping of new products. Such a system would comprise data selection, post-processing, aggregation and visualization elements.
We have some ideas on how to build such an application (see Skills required), but we invite candidates to propose their own ideas on the implementation details.

Mentors and skills

  • Mentors: Miha Razinger, Anna Agusti-Panareda, James Varndell, Mark Parrington, Antje Inness, Frederic Chevallier
  • Skills required:
    • Python data analysis and visualization libraries (geopandas, xarray, matplotlib, cartopy, plotly, shapely, climetlab, dask, zarr ...)
    • User interface/dashboard libraries (Jupyter, dash, Voila, streamlit ...)

Note: Only nationals of European Union (EU) Member States and of countries associated with the EU’s Space Programme (currently Iceland and Norway) are eligible to participate (see Terms and Conditions).


Challenge description

This challenge builds on the developments and experience gained during last year's ESoWC project, Wildfire Emission Explorer. The aim of that project was to create an application that allows wildfire emission plots to be created on demand.

You can watch the final presentation here (skip to 8:50 if you would just like to see the demo)

The project code is here.

Now, we would like to extend the same idea to other CAMS atmospheric composition datasets, primarily the CAMS global greenhouse gas fluxes dataset and the CAMS atmospheric composition reanalysis, which are both available from the Atmosphere Data Store (ADS): https://ads.atmosphere.copernicus.eu/#!/home

The data access method and data format will be different compared to last year's project, but some plots that we would like to create are similar.

Expected outcomes

  • User interface:
    A user should be able to select the input dataset and time resolution (daily, monthly, yearly), the plot type, the date range of the reference period, the date range of the specific episode, and the geographical domain: a bounding box, a country from a drop-down list, or a specific region selected through an interactive user interface.
  • API Service:
    Ideally, an API which would offer the same functionality as the interactive application would also be developed.
  • Optimal data cache:
    Relying on the ADS might not be the best option for an interactive application. As the proposed dataset subsets are not large, we might consider caching the data. If time permits, we would also like to explore the optimal data format and data organization for sub-setting and aggregation performance.
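If the interactive application and the API are to offer the same functionality, one option is to have both accept the same request object. The following is only a minimal sketch of that idea: `PlotRequest`, its field names and the bounding-box ordering are hypothetical and do not come from the challenge description, and a region picked interactively is assumed to be reduced to a bounding box before it reaches this class.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

# Hypothetical names throughout: nothing here is defined by the challenge text.
TIME_RESOLUTIONS = ("daily", "monthly", "yearly")


@dataclass(frozen=True)
class PlotRequest:
    """One user selection, usable by both the UI and the API."""

    dataset: str
    time_resolution: str
    plot_type: str
    reference_period: Tuple[date, date]
    episode_period: Tuple[date, date]
    # Assumed ordering: (north, west, south, east), in degrees.
    bounding_box: Optional[Tuple[float, float, float, float]] = None
    country: Optional[str] = None

    def __post_init__(self):
        if self.time_resolution not in TIME_RESOLUTIONS:
            raise ValueError(f"unknown time resolution: {self.time_resolution!r}")
        for name in ("reference_period", "episode_period"):
            start, end = getattr(self, name)
            if start > end:
                raise ValueError(f"{name} starts after it ends")
        # Exactly one way of describing the geographical domain is allowed.
        if (self.bounding_box is None) == (self.country is None):
            raise ValueError("set exactly one of bounding_box or country")
        if self.bounding_box is not None:
            north, _west, south, _east = self.bounding_box
            if not -90 <= south <= north <= 90:
                raise ValueError("invalid latitude bounds")
```

Validating in one place means a malformed selection is rejected identically whether it comes from a widget or from an HTTP call.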

Examples of current plots

The aim of this project is to create an application that simplifies and speeds up the creation of various atmospheric composition diagnostic plots based on a subset of a dataset.

| Plot example | Dataset | Processing steps |
| --- | --- | --- |
| [Image: C3S_indicators_GHG_fluxes_Fig4_apr22_branded] Annual CO2 flux (MtCO2/year) from the ‘agriculture, forestry and other land use’ (AFOLU) sector in ten large parties to the United Nations Framework Convention on Climate Change (UNFCCC), estimated by two CAMS inversions: in-situ-driven (blue) and satellite-driven (orange), with uncertainty[2] for each flux (light shading). Note that the scale of the y-axis varies by party. Positive values indicate that the party is a source and negative values indicate that the party is a sink for CO2. Data source: CAMS greenhouse gas flux data. Credit: CAMS/ECMWF/LSCE | cams-global-greenhouse-gas-inversion | 1) Select the target countries and the inversion types to visualize 2) Retrieve the global CAMS inversion data 3) Select the fraction of pixels corresponding to the managed lands of the target countries, aggregate the CAMS values within each country and compute the annual totals 4) Get the associated time-varying uncertainty from a separate database 5) Plot the time series 6) Option to superimpose the time series of the official national reports (OECD countries only) or the fossil fuel emissions |
| [Images: CAMS_tcno2_ts, image-2023-2-14_19-17-42] | cams-global-reanalysis-eac4 or cams-global-reanalysis-eac4-monthly | 1) Calculate monthly means (or monthly mean anomalies for a reference period) 2) Alternatively, extract daily data 3) Plot time series of data over a selected geographical region 4) This should be possible for surface fields, total column fields or fields on pressure levels 5) Option to superimpose curves of several species in one plot (e.g. different aerosol species) |
| [Image: cams_hovmoeller_o3] | cams-global-reanalysis-eac4 or cams-global-reanalysis-eac4-monthly | 1) Vertical hovmoeller plots of values or anomalies for selected reference period 2) Should work for daily data or monthly means 3) Download pressure level data for selected area/country and period 4) Plot vertical hovmoeller plots of values or anomalies |
| [Image: cams_lat_time_o3] Lat-time or lon-time hovmoeller plots of values or anomalies for selected reference period | cams-global-reanalysis-eac4 or cams-global-reanalysis-eac4-monthly | 1) Select type of hovmoeller plot 2) Download data for selected area/country and period 3) This could be total column, surface or pressure level data 4) Should work for daily data or monthly means 5) Produce plots of values or anomalies |
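Several rows above rely on the same step: computing monthly means and then anomalies relative to a reference period. A minimal sketch of that arithmetic follows. It assumes the data has already been reduced to a single regional time series of `(date, value)` pairs, and the function names are hypothetical. On gridded data one would more likely use one of the libraries listed under Skills required (for example xarray); only the standard library is used here so the example stays self-contained.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_means(daily):
    """Average (date, value) pairs into a {(year, month): mean} mapping."""
    buckets = defaultdict(list)
    for day, value in daily:
        buckets[(day.year, day.month)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


def monthly_anomalies(monthly, ref_start_year, ref_end_year):
    """Subtract each calendar month's reference-period mean (its climatology)."""
    by_month = defaultdict(list)
    for (year, month), value in monthly.items():
        if ref_start_year <= year <= ref_end_year:
            by_month[month].append(value)
    climatology = {month: mean(values) for month, values in by_month.items()}
    # Calendar months with no reference data are skipped rather than guessed.
    return {
        (year, month): value - climatology[month]
        for (year, month), value in monthly.items()
        if month in climatology
    }
```

The anomaly of a month is defined here against the mean of the same calendar month over the reference years, which removes the seasonal cycle; other definitions are possible.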
@EsperanzaCuartero added the "Stream 1 Software Development for Earth Sciences" label on Feb 24, 2023
@EsperanzaCuartero changed the title from "Challenge 2 -Atmospheric Composition Dataset Explorer" to "Challenge 11 -Atmospheric Composition Dataset Explorer" on Feb 27, 2023
@miha-at-ecmwf changed the title from "Challenge 11 -Atmospheric Composition Dataset Explorer" to "Challenge 11 - Atmospheric Composition Dataset Explorer" on Feb 28, 2023
@elisaliv

elisaliv commented Apr 6, 2023

Hi! I'm drafting a proposal for this challenge.
Thinking about prioritization, would you say it is more important to develop an optimal data cache (as listed in the expected outcomes) or to make the framework as generic as possible so that it can also be used with other data products (the "more ambitious target" described in the Goal introduction)?
Thanks in advance.

@miha-at-ecmwf

Hello @elisaliv and thank you for your interest in this challenge.

My advice would be to play to your strengths. Unless you already have some experience with caching multidimensional and heterogeneous datasets, it may be better to focus on making the framework generic.

Looking forward to your proposal.

@elisaliv

Hi @miha-at-ecmwf, thank you for your reply!

@luigibrancati and I have another question: what do you mean more precisely by "making the framework generic to prototype new data products"?

Here are some ideas we had:

  1. Having high-level APIs for filtering, post-processing and aggregation of generic time-series weather data, to apply the most commonly used data transformations
  2. Having high-level APIs to generate 'standard' weather reports within provided time and spatial ranges

I guess step 1 is necessary to also develop step 2. Is that correct?
And is this what you'd like to achieve with a generic framework?

Thank you again.

@timometz

Hi,

I have 3 questions regarding Challenge 11:

  1. Should we reuse the code of the GUI/API developed last year as much as possible? In other words, to what extent can we build on that code?
  2. In what sense should the data be cached? Should caching be used to avoid re-downloading the data multiple times within one session, or do you plan to store the CAMS datasets at a coarser temporal or spatial resolution so that they download faster?

Similar to elisaliv's question:

  3. What does “New product” refer to in the goal description? Does it mean that the code should be easily adaptable to new datasets other than CAMS, or does it refer to greater flexibility in creating new plot types requested by a user?

Looking forward to your answer and thank you in advance!

best
Timo

@miha-at-ecmwf

Hi @elisaliv, @luigibrancati,

The idea is to make the building blocks (GUI, data retrieval, data homogenization, data slicing and sub-setting, aggregation, visualization of results ...) of the application as modular as possible with clean interfaces between them.

So if we need to use a new dataset in the future, we just have to write new data acquisition and (potentially) data homogenization code. If we wanted a new plot type, statistical methods and visualization code would have to be updated ...
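As a rough illustration of what "modular with clean interfaces" could look like, the sketch below wires together a data source, an aggregation step and a renderer through small interfaces. Every name in it (`DataSource`, `Renderer`, `Pipeline`, `InMemorySource`, `yearly_total`, `TextRenderer`) is hypothetical, and the `(label, value)` series is only a stand-in for real gridded data. It is one possible decomposition, not the design the mentors have in mind.

```python
from typing import Callable, Dict, List, Protocol, Tuple

# (time label, value) pairs; a simplified stand-in for real CAMS fields.
Series = List[Tuple[str, float]]


class DataSource(Protocol):
    def retrieve(self, request: dict) -> Series: ...


class Renderer(Protocol):
    def render(self, series: Series) -> str: ...


class Pipeline:
    """Looks up each building block by name, so new ones can be registered."""

    def __init__(
        self,
        sources: Dict[str, DataSource],
        aggregators: Dict[str, Callable[[Series], Series]],
        renderers: Dict[str, Renderer],
    ):
        self.sources = sources
        self.aggregators = aggregators
        self.renderers = renderers

    def run(self, request: dict) -> str:
        series = self.sources[request["dataset"]].retrieve(request)
        series = self.aggregators[request["aggregation"]](series)
        return self.renderers[request["plot_type"]].render(series)


class InMemorySource:
    """Toy source for this sketch; a real one would fetch from the ADS."""

    def __init__(self, series: Series):
        self._series = series

    def retrieve(self, request: dict) -> Series:
        return list(self._series)


def yearly_total(series: Series) -> Series:
    """Sum values per year, taking the year from the first four label characters."""
    totals: Dict[str, float] = {}
    for label, value in series:
        totals[label[:4]] = totals.get(label[:4], 0.0) + value
    return sorted(totals.items())


class TextRenderer:
    """Toy renderer that returns text instead of drawing a figure."""

    def render(self, series: Series) -> str:
        return "\n".join(f"{label}: {value:g}" for label, value in series)
```

With this layout, supporting a new dataset means registering one more entry in `sources`, and a new plot type means one more entry in `renderers`, without touching `Pipeline.run`.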

If you want to see additional examples of the types of plots we regularly create, please look at the CAMS validation reports, this is the latest one:
https://atmosphere.copernicus.eu/sites/default/files/publications/32_CAMS2_82_2022SC1_D82.1.1.5-SON2022.pdf

For even more inspiration (with source code!), check the Climate Data Store applications' collection:
https://cds.climate.copernicus.eu/cdsapp#!/search?type=application

Miha

@miha-at-ecmwf

Dear @timometz,

Thank you for your questions.

  1. You don't have to reuse the code. However, last year's solution should give you a basic understanding of what we are hoping to achieve with this challenge.
  2. It's a bit of an open topic; we welcome your ideas on how to make the application as responsive as possible. We will make a development server available for the duration of the project, so the benefits of a global server-side cache could be explored.
  3. We will limit ourselves to CAMS data. The application should be flexible enough that adding a new dataset or a new plot type (i.e. one not listed in the table of plot examples) would not be too complicated. See also my last answer to @elisaliv.
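To make the caching discussion in point 2 concrete, here is a minimal sketch of a server-side cache keyed by the request itself, so that identical selections are only downloaded once. `DiskCache`, `cache_key` and the `fetch` callable are hypothetical names, and storing results with `pickle` is an assumption made to keep the example short; for gridded data, a format such as NetCDF or Zarr would probably be a better fit.

```python
import hashlib
import json
import pickle
from pathlib import Path


def cache_key(request: dict) -> str:
    """Hash a request so that key order does not change the result."""
    canonical = json.dumps(request, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DiskCache:
    """Stores one file per distinct request in a directory."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def get_or_fetch(self, request: dict, fetch):
        path = self.directory / f"{cache_key(request)}.pkl"
        if path.exists():
            with path.open("rb") as handle:
                return pickle.load(handle)
        # Cache miss: call the (possibly slow) retrieval function.
        result = fetch(request)
        # Write to a temporary file first so readers never see a partial file.
        tmp = path.with_suffix(".tmp")
        with tmp.open("wb") as handle:
            pickle.dump(result, handle)
        tmp.replace(path)
        return result
```

A cache like this never expires entries, which suits datasets that do not change once published; anything updated regularly would need an invalidation rule on top.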

Miha

@luigibrancati

Hello @miha-at-ecmwf, what's the deadline for the proposal? I see 12 April, but no time or time zone is specified.

@trakasa
Contributor

trakasa commented Apr 11, 2023

@luigibrancati submission deadline is 12 April 2023 (23:59 UTC).
It's a bit hidden in the T&Cs Article 4 - https://codeforearth.ecmwf.int/terms-and-conditions
