From 67325638d6501bb3414ea782a7afdde0011673b8 Mon Sep 17 00:00:00 2001 From: Ray Bell Date: Sun, 9 May 2021 02:40:58 -0400 Subject: [PATCH 1/9] list API in alphabetical order --- docs/source/api.rst | 47 +-- docs/source/index.rst | 2 +- docs/source/quick-start.ipynb | 540 +++++----------------------------- 3 files changed, 98 insertions(+), 491 deletions(-) diff --git a/docs/source/api.rst b/docs/source/api.rst index a351298a..4de831b5 100644 --- a/docs/source/api.rst +++ b/docs/source/api.rst @@ -16,15 +16,16 @@ Correlation Metrics .. autosummary:: :toctree: api/ + effective_sample_size pearson_r pearson_r_p_value pearson_r_eff_p_value + linslope + r2 spearman_r spearman_r_p_value spearman_r_eff_p_value - effective_sample_size - r2 - linslope + Distance Metrics ~~~~~~~~~~~~~~~~ @@ -32,13 +33,13 @@ Distance Metrics .. autosummary:: :toctree: api/ - me - rmse - mse mae + mape + me median_absolute_error + mse + rmse smape - mape Probabilistic Metrics @@ -55,12 +56,13 @@ Currently, most of our probabilistic metrics are ported over from crps_ensemble crps_gaussian crps_quadrature - threshold_brier_score - rps - rank_histogram discrimination + rank_histogram reliability roc + rps + threshold_brier_score + Contingency-based Metrics ------------------------- @@ -87,19 +89,20 @@ Dichotomous-Only (yes/no) Metrics .. 
autosummary:: :toctree: api/ - Contingency.hits - Contingency.misses - Contingency.false_alarms - Contingency.correct_negatives Contingency.bias_score - Contingency.hit_rate - Contingency.false_alarm_ratio - Contingency.false_alarm_rate - Contingency.success_ratio - Contingency.threat_score + Contingency.correct_negatives Contingency.equit_threat_score + Contingency.false_alarm_rate + Contingency.false_alarm_ratio + Contingency.false_alarms + Contingency.hit_rate + Contingency.hits + Contingency.misses Contingency.odds_ratio Contingency.odds_ratio_skill_score + Contingency.success_ratio + Contingency.threat_score + Multi-Category Metrics ~~~~~~~~~~~~~~~~~~~~~~ @@ -108,11 +111,12 @@ Multi-Category Metrics :toctree: api/ Contingency.accuracy + Contingency.gerrity_score Contingency.heidke_score Contingency.peirce_score - Contingency.gerrity_score roc + Comparative ----------- @@ -121,8 +125,9 @@ Tests to compare whether one forecast is significantly better than another one. .. autosummary:: :toctree: api/ - sign_test mae_test + sign_test + Resampling ---------- diff --git a/docs/source/index.rst b/docs/source/index.rst index bc2367e6..03244e99 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -39,7 +39,7 @@ You can also install the bleeding edge (pre-release versions) by running: .. code-block:: bash - pip install git+https://github.com/xarray-contrib/xskillscore@master --upgrade + pip install git+https://github.com/xarray-contrib/xskillscore@main --upgrade **Getting Started** diff --git a/docs/source/quick-start.ipynb b/docs/source/quick-start.ipynb index d0e10e6e..e9a480c4 100644 --- a/docs/source/quick-start.ipynb +++ b/docs/source/quick-start.ipynb @@ -6,7 +6,7 @@ "source": [ "# Quick Start\n", "\n", - "See the API for more detailed information, examples, formulas, and references for each function." 
+ "See the [API](https://xskillscore.readthedocs.io/en/stable/api.html) for more detailed information, examples, formulas, and references for each function." ] }, { @@ -26,7 +26,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Here, we generate some sample gridded data. Our data has three time steps, and a 4x5 latitude/longitude grid. `obs` replicates some verification data and `fct` some forecast (e.g. from a statistical or dynamical model)." + "Here, we generate some sample gridded data. Our data has three time steps, and a 4x5 latitude/longitude grid. `obs` denotes some verification data (sometimes termed `y`) and `fct` some forecast data (e.g. from a statistical or dynamical model; sometimes termed `yhat`)." ] }, { @@ -55,29 +55,29 @@ "source": [ "## Deterministic Metrics\n", "\n", - "`xskillscore` offers a suite of correlation-based and distance-based deterministic metrics.\n", + "`xskillscore` offers a suite of correlation-based and distance-based deterministic metrics:\n", "\n", "### Correlation-Based \n", "\n", + "* Effective Sample Size (`effective_sample_size`)\n", "* Pearson Correlation (`pearson_r`)\n", - "* Pearson Correlation p value (`pearson_r_p_value`)\n", "* Pearson Correlation effective p value (`pearson_r_eff_p_value`)\n", + "* Pearson Correlation p value (`pearson_r_p_value`)\n", + "* Slope of Linear Fit (`linslope`)\n", "* Spearman Correlation (`spearman_r`)\n", - "* Spearman Correlation p value (`spearman_r_p_value`)\n", "* Spearman Correlation effective p value (`spearman_r_eff_p_value`)\n", - "* Effective Sample Size (`effective_sample_size`)\n", - "* Slope of Linear Fit (`linslope`)\n", - "* Coefficient of Determination (`r2`)\n", + "* Spearman Correlation p value (`spearman_r_p_value`)\n", "\n", "### Distance-Based\n", "\n", + "* Coefficient of Determination (`r2`)\n", + "* Mean Absolute Error (`mae`)\n", + "* Mean Absolute Percentage Error (`mape`)\n", "* Mean Error (`me`)\n", - "* Root Mean Squared Error (`rmse`)\n", "* Mean 
Squared Error (`mse`)\n", - "* Mean Absolute Error (`mae`)\n", "* Median Absolute Error (`median_absolute_error`)\n", - "* Symmetric Mean Absolute Percentage Error (`smape`)\n", - "* Mean Absolute Percentage Error (`mape`)" + "* Root Mean Squared Error (`rmse`)\n", + "* Symmetric Mean Absolute Percentage Error (`smape`)" ] }, { @@ -169,7 +169,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "All deterministic metrics except for `pearson_r_eff_p_value`, `spearman_r_eff_p_value`, and `effective_sample_size` can take the kwarg `weights=...`. `weights` should be a DataArray of the size of the reduced dimension (e.g., if time is being reduced it should be of length 3 in our example).\n", + "All deterministic metrics except for `effective_sample_size`, `pearson_r_eff_p_value` and `spearman_r_eff_p_value` can take the kwarg `weights=...`. `weights` should be a DataArray of the size of the reduced dimension (e.g., if time is being reduced it should be of length 3 in our example).\n", "\n", "Weighting is a common practice when working with observations and model simulations of the Earth system. When working with rectilinear grids, one can weight the data by the cosine of the latitude, which is maximum at the equator and minimum at the poles (as in the below example). More complicated model grids tend to be accompanied by a cell area varaible, which could also be passed into this function." ] @@ -253,7 +253,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You can also pass the optional boolean kwarg `skipna=...`. If `True`, ignore any NaNs (pairwise) in `a` and `b` when computing the result. If `False`, return NaNs anywhere there are pairwise NaNs." + "You can also pass the optional boolean kwarg `skipna`. If `True`, ignore any NaNs (pairwise) in `obs` and `fct` when computing the result. If `False`, return NaNs anywhere there are pairwise NaNs." 
] }, { @@ -341,25 +341,25 @@ "source": [ "## Probabilistic Metrics\n", "\n", - "`xskillscore` offers a suite of probabilistic metrics.\n", + "`xskillscore` offers a suite of probabilistic metrics:\n", "\n", - "* Continuous Ranked Probability Score with the ensemble distribution (`crps_ensemble`)\n", + "* Brier Score (`brier_score`)\n", + "* Brier scores of an ensemble for exceeding given thresholds (`threshold_brier_score`)\n", "* Continuous Ranked Probability Score with a Gaussian distribution (`crps_gaussian`)\n", "* Continuous Ranked Probability Score with numerical integration of the normal distribution (`crps_quadrature`)\n", - "* Brier scores of an ensemble for exceeding given thresholds (`threshold_brier_score`)\n", - "* Brier Score (`brier_score`)\n", - "* Ranked Probability Score (`rps`)\n", + "* Continuous Ranked Probability Score with the ensemble distribution (`crps_ensemble`)\n", "* Discrimination (`discrimination`)\n", "* Rank Histogram (`rank_histogram`)\n", - "* Reliability (`reliability`)\n", - "* Receiver operating characteristic (ROC) (`roc`)" + "* Ranked Probability Score (`rps`)\n", + "* Receiver operating characteristic (`roc`)\n", + "* Reliability (`reliability`)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "We now create some data with an ensemble member dimension. In this case, we envision an ensemble forecast with multiple members to validate against our theoretical observations." + "We now create some data with an ensemble member dimension. 
In this case, we envision an ensemble forecast with multiple members to validate against our theoretical observations:" ] }, { @@ -664,423 +664,12 @@ "metadata": {}, "outputs": [], "source": [ - "dichotomous_category_edges = np.array([0, 0.5, 1]) # \"dichotomous\" mean two-category\n", - "dichotomous_contingency = xs.Contingency(obs, fct,\n", - " dichotomous_category_edges,\n", - " dichotomous_category_edges,\n", - " dim=['lat','lon'])\n", - "dichotomous_contingency_table = dichotomous_contingency.table" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
<xarray.DataArray 'histogram_observations_forecasts' (time: 3, observations_category: 2, forecasts_category: 2)>\n",
-       "array([[[5, 6],\n",
-       "        [6, 3]],\n",
-       "\n",
-       "       [[6, 5],\n",
-       "        [4, 5]],\n",
-       "\n",
-       "       [[5, 5],\n",
-       "        [4, 6]]])\n",
-       "Coordinates:\n",
-       "  * time                          (time) object 2000-01-01 00:00:00 ... 2000-...\n",
-       "    observations_category_bounds  (observations_category) <U10 '[0.0, 0.5)' '...\n",
-       "    forecasts_category_bounds     (forecasts_category) <U10 '[0.0, 0.5)' '[0....\n",
-       "  * observations_category         (observations_category) int64 1 2\n",
-       "  * forecasts_category            (forecasts_category) int64 1 2
" - ], - "text/plain": [ - "\n", - "array([[[5, 6],\n", - " [6, 3]],\n", - "\n", - " [[6, 5],\n", - " [4, 5]],\n", - "\n", - " [[5, 5],\n", - " [4, 6]]])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-...\n", - " observations_category_bounds (observations_category) Date: Sun, 9 May 2021 02:53:27 -0400 Subject: [PATCH 2/9] me eq fix --- xskillscore/core/deterministic.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xskillscore/core/deterministic.py b/xskillscore/core/deterministic.py index 2a6b1bcb..0b26a8b1 100644 --- a/xskillscore/core/deterministic.py +++ b/xskillscore/core/deterministic.py @@ -801,7 +801,7 @@ def me(a, b, dim=None, weights=None, skipna=False, keep_attrs=False): """Mean Error. .. math:: - \\mathrm{ME} = \\frac{1}{n}\\sum_{i=1}^{n}\\a - b + \\mathrm{ME} = \\frac{1}{n}\\sum_{i=1}^{n}(a_{i} - b_{i}) Parameters ---------- From 3387da910a831fee8b56bbb42d3dd328ddfbaccb Mon Sep 17 00:00:00 2001 From: Ray Bell Date: Sun, 9 May 2021 17:49:59 -0400 Subject: [PATCH 3/9] more info on contingency methods --- docs/source/api.rst | 2 +- docs/source/quick-start.ipynb | 19 +++++++++++-------- xskillscore/core/contingency.py | 8 ++++++++ 3 files changed, 20 insertions(+), 9 deletions(-) diff --git a/docs/source/api.rst b/docs/source/api.rst index 4de831b5..5a1126a7 100644 --- a/docs/source/api.rst +++ b/docs/source/api.rst @@ -21,7 +21,6 @@ Correlation Metrics pearson_r_p_value pearson_r_eff_p_value linslope - r2 spearman_r spearman_r_p_value spearman_r_eff_p_value @@ -38,6 +37,7 @@ Distance Metrics me median_absolute_error mse + r2 rmse smape diff --git a/docs/source/quick-start.ipynb b/docs/source/quick-start.ipynb index e9a480c4..7f6870d4 100644 --- a/docs/source/quick-start.ipynb +++ b/docs/source/quick-start.ipynb @@ -345,13 +345,13 @@ "\n", "* Brier Score (`brier_score`)\n", "* Brier scores of an ensemble for exceeding given thresholds (`threshold_brier_score`)\n", - "* Continuous Ranked Probability Score 
with a Gaussian distribution (`crps_gaussian`)\n", + "* Continuous Ranked Probability Score with a gaussian distribution (`crps_gaussian`)\n", "* Continuous Ranked Probability Score with numerical integration of the normal distribution (`crps_quadrature`)\n", "* Continuous Ranked Probability Score with the ensemble distribution (`crps_ensemble`)\n", "* Discrimination (`discrimination`)\n", "* Rank Histogram (`rank_histogram`)\n", "* Ranked Probability Score (`rps`)\n", - "* Receiver operating characteristic (`roc`)\n", + "* Receiver Operating Characteristic (`roc`)\n", "* Reliability (`reliability`)" ] }, @@ -782,17 +782,20 @@ "* Accuracy (`accuracy`)\n", "* Bias Score (`bias_score`)\n", "* Equitable Threat Score (`equit_threat_score`)\n", - "* False Alarm Ratio (`false_alarm_ratio`)\n", - "* False Alarm Rate (`false_alarm_rate`)\n", + "* False Alarms / False Positives (`false_alarms`)\n", + "* False Alarm Ratio / False Discovery Rate (`false_alarm_ratio`)\n", + "* False Alarm Rate / False Positive Rate / Fall-out (`false_alarm_rate`)\n", "* Gerrity Score (`gerrity_score`)\n", "* Heidke Score (`heidke_score`)\n", - "* Hit Rate (`hit_rate`)\n", + "* Hit Rate / Recall / Sensitivity / True Positive Rate (`hit_rate`)\n", + "* Hits / True Positives (`hits`)\n", + "* Misses / False Negatives (`misses`)\n", "* Odds Ratio (`odds_ratio`)\n", "* Odds Ratio Skill Score (`odds_ratio_skill_score`)\n", "* Peirce Score (`peirce_score`)\n", - "* Receiver operating characteristic (`roc`)\n", - "* Success Ratio (`success_ratio`)\n", - "* Threat Score (`threat_score`)\n", + "* Receiver Operating Characteristic (`roc`)\n", + "* Success Ratio / Precision / Positive Predictive Value (`success_ratio`)\n", + "* Threat Score / Critical Success Index (`threat_score`)\n", "\n", "Below, we share a few examples of these in action:" ] diff --git a/xskillscore/core/contingency.py b/xskillscore/core/contingency.py index 98f9f7c7..45ca8f7a 100644 --- a/xskillscore/core/contingency.py +++ 
b/xskillscore/core/contingency.py @@ -399,6 +399,10 @@ def hit_rate(self, yes_category=2): xarray.Dataset or xarray.DataArray An array containing the hit rate(s) + See Also + -------- + sklearn.metrics.recall_score + References ---------- https://www.cawcr.gov.au/projects/verification/#Contingency_table @@ -479,6 +483,10 @@ def success_ratio(self, yes_category=2): xarray.Dataset or xarray.DataArray An array containing the success ratio(s) + See Also + -------- + sklearn.metrics.precision_score + References ---------- https://www.cawcr.gov.au/projects/verification/#Contingency_table From 7c05494605a9ce4995d8da4ba375de2d52e92a5d Mon Sep 17 00:00:00 2001 From: Ray Bell Date: Sun, 9 May 2021 21:08:48 -0400 Subject: [PATCH 4/9] more contingenc info --- docs/source/quick-start.ipynb | 2 +- xskillscore/core/contingency.py | 8 ++++++++ 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/docs/source/quick-start.ipynb b/docs/source/quick-start.ipynb index 7f6870d4..ad1cde99 100644 --- a/docs/source/quick-start.ipynb +++ b/docs/source/quick-start.ipynb @@ -786,7 +786,7 @@ "* False Alarm Ratio / False Discovery Rate (`false_alarm_ratio`)\n", "* False Alarm Rate / False Positive Rate / Fall-out (`false_alarm_rate`)\n", "* Gerrity Score (`gerrity_score`)\n", - "* Heidke Score (`heidke_score`)\n", + "* Heidke Score / Cohan Kappa (`heidke_score`)\n", "* Hit Rate / Recall / Sensitivity / True Positive Rate (`hit_rate`)\n", "* Hits / True Positives (`hits`)\n", "* Misses / False Negatives (`misses`)\n", diff --git a/xskillscore/core/contingency.py b/xskillscore/core/contingency.py index 45ca8f7a..ac66eb56 100644 --- a/xskillscore/core/contingency.py +++ b/xskillscore/core/contingency.py @@ -633,6 +633,10 @@ def accuracy(self): xarray.Dataset or xarray.DataArray An array containing the accuracy score(s) + See Also + -------- + sklearn.metrics.accuracy_score + References ---------- https://www.cawcr.gov.au/projects/verification/#Contingency_table @@ -662,6 +666,10 @@ def 
heidke_score(self): xarray.Dataset or xarray.DataArray An array containing the Heidke score(s) + See Also + -------- + sklearn.metrics.cohen_kappa_score + References ---------- https://www.cawcr.gov.au/projects/verification/#Contingency_table From 553d493127e46b93d6db1c4cf543c03c9775096b Mon Sep 17 00:00:00 2001 From: Ray Bell Date: Sun, 9 May 2021 21:28:24 -0400 Subject: [PATCH 5/9] lint --- ci/docs_notebooks.yml | 5 +++++ docs/source/quick-start.ipynb | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/ci/docs_notebooks.yml b/ci/docs_notebooks.yml index 6e4b78db..6625a3b8 100644 --- a/ci/docs_notebooks.yml +++ b/ci/docs_notebooks.yml @@ -23,6 +23,11 @@ dependencies: - sphinx - sphinxcontrib-napoleon - sphinx_rtd_theme + - black + - doc8 + - isort + - flake8 + - pre-commit - pip - pip: - sphinx_autosummary_accessors diff --git a/docs/source/quick-start.ipynb b/docs/source/quick-start.ipynb index ad1cde99..66ae9e3e 100644 --- a/docs/source/quick-start.ipynb +++ b/docs/source/quick-start.ipynb @@ -1380,4 +1380,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +} From e31bf1c82199ad278e2b29a758af672c1df19192 Mon Sep 17 00:00:00 2001 From: Ray Bell Date: Tue, 11 May 2021 00:08:25 -0400 Subject: [PATCH 6/9] typo --- CHANGELOG.rst | 4 ++++ docs/source/quick-start.ipynb | 4 ++-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.rst b/CHANGELOG.rst index 5565599f..6f21984e 100644 --- a/CHANGELOG.rst +++ b/CHANGELOG.rst @@ -5,6 +5,10 @@ Changelog History xskillscore v0.0.21 (2021-XX-XX) -------------------------------- +Documentation +~~~~~~~~~~~~~ +- Added more info in ``quick-start.ipynb`` (:pr:`316`) `Ray Bell`_. 
+ xskillscore v0.0.20 (2021-05-08) -------------------------------- diff --git a/docs/source/quick-start.ipynb b/docs/source/quick-start.ipynb index 66ae9e3e..f37cd367 100644 --- a/docs/source/quick-start.ipynb +++ b/docs/source/quick-start.ipynb @@ -786,7 +786,7 @@ "* False Alarm Ratio / False Discovery Rate (`false_alarm_ratio`)\n", "* False Alarm Rate / False Positive Rate / Fall-out (`false_alarm_rate`)\n", "* Gerrity Score (`gerrity_score`)\n", - "* Heidke Score / Cohan Kappa (`heidke_score`)\n", + "* Heidke Score / Cohen's Kappa (`heidke_score`)\n", "* Hit Rate / Recall / Sensitivity / True Positive Rate (`hit_rate`)\n", "* Hits / True Positives (`hits`)\n", "* Misses / False Negatives (`misses`)\n", @@ -1380,4 +1380,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file From df0f9555fa7b6b0b5093d83e3ea83344a909ada0 Mon Sep 17 00:00:00 2001 From: Ray Bell Date: Tue, 11 May 2021 00:16:05 -0400 Subject: [PATCH 7/9] lint --- docs/source/quick-start.ipynb | 2680 ++++++++++++++++----------------- 1 file changed, 1299 insertions(+), 1381 deletions(-) diff --git a/docs/source/quick-start.ipynb b/docs/source/quick-start.ipynb index f37cd367..ae227a3a 100644 --- a/docs/source/quick-start.ipynb +++ b/docs/source/quick-start.ipynb @@ -1,1383 +1,1301 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Quick Start\n", - "\n", - "See the [API](https://xskillscore.readthedocs.io/en/stable/api.html) for more detailed information, examples, formulas, and references for each function." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import xarray as xr\n", - "import xskillscore as xs\n", - "import matplotlib.pyplot as plt\n", - "np.random.seed(seed=42)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Here, we generate some sample gridded data. 
Our data has three time steps, and a 4x5 latitude/longitude grid. `obs` denotes some verification data (sometimes termed `y`) and `fct` some forecast data (e.g. from a statistical or dynamical model; sometimes termed `yhat`)." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "obs = xr.DataArray(\n", - " np.random.rand(3, 4, 5),\n", - " coords=[\n", - " xr.cftime_range(\"2000-01-01\", \"2000-01-03\", freq=\"D\"),\n", - " np.arange(4),\n", - " np.arange(5),\n", - " ],\n", - " dims=[\"time\", \"lat\", \"lon\"],\n", - " name='var'\n", - " )\n", - "fct = obs.copy()\n", - "fct.values = np.random.rand(3, 4, 5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deterministic Metrics\n", - "\n", - "`xskillscore` offers a suite of correlation-based and distance-based deterministic metrics:\n", - "\n", - "### Correlation-Based \n", - "\n", - "* Effective Sample Size (`effective_sample_size`)\n", - "* Pearson Correlation (`pearson_r`)\n", - "* Pearson Correlation effective p value (`pearson_r_eff_p_value`)\n", - "* Pearson Correlation p value (`pearson_r_p_value`)\n", - "* Slope of Linear Fit (`linslope`)\n", - "* Spearman Correlation (`spearman_r`)\n", - "* Spearman Correlation effective p value (`spearman_r_eff_p_value`)\n", - "* Spearman Correlation p value (`spearman_r_p_value`)\n", - "\n", - "### Distance-Based\n", - "\n", - "* Coefficient of Determination (`r2`)\n", - "* Mean Absolute Error (`mae`)\n", - "* Mean Absolute Percentage Error (`mape`)\n", - "* Mean Error (`me`)\n", - "* Mean Squared Error (`mse`)\n", - "* Median Absolute Error (`median_absolute_error`)\n", - "* Root Mean Squared Error (`rmse`)\n", - "* Symmetric Mean Absolute Percentage Error (`smape`)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Calling the functions is very straight-forward. All deterministic functions take the form `func(a, b, dim=None, **kwargs)`. 
**Notice that the original dataset is reduced by the dimension passed.** I.e., since we passed `time` as the dimension here, we are returned an object with dimensions `(lat, lon)`. For correlation metrics `dim` cannot be `[]`." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[ 0.99509676, -0.88499394, 0.94083077, 0.96521259, -0.13696899],\n", - " [-0.90613709, 0.51585291, 0.72875703, 0.19331043, 0.79754067],\n", - " [-0.80112059, -0.95632624, -0.23640403, -0.57684283, 0.43389289],\n", - " [ 0.00230351, -0.58970109, -0.87332763, -0.99992557, -0.31404248]])\n", - "Coordinates:\n", - " * lat (lat) int64 0 1 2 3\n", - " * lon (lon) int64 0 1 2 3 4\n" - ] - } - ], - "source": [ - "r = xs.pearson_r(obs, fct, dim='time')\n", - "print(r)" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[0.06306879, 0.30832471, 0.22009394, 0.1684121 , 0.91252786],\n", - " [0.2780348 , 0.6549502 , 0.48019675, 0.87615511, 0.41226788],\n", - " [0.40847506, 0.1888421 , 0.84806222, 0.60856901, 0.71427925],\n", - " [0.99853354, 0.59849112, 0.32391484, 0.00776728, 0.79663312]])\n", - "Coordinates:\n", - " * lat (lat) int64 0 1 2 3\n", - " * lon (lon) int64 0 1 2 3 4\n" - ] - } - ], - "source": [ - "p = xs.pearson_r_p_value(obs, fct, dim=\"time\")\n", - "print(p)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also specify multiple axes for deterministic metrics. Here, we apply it over the latitude and longitude dimension (a pattern correlation)." 
- ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([-0.16920304, -0.06326809, 0.18040449])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" - ] - } - ], - "source": [ - "r = xs.pearson_r(obs, fct, dim=[\"lat\", \"lon\"])\n", - "print(r)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "All deterministic metrics except for `effective_sample_size`, `pearson_r_eff_p_value` and `spearman_r_eff_p_value` can take the kwarg `weights=...`. `weights` should be a DataArray of the size of the reduced dimension (e.g., if time is being reduced it should be of length 3 in our example).\n", - "\n", - "Weighting is a common practice when working with observations and model simulations of the Earth system. When working with rectilinear grids, one can weight the data by the cosine of the latitude, which is maximum at the equator and minimum at the poles (as in the below example). More complicated model grids tend to be accompanied by a cell area varaible, which could also be passed into this function." 
- ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "obs2 = xr.DataArray(\n", - " np.random.rand(3, 180, 360),\n", - " coords=[\n", - " xr.cftime_range(\"2000-01-01\", \"2000-01-03\", freq=\"D\"),\n", - " np.linspace(-89.5, 89.5, 180),\n", - " np.linspace(-179.5, 179.5, 360),\n", - " ],\n", - " dims=[\"time\", \"lat\", \"lon\"],\n", - " )\n", - "fct2 = obs2.copy()\n", - "fct2.values = np.random.rand(3, 180, 360)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "# make weights as cosine of the latitude and broadcast\n", - "weights = np.cos(np.deg2rad(obs2.lat))\n", - "_, weights = xr.broadcast(obs2, weights)\n", - "\n", - "# Remove the time dimension from weights\n", - "weights = weights.isel(time=0)" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([-0.0020303 , -0.00498588, -0.00401522])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" - ] - } - ], - "source": [ - "r_weighted = xs.pearson_r(obs2, fct2, dim=[\"lat\", \"lon\"], weights=weights)\n", - "print(r_weighted)" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([ 5.72646719e-05, -4.32380560e-03, 4.17909845e-05])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" - ] - } - ], - "source": [ - "r_unweighted = xs.pearson_r(obs2, fct2, dim=[\"lat\", \"lon\"], weights=None)\n", - "print(r_unweighted)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also pass the optional boolean kwarg `skipna`. If `True`, ignore any NaNs (pairwise) in `obs` and `fct` when computing the result. 
If `False`, return NaNs anywhere there are pairwise NaNs." - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[[ nan, nan, nan, nan, nan],\n", - " [ nan, nan, nan, nan, nan],\n", - " [0.02058449, 0.96990985, 0.83244264, 0.21233911, 0.18182497],\n", - " [0.18340451, 0.30424224, 0.52475643, 0.43194502, 0.29122914]],\n", - "\n", - " [[ nan, nan, nan, nan, nan],\n", - " [ nan, nan, nan, nan, nan],\n", - " [0.60754485, 0.17052412, 0.06505159, 0.94888554, 0.96563203],\n", - " [0.80839735, 0.30461377, 0.09767211, 0.68423303, 0.44015249]],\n", - "\n", - " [[ nan, nan, nan, nan, nan],\n", - " [ nan, nan, nan, nan, nan],\n", - " [0.96958463, 0.77513282, 0.93949894, 0.89482735, 0.59789998],\n", - " [0.92187424, 0.0884925 , 0.19598286, 0.04522729, 0.32533033]]])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", - " * lat (lat) int64 0 1 2 3\n", - " * lon (lon) int64 0 1 2 3 4\n" - ] - } - ], - "source": [ - "obs_with_nans = obs.where(obs.lat > 1)\n", - "fct_with_nans = fct.where(fct.lat > 1)\n", - "print(obs_with_nans)" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([0.51901116, 0.41623426, 0.32621064])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" - ] - } - ], - "source": [ - "mae_with_skipna = xs.mae(obs_with_nans, fct_with_nans, dim=['lat', 'lon'], skipna=True)\n", - "print(mae_with_skipna)" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([nan, nan, nan])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 
2000-01-03 00:00:00\n" - ] - } - ], - "source": [ - "mae_without_skipna = xs.mae(obs_with_nans, fct_with_nans, dim=['lat', 'lon'], skipna=False)\n", - "print(mae_without_skipna)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Probabilistic Metrics\n", - "\n", - "`xskillscore` offers a suite of probabilistic metrics:\n", - "\n", - "* Brier Score (`brier_score`)\n", - "* Brier scores of an ensemble for exceeding given thresholds (`threshold_brier_score`)\n", - "* Continuous Ranked Probability Score with a gaussian distribution (`crps_gaussian`)\n", - "* Continuous Ranked Probability Score with numerical integration of the normal distribution (`crps_quadrature`)\n", - "* Continuous Ranked Probability Score with the ensemble distribution (`crps_ensemble`)\n", - "* Discrimination (`discrimination`)\n", - "* Rank Histogram (`rank_histogram`)\n", - "* Ranked Probability Score (`rps`)\n", - "* Receiver Operating Characteristic (`roc`)\n", - "* Reliability (`reliability`)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We now create some data with an ensemble member dimension. In this case, we envision an ensemble forecast with multiple members to validate against our theoretical observations:" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "obs3 = xr.DataArray(\n", - " np.random.rand(4, 5),\n", - " coords=[np.arange(4), np.arange(5)],\n", - " dims=[\"lat\", \"lon\"],\n", - " name='var'\n", - " )\n", - "fct3 = xr.DataArray(\n", - " np.random.rand(3, 4, 5),\n", - " coords=[np.arange(3), np.arange(4), np.arange(5)],\n", - " dims=[\"member\", \"lat\", \"lon\"],\n", - " name='var'\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Continuous Ranked Probability Score with the ensemble distribution. Pass `dim=[]` to get the same behaviour as `properscoring.crps_ensemble` without any averaging over `dim`." 
- ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[0.19918258, 0.10670612, 0.11858151, 0.15974459, 0.26841063],\n", - " [0.08038415, 0.13237479, 0.23778382, 0.18009214, 0.08326884],\n", - " [0.08589149, 0.11666573, 0.21579228, 0.09646599, 0.12855359],\n", - " [0.19891371, 0.10470738, 0.05289158, 0.107965 , 0.11143681]])\n", - "Coordinates:\n", - " * lat (lat) int64 0 1 2 3\n", - " * lon (lon) int64 0 1 2 3 4\n" - ] - } - ], - "source": [ - "crps_ensemble = xs.crps_ensemble(obs3, fct3, dim=[])\n", - "print(crps_ensemble)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The CRPS with a Gaussian distribution requires two parameters: $\\mu$ and $\\sigma$ from the forecast distribution. Here, we just use the ensemble mean and ensemble spread." - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[0.19821619, 0.11640329, 0.14219455, 0.15912935, 0.28104703],\n", - " [0.08953392, 0.11758925, 0.25156378, 0.095484 , 0.10679842],\n", - " [0.05069082, 0.07081479, 0.24529056, 0.08700853, 0.09535839],\n", - " [0.1931706 , 0.11233935, 0.0783092 , 0.09593862, 0.11037143]])\n", - "Coordinates:\n", - " * lat (lat) int64 0 1 2 3\n", - " * lon (lon) int64 0 1 2 3 4\n" - ] - } - ], - "source": [ - "crps_gaussian = xs.crps_gaussian(obs3, fct3.mean(\"member\"), fct3.std(\"member\"), dim=[])\n", - "print(crps_gaussian)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The CRPS quadrature metric requires a callable distribution function. Here we use `norm` from `scipy.stats`." 
- ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[0.52852898, 0.58042038, 0.46945497, 0.25013942, 0.23370234],\n", - " [0.39109762, 0.24071855, 0.25557803, 0.28994381, 0.23764056],\n", - " [0.40236669, 0.33477031, 0.24063375, 0.45538915, 0.48236113],\n", - " [0.42011508, 0.4174865 , 0.24837346, 0.43954946, 0.44689198]])\n", - "Coordinates:\n", - " * lat (lat) int64 0 1 2 3\n", - " * lon (lon) int64 0 1 2 3 4\n" - ] - } - ], - "source": [ - "from scipy.stats import norm\n", - "crps_quadrature = xs.crps_quadrature(obs3, norm, dim=[])\n", - "print(crps_quadrature)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can also use a threshold Brier Score, to score hits over a certain threshold. Ranked Probability Score for two categories yields the same result." - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array(0.15555556)\n", - "Coordinates:\n", - " threshold float64 0.5\n" - ] - } - ], - "source": [ - "threshold_brier_score = xs.threshold_brier_score(obs3, fct3, 0.5, dim=None)\n", - "print(threshold_brier_score)" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array(0.15555556)\n" - ] - } - ], - "source": [ - "brier_score = xs.brier_score(obs3>.5, (fct3>.5).mean('member'))\n", - "print(brier_score)" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array(0.15555556)\n" - ] - } - ], - "source": [ - "rps = xs.rps(obs3>.5, fct3>.5, category_edges=np.array([0.5]))\n", - "print(rps)" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - 
"outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([3, 8, 6, 3])\n", - "Coordinates:\n", - " * rank (rank) float64 1.0 2.0 3.0 4.0\n" - ] - } - ], - "source": [ - "rank_histogram = xs.rank_histogram(obs3, fct3)\n", - "print(rank_histogram)" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[0. , 0.08333333, 0. , 0.66666667, 0.25 ],\n", - " [0.125 , 0.5 , 0. , 0.375 , 0. ]])\n", - "Coordinates:\n", - " * forecast_probability (forecast_probability) float64 0.1 0.3 0.5 0.7 0.9\n", - " * event (event) bool True False\n" - ] - } - ], - "source": [ - "disc = xs.discrimination(obs3 > 0.5, (fct3 > 0.5).mean(\"member\"))\n", - "print(disc)" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([0. , 0.2 , nan, 0.72727273, 1. 
])\n", - "Coordinates:\n", - " * forecast_probability (forecast_probability) float64 0.1 0.3 0.5 0.7 0.9\n", - " samples (forecast_probability) float64 1.0 5.0 0.0 11.0 3.0\n" - ] - } - ], - "source": [ - "rel = xs.reliability(obs3 > 0.5, (fct3 > 0.5).mean(\"member\"))\n", - "print(rel)" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0.8229166666666666" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Quick Start\n", + "\n", + "See the [API](https://xskillscore.readthedocs.io/en/stable/api.html) for more detailed information, examples, formulas, and references for each function.", + ], + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import xarray as xr\n", + "import xskillscore as xs\n", + "import matplotlib.pyplot as plt\n", + "np.random.seed(seed=42)", + ], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here, we generate some sample gridded data. Our data has three time steps, and a 4x5 latitude/longitude grid. `obs` denotes some verification data (sometimes termed `y`) and `fct` some forecast data (e.g. from a statistical or dynamical model; sometimes termed `yhat`)." 
+ ], + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "obs = xr.DataArray(\n", + " np.random.rand(3, 4, 5),\n", + " coords=[\n", + ' xr.cftime_range("2000-01-01", "2000-01-03", freq="D"),\n', + " np.arange(4),\n", + " np.arange(5),\n", + " ],\n", + ' dims=["time", "lat", "lon"],\n', + " name='var'\n", + " )\n", + "fct = obs.copy()\n", + "fct.values = np.random.rand(3, 4, 5)", + ], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deterministic Metrics\n", + "\n", + "`xskillscore` offers a suite of correlation-based and distance-based deterministic metrics:\n", + "\n", + "### Correlation-Based \n", + "\n", + "* Effective Sample Size (`effective_sample_size`)\n", + "* Pearson Correlation (`pearson_r`)\n", + "* Pearson Correlation effective p value (`pearson_r_eff_p_value`)\n", + "* Pearson Correlation p value (`pearson_r_p_value`)\n", + "* Slope of Linear Fit (`linslope`)\n", + "* Spearman Correlation (`spearman_r`)\n", + "* Spearman Correlation effective p value (`spearman_r_eff_p_value`)\n", + "* Spearman Correlation p value (`spearman_r_p_value`)\n", + "\n", + "### Distance-Based\n", + "\n", + "* Coefficient of Determination (`r2`)\n", + "* Mean Absolute Error (`mae`)\n", + "* Mean Absolute Percentage Error (`mape`)\n", + "* Mean Error (`me`)\n", + "* Mean Squared Error (`mse`)\n", + "* Median Absolute Error (`median_absolute_error`)\n", + "* Root Mean Squared Error (`rmse`)\n", + "* Symmetric Mean Absolute Percentage Error (`smape`)", + ], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Calling the functions is very straight-forward. All deterministic functions take the form `func(a, b, dim=None, **kwargs)`. **Notice that the original dataset is reduced by the dimension passed.** I.e., since we passed `time` as the dimension here, we are returned an object with dimensions `(lat, lon)`. For correlation metrics `dim` cannot be `[]`." 
+ ], + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[ 0.99509676, -0.88499394, 0.94083077, 0.96521259, -0.13696899],\n", + " [-0.90613709, 0.51585291, 0.72875703, 0.19331043, 0.79754067],\n", + " [-0.80112059, -0.95632624, -0.23640403, -0.57684283, 0.43389289],\n", + " [ 0.00230351, -0.58970109, -0.87332763, -0.99992557, -0.31404248]])\n", + "Coordinates:\n", + " * lat (lat) int64 0 1 2 3\n", + " * lon (lon) int64 0 1 2 3 4\n", + ], + } + ], + "source": ["r = xs.pearson_r(obs, fct, dim='time')\n", "print(r)"], + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[0.06306879, 0.30832471, 0.22009394, 0.1684121 , 0.91252786],\n", + " [0.2780348 , 0.6549502 , 0.48019675, 0.87615511, 0.41226788],\n", + " [0.40847506, 0.1888421 , 0.84806222, 0.60856901, 0.71427925],\n", + " [0.99853354, 0.59849112, 0.32391484, 0.00776728, 0.79663312]])\n", + "Coordinates:\n", + " * lat (lat) int64 0 1 2 3\n", + " * lon (lon) int64 0 1 2 3 4\n", + ], + } + ], + "source": ['p = xs.pearson_r_p_value(obs, fct, dim="time")\n', "print(p)"], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also specify multiple axes for deterministic metrics. Here, we apply it over the latitude and longitude dimension (a pattern correlation)." + ], + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([-0.16920304, -0.06326809, 0.18040449])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 
2000-01-03 00:00:00\n", + ], + } + ], + "source": ['r = xs.pearson_r(obs, fct, dim=["lat", "lon"])\n', "print(r)"], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All deterministic metrics except for `effective_sample_size`, `pearson_r_eff_p_value` and `spearman_r_eff_p_value` can take the kwarg `weights=...`. `weights` should be a DataArray of the size of the reduced dimension (e.g., if time is being reduced it should be of length 3 in our example).\n", + "\n", + "Weighting is a common practice when working with observations and model simulations of the Earth system. When working with rectilinear grids, one can weight the data by the cosine of the latitude, which is maximum at the equator and minimum at the poles (as in the below example). More complicated model grids tend to be accompanied by a cell area variable, which could also be passed into this function.", + ], + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "obs2 = xr.DataArray(\n", + " np.random.rand(3, 180, 360),\n", + " coords=[\n", + ' xr.cftime_range("2000-01-01", "2000-01-03", freq="D"),\n', + " np.linspace(-89.5, 89.5, 180),\n", + " np.linspace(-179.5, 179.5, 360),\n", + " ],\n", + ' dims=["time", "lat", "lon"],\n', + " )\n", + "fct2 = obs2.copy()\n", + "fct2.values = np.random.rand(3, 180, 360)", + ], + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "# make weights as cosine of the latitude and broadcast\n", + "weights = np.cos(np.deg2rad(obs2.lat))\n", + "_, weights = xr.broadcast(obs2, weights)\n", + "\n", + "# Remove the time dimension from weights\n", + "weights = weights.isel(time=0)", + ], + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([-0.0020303 , -0.00498588, -0.00401522])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 
00:00:00 ... 2000-01-03 00:00:00\n", + ], + } + ], + "source": [ + 'r_weighted = xs.pearson_r(obs2, fct2, dim=["lat", "lon"], weights=weights)\n', + "print(r_weighted)", + ], + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([ 5.72646719e-05, -4.32380560e-03, 4.17909845e-05])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", + ], + } + ], + "source": [ + 'r_unweighted = xs.pearson_r(obs2, fct2, dim=["lat", "lon"], weights=None)\n', + "print(r_unweighted)", + ], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also pass the optional boolean kwarg `skipna`. If `True`, ignore any NaNs (pairwise) in `obs` and `fct` when computing the result. If `False`, return NaNs anywhere there are pairwise NaNs." + ], + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[[ nan, nan, nan, nan, nan],\n", + " [ nan, nan, nan, nan, nan],\n", + " [0.02058449, 0.96990985, 0.83244264, 0.21233911, 0.18182497],\n", + " [0.18340451, 0.30424224, 0.52475643, 0.43194502, 0.29122914]],\n", + "\n", + " [[ nan, nan, nan, nan, nan],\n", + " [ nan, nan, nan, nan, nan],\n", + " [0.60754485, 0.17052412, 0.06505159, 0.94888554, 0.96563203],\n", + " [0.80839735, 0.30461377, 0.09767211, 0.68423303, 0.44015249]],\n", + "\n", + " [[ nan, nan, nan, nan, nan],\n", + " [ nan, nan, nan, nan, nan],\n", + " [0.96958463, 0.77513282, 0.93949894, 0.89482735, 0.59789998],\n", + " [0.92187424, 0.0884925 , 0.19598286, 0.04522729, 0.32533033]]])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 
2000-01-03 00:00:00\n", + " * lat (lat) int64 0 1 2 3\n", + " * lon (lon) int64 0 1 2 3 4\n", + ], + } + ], + "source": [ + "obs_with_nans = obs.where(obs.lat > 1)\n", + "fct_with_nans = fct.where(fct.lat > 1)\n", + "print(obs_with_nans)", + ], + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([0.51901116, 0.41623426, 0.32621064])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", + ], + } + ], + "source": [ + "mae_with_skipna = xs.mae(obs_with_nans, fct_with_nans, dim=['lat', 'lon'], skipna=True)\n", + "print(mae_with_skipna)", + ], + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([nan, nan, nan])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", + ], + } + ], + "source": [ + "mae_without_skipna = xs.mae(obs_with_nans, fct_with_nans, dim=['lat', 'lon'], skipna=False)\n", + "print(mae_without_skipna)", + ], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Probabilistic Metrics\n", + "\n", + "`xskillscore` offers a suite of probabilistic metrics:\n", + "\n", + "* Brier Score (`brier_score`)\n", + "* Brier scores of an ensemble for exceeding given thresholds (`threshold_brier_score`)\n", + "* Continuous Ranked Probability Score with a gaussian distribution (`crps_gaussian`)\n", + "* Continuous Ranked Probability Score with numerical integration of the normal distribution (`crps_quadrature`)\n", + "* Continuous Ranked Probability Score with the ensemble distribution (`crps_ensemble`)\n", + "* Discrimination (`discrimination`)\n", + "* Rank Histogram (`rank_histogram`)\n", + "* Ranked Probability Score (`rps`)\n", + "* Receiver Operating Characteristic (`roc`)\n", + "* Reliability (`reliability`)", + ], + }, + { + 
"cell_type": "markdown", + "metadata": {}, + "source": [ + "We now create some data with an ensemble member dimension. In this case, we envision an ensemble forecast with multiple members to validate against our theoretical observations:" + ], + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "obs3 = xr.DataArray(\n", + " np.random.rand(4, 5),\n", + " coords=[np.arange(4), np.arange(5)],\n", + ' dims=["lat", "lon"],\n', + " name='var'\n", + " )\n", + "fct3 = xr.DataArray(\n", + " np.random.rand(3, 4, 5),\n", + " coords=[np.arange(3), np.arange(4), np.arange(5)],\n", + ' dims=["member", "lat", "lon"],\n', + " name='var'\n", + " )", + ], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Continuous Ranked Probability Score with the ensemble distribution. Pass `dim=[]` to get the same behaviour as `properscoring.crps_ensemble` without any averaging over `dim`." + ], + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[0.19918258, 0.10670612, 0.11858151, 0.15974459, 0.26841063],\n", + " [0.08038415, 0.13237479, 0.23778382, 0.18009214, 0.08326884],\n", + " [0.08589149, 0.11666573, 0.21579228, 0.09646599, 0.12855359],\n", + " [0.19891371, 0.10470738, 0.05289158, 0.107965 , 0.11143681]])\n", + "Coordinates:\n", + " * lat (lat) int64 0 1 2 3\n", + " * lon (lon) int64 0 1 2 3 4\n", + ], + } + ], + "source": [ + "crps_ensemble = xs.crps_ensemble(obs3, fct3, dim=[])\n", + "print(crps_ensemble)", + ], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The CRPS with a Gaussian distribution requires two parameters: $\\mu$ and $\\sigma$ from the forecast distribution. Here, we just use the ensemble mean and ensemble spread." 
+ ], + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[0.19821619, 0.11640329, 0.14219455, 0.15912935, 0.28104703],\n", + " [0.08953392, 0.11758925, 0.25156378, 0.095484 , 0.10679842],\n", + " [0.05069082, 0.07081479, 0.24529056, 0.08700853, 0.09535839],\n", + " [0.1931706 , 0.11233935, 0.0783092 , 0.09593862, 0.11037143]])\n", + "Coordinates:\n", + " * lat (lat) int64 0 1 2 3\n", + " * lon (lon) int64 0 1 2 3 4\n", + ], + } + ], + "source": [ + 'crps_gaussian = xs.crps_gaussian(obs3, fct3.mean("member"), fct3.std("member"), dim=[])\n', + "print(crps_gaussian)", + ], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The CRPS quadrature metric requires a callable distribution function. Here we use `norm` from `scipy.stats`." + ], + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[0.52852898, 0.58042038, 0.46945497, 0.25013942, 0.23370234],\n", + " [0.39109762, 0.24071855, 0.25557803, 0.28994381, 0.23764056],\n", + " [0.40236669, 0.33477031, 0.24063375, 0.45538915, 0.48236113],\n", + " [0.42011508, 0.4174865 , 0.24837346, 0.43954946, 0.44689198]])\n", + "Coordinates:\n", + " * lat (lat) int64 0 1 2 3\n", + " * lon (lon) int64 0 1 2 3 4\n", + ], + } + ], + "source": [ + "from scipy.stats import norm\n", + "crps_quadrature = xs.crps_quadrature(obs3, norm, dim=[])\n", + "print(crps_quadrature)", + ], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also use a threshold Brier Score, to score hits over a certain threshold. Ranked Probability Score for two categories yields the same result." 
+ ], + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array(0.15555556)\n", + "Coordinates:\n", + " threshold float64 0.5\n", + ], + } + ], + "source": [ + "threshold_brier_score = xs.threshold_brier_score(obs3, fct3, 0.5, dim=None)\n", + "print(threshold_brier_score)", + ], + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": ["\n", "array(0.15555556)\n"], + } + ], + "source": [ + "brier_score = xs.brier_score(obs3>.5, (fct3>.5).mean('member'))\n", + "print(brier_score)", + ], + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": ["\n", "array(0.15555556)\n"], + } + ], + "source": [ + "rps = xs.rps(obs3>.5, fct3>.5, category_edges=np.array([0.5]))\n", + "print(rps)", + ], + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([3, 8, 6, 3])\n", + "Coordinates:\n", + " * rank (rank) float64 1.0 2.0 3.0 4.0\n", + ], + } + ], + "source": [ + "rank_histogram = xs.rank_histogram(obs3, fct3)\n", + "print(rank_histogram)", + ], + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[0. , 0.08333333, 0. , 0.66666667, 0.25 ],\n", + " [0.125 , 0.5 , 0. , 0.375 , 0. 
]])\n", + "Coordinates:\n", + " * forecast_probability (forecast_probability) float64 0.1 0.3 0.5 0.7 0.9\n", + " * event (event) bool True False\n", + ], + } + ], + "source": [ + 'disc = xs.discrimination(obs3 > 0.5, (fct3 > 0.5).mean("member"))\n', + "print(disc)", + ], + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([0. , 0.2 , nan, 0.72727273, 1. ])\n", + "Coordinates:\n", + " * forecast_probability (forecast_probability) float64 0.1 0.3 0.5 0.7 0.9\n", + " samples (forecast_probability) float64 1.0 5.0 0.0 11.0 3.0\n", + ], + } + ], + "source": [ + 'rel = xs.reliability(obs3 > 0.5, (fct3 > 0.5).mean("member"))\n', + "print(rel)", + ], + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": {"text/plain": ["0.8229166666666666"]}, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result", + }, + { + "data": { + "image/png": "\n", + "text/plain": ["
"], + }, + "metadata": {"needs_background": "light"}, + "output_type": "display_data", + }, + ], + "source": [ + "# ROC for probabilistic forecasts and bin_edges='continuous' default\n", + "roc = xs.roc(obs3 > 0.5, (fct3 > 0.5).mean(\"member\"), return_results='all_as_metric_dim')\n", + "\n", + "plt.figure(figsize=(4, 4))\n", + "plt.plot([0, 1], [0, 1], 'k:')\n", + "roc.to_dataset(dim='metric').plot.scatter(y='true positive rate', x='false positive rate')\n", + "roc.sel(metric='area under curve').values[0]", + ], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Contingency-Based\n", + "\n", + "To work with contingency-based scoring, first instantiate a `Contingency` object by passing in your observations, forecast, and observation/forecast bin edges. See https://www.cawcr.gov.au/projects/verification/#Contingency_table for more information.", + ], + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + 'dichotomous_category_edges = np.array([0, 0.5, 1]) # "dichotomous" means two-category\n', + "dichotomous_contingency = xs.Contingency(\n", + ' obs, fct, dichotomous_category_edges, dichotomous_category_edges, dim=["lat", "lon"]\n', + ")\n", + "dichotomous_contingency_table = dichotomous_contingency.table\n", + "print(dichotomous_contingency_table)", + ], + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table>\n", + "  <thead>\n", + "    <tr><th></th><th></th><th>histogram_observations_forecasts</th><th></th></tr>\n", + "    <tr><th></th><th>observations_category</th><th>1</th><th>2</th></tr>\n", + "    <tr><th></th><th>observations_category_bounds</th><th>[0.0, 0.5)</th><th>[0.5, 1.0]</th></tr>\n", + "    <tr><th>forecasts_category</th><th>forecasts_category_bounds</th><th></th><th></th></tr>\n", + "  </thead>\n", + "  <tbody>\n", + "    <tr><th>1</th><th>[0.0, 0.5)</th><td>5.33</td><td>4.67</td></tr>\n", + "    <tr><th>2</th><th>[0.5, 1.0]</th><td>5.33</td><td>4.67</td></tr>\n", + "  </tbody>\n", + "</table>\n", + "</div>
", + ], + "text/plain": [ + " histogram_observations_forecasts \\\n", + "observations_category 1 \n", + "observations_category_bounds [0.0, 0.5) \n", + "forecasts_category forecasts_category_bounds \n", + "1 [0.0, 0.5) 5.33 \n", + "2 [0.5, 1.0] 5.33 \n", + "\n", + " \n", + "observations_category 2 \n", + "observations_category_bounds [0.5, 1.0] \n", + "forecasts_category forecasts_category_bounds \n", + "1 [0.0, 0.5) 4.67 \n", + "2 [0.5, 1.0] 4.67 ", + ], + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result", + } + ], + "source": [ + "print(\n", + " dichotomous_contingency_table.to_dataframe()\n", + " .pivot_table(\n", + ' index=["forecasts_category", "forecasts_category_bounds"],\n', + ' columns=["observations_category", "observations_category_bounds"],\n', + " )\n", + " .round(2)\n", + ")", + ], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Scores based on the constructed contingency table can be called via class methods. The available methods are:\n", + "\n", + "* Accuracy (`accuracy`)\n", + "* Bias Score (`bias_score`)\n", + "* Equitable Threat Score (`equit_threat_score`)\n", + "* False Alarms / False Positives (`false_alarms`)\n", + "* False Alarm Ratio / False Discovery Rate (`false_alarm_ratio`)\n", + "* False Alarm Rate / False Positive Rate / Fall-out (`false_alarm_rate`)\n", + "* Gerrity Score (`gerrity_score`)\n", + "* Heidke Score / Cohen's Kappa (`heidke_score`)\n", + "* Hit Rate / Recall / Sensitivity / True Positive Rate (`hit_rate`)\n", + "* Hits / True Positives (`hits`)\n", + "* Misses / False Negatives (`misses`)\n", + "* Odds Ratio (`odds_ratio`)\n", + "* Odds Ratio Skill Score (`odds_ratio_skill_score`)\n", + "* Peirce Score (`peirce_score`)\n", + "* Receiver Operating Characteristic (`roc`)\n", + "* Success Ratio / Precision / Positive Predictive Value (`success_ratio`)\n", + "* Threat Score / Critical Success Index (`threat_score`)\n", + "\n", + "Below, we share a few examples of these in 
action:", + ], + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([1. , 1.11111111, 1.1 ])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", + ], + } + ], + "source": ["print(dichotomous_contingency.bias_score())"], + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([0.33333333, 0.55555556, 0.6 ])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", + ], + } + ], + "source": ["print(dichotomous_contingency.hit_rate())"], + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([0.54545455, 0.45454545, 0.5 ])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", + ], + } + ], + "source": ["print(dichotomous_contingency.false_alarm_rate())"], + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([-0.41176471, 0.2 , 0.2 ])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", + ], + } + ], + "source": ["print(dichotomous_contingency.odds_ratio_skill_score())"], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can leverage multi-category edges to make use of some scores." 
+ ], + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [], + "source": [ + "multi_category_edges = np.array([0, 0.25, 0.75, 1])\n", + "multicategory_contingency = xs.Contingency(\n", + ' obs, fct, multi_category_edges, multi_category_edges, dim=["lat", "lon"]\n', + ")", + ], + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([0.25, 0.25, 0.5 ])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", + ], + } + ], + "source": ["print(multicategory_contingency.accuracy())"], + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([-0.14503817, -0.25 , 0.2481203 ])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", + ], + } + ], + "source": ["print(multicategory_contingency.heidke_score())"], + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([-0.1496063 , -0.24193548, 0.25 ])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", + ], + } + ], + "source": ["print(multicategory_contingency.peirce_score())"], + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([-0.15212912, -0.11160714, 0.25 ])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 
2000-01-03 00:00:00\n", + ], + } + ], + "source": ["print(multicategory_contingency.gerrity_score())"], + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "data": {"text/plain": ["0.5035528250988777"]}, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result", + }, + { + "data": { + "image/png": "\n", + "text/plain": ["
"], + }, + "metadata": {"needs_background": "light"}, + "output_type": "display_data", + }, + ], + "source": [ + "# ROC for deterministic forecasts and bin_edges\n", + "roc = xs.roc(obs, fct, np.linspace(0, 1, 11), return_results='all_as_metric_dim')\n", + "\n", + "plt.figure(figsize=(4, 4))\n", + "plt.plot([0,1], [0,1], 'k:')\n", + "roc.to_dataset(dim='metric').plot.scatter(y='true positive rate', x='false positive rate')\n", + "roc.sel(metric='area under curve').values[0]", + ], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Comparative\n", + "\n", + "Tests to compare whether one forecast is significantly better than another one.", + ], + }, + {"cell_type": "markdown", "metadata": {}, "source": ["### Sign test"]}, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [], + "source": [ + "length = 100\n", + "obs_1d = xr.DataArray(\n", + " np.random.rand(length),\n", + " coords=[\n", + " np.arange(length),\n", + " ],\n", + ' dims=["time"],\n', + " name='var'\n", + " )\n", + "fct_1d = obs_1d.copy()\n", + "fct_1d.values = np.random.rand(length)", + ], + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [], + "source": [ + "# given you want to test whether one forecast is better than another forecast\n", + "significantly_different, walk, confidence = xs.sign_test(\n", + ' fct_1d, fct_1d + 0.2, obs_1d, time_dim="time", metric="mae", orientation="negative"\n', + ")", + ], + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": ["[]"] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result", + }, + { + "data": { + "image/png": "\n", + "text/plain": ["
"], + }, + "metadata": {"needs_background": "light"}, + "output_type": "display_data", + }, + ], + "source": [ + "walk.plot()\n", + "confidence.plot(c='gray')\n", + "(-1 * confidence).plot(c='gray')", + ], + }, + {"cell_type": "markdown", "metadata": {}, "source": ["### MAE test"]}, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [], + "source": [ + "# create a worse forecast whose correlation with the original is high but not perfect\n", + "fct_1d_worse = fct_1d.copy()\n", + "step = 3\n", + "fct_1d_worse[::step] = fct_1d[::step].values + 0.1", + ], + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array(0.00966918)\n", + "\n", + "array(0.01083478)\n", + "MAEs significantly different at level 0.05 : False\n", + ], + } + ], + "source": [ + "# the half-width of the confidence interval at level alpha is larger than the MAE difference,\n", + "# therefore the difference is not significant\n", + "alpha = 0.05\n", + "significantly_different, diff, hwci = xs.mae_test(\n", + ' fct_1d, fct_1d_worse, obs_1d, time_dim="time", dim=[], alpha=alpha\n', + ")\n", + "print(diff)\n", + "print(hwci)\n", + "print(\n", + ' f"MAEs significantly different at level {alpha} : {bool(significantly_different)}"\n', + ")", + ], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Accessors\n", + "\n", + "You can also use `xskillscore` as a method of your `xarray` Dataset.", + ], + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [], + "source": [ + "ds = xr.Dataset()\n", + 'ds["obs_var"] = obs\n', + 'ds["fct_var"] = fct', + ], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the case that your Dataset contains both your observation and forecast variable, just pass them as strings into the function." 
+ ], + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[ 0.99509676, -0.88499394, 0.94083077, 0.96521259, -0.13696899],\n", + " [-0.90613709, 0.51585291, 0.72875703, 0.19331043, 0.79754067],\n", + " [-0.80112059, -0.95632624, -0.23640403, -0.57684283, 0.43389289],\n", + " [ 0.00230351, -0.58970109, -0.87332763, -0.99992557, -0.31404248]])\n", + "Coordinates:\n", + " * lat (lat) int64 0 1 2 3\n", + " * lon (lon) int64 0 1 2 3 4\n", + ], + } + ], + "source": ['print(ds.xs.pearson_r("obs_var", "fct_var", dim="time"))'], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also pass in a separate Dataset that contains your observations or forecast variable." + ], + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[ 0.99509676, -0.88499394, 0.94083077, 0.96521259, -0.13696899],\n", + " [-0.90613709, 0.51585291, 0.72875703, 0.19331043, 0.79754067],\n", + " [-0.80112059, -0.95632624, -0.23640403, -0.57684283, 0.43389289],\n", + " [ 0.00230351, -0.58970109, -0.87332763, -0.99992557, -0.31404248]])\n", + "Coordinates:\n", + " * lat (lat) int64 0 1 2 3\n", + " * lon (lon) int64 0 1 2 3 4\n", + ], + } + ], + "source": [ + 'ds = ds.drop_vars("fct_var")\n', + 'print(ds.xs.pearson_r("obs_var", fct, dim="time"))', + ], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Resampling\n", + "- randomly resample the `time` dimension and then take mean over `time` to get resample threshold\n", + "- resample over `member` dimension to get uncertainty due to member sampling in hindcasts", + ], + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [], + "source": [ + "# create large one-dimensional array\n", + "s = 1000\n", + "f = xr.DataArray(\n", + ' 
np.random.normal(size=s), dims="member", coords={"member": np.arange(s)}, name="var"\n', + ")", + ], + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "65.1 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n", + "1.44 ms ± 41.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n", + ], + } + ], + "source": [ + "# resample with replacement in that one dimension\n", + "iterations = 100\n", + "%timeit f_r = xs.resampling.resample_iterations(f, iterations, 'member', replace=True)\n", + "# resample_iterations_idx is much (50x) faster because it involves no loops\n", + "%timeit f_r = xs.resampling.resample_iterations_idx(f, iterations, 'member', replace=True)\n", + "# but both do the same resampling", + ], + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- use `resample_iterations` for very large data: it is very robust, the chunksize stays constant, and only more tasks are added\n", + "- use `resample_iterations_idx` for small data always, and for very large data only when it is chunked into small chunks in the other dimensions, because the function increases the input chunksize by a factor of `iterations`", + ], + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [""] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result", + }, + { + "data": { + "image/png": "\n", + "text/plain": ["
"], + }, + "metadata": {"needs_background": "light"}, + "output_type": "display_data", + }, + ], + "source": [ + "f_r = xs.resampling.resample_iterations_idx(f, iterations, 'member', replace=True)\n", + "f.plot.hist(label='distribution')\n", + "f_r.mean('iteration').plot.hist(label='resampled mean distribution')\n", + "plt.axvline(x=f.mean('member'), c='k', label='distribution mean')\n", + "plt.title('Gaussian distribution mean')\n", + "plt.legend()", + ], + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [""] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result", + }, + { + "data": { + "image/png": "\n", + "text/plain": ["
"], + }, + "metadata": {"needs_background": "light"}, + "output_type": "display_data", + }, + ], + "source": [ + "# we can calculate the distribution of the RMSE of 0 and f resampled over member\n", + "xs.rmse(f_r, xr.zeros_like(f_r), dim='iteration').plot.hist(label='resampled RMSE distribution')\n", + "# the gaussian distribution should have an RMSE with 0 of one\n", + "plt.axvline(x=xs.rmse(f, xr.zeros_like(f)), c='k', label='RMSE')\n", + "plt.title('RMSE between gaussian distribution and 0')\n", + "plt.legend()", + ], + }, + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3", + }, + "language_info": { + "codemirror_mode": {"name": "ipython", "version": 3}, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.6", + }, }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAARIAAAEGCAYAAACpcBquAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAkBUlEQVR4nO3deXgV5fn/8fedjSYsCRCQfVcUBESigloBrYItgnJ5uVTUtlLKFy12+aFSCkX9WqFYWyko5qetWncLRcQFK4r6U1IIhk0Wq6iQIA2IIIQIJLl/f8ygh5hlknPmzFnu13WdK3PmzDnzSSB3nueZmWdEVTHGmHCkBB3AGBP/rJAYY8JmhcQYEzYrJMaYsFkhMcaELS3oAA2Vm5ur3bp1CzqGMUlnzZo1e1S1TU2vxV0h6datG4WFhUHHMCbpiMintb1mXRtjTNiskBhjwmaFxBgTNiskxpiwWSExxoTNt6M2IvJXYBRQqqqn1vC6APcB3wcOAT9S1ff8ymNMPFlcVMKcZVvZua+cDjmZTBnRm0sHdozZffjZInkEGFnH6xcDJ7qPCcADPmYxJm4sLiph6qINlOwrR4GSfeVMXbSBxUUlMbsP3wqJqr4F7K1jkzHAY+ooAHJEpL1feYyJF3OWbaX8aOVx68qPVjJn2daI76Oy7AuOTSUSzj6CHCPpCOwIeV7srvsWEZkgIoUiUrh79+6ohDMmKDv3lTdofWP3oarsXnQXX65aFPY+giwkUsO6GmdZUtV8Vc1T1bw2bWo8Q9eYhNEhJ7NB6xu7DxEhd/QtZJ04OOx9BFlIioHOIc87ATsDymJMzJgyojeZ6anHrctMT2XKiN4R+fxVq1bRdfvLfCcthbTstqS36hj2PoIsJEuA68QxGNivqp8FmMeYmHDpwI7cPbYfHXMyEaBjTiZ3j+0XsaM2zzzzDP9etojpF3WN2D7ErzlbReQpYBiQC/wX+B2QDqCqC9zDv/NwjuwcAn6sqvVejZeXl6d20Z4xDaeqiAhVVVXs3buX3NzcBr1fRNaoal5Nr/l2HomqXl3P6wrc6Nf+jTHfW
L58OdOmTWPp0qXk5uY2uIjUx85sNSZJqCpVVVW+fLYVEmMS2LHTJS644AIKCgpo27atL/uxQmJMgnrttdfo1q0bb7zxBgDOsKQ/rJAYk6DOOOMMrr/+ek4//XTf92WFxJgEs2LFCioqKsjOzub+++8nOzvb931aITEmgWzcuJHzzz+fP/3pT1Hdb9xN/myMqd2pp57Kk08+yZgxY6K6X2uRGJMA8vPz+eCDDwC46qqryMyM3HU5XlghSXKLi0o4Z9brdL/tRc6Z9XpE57ww0bF3716mT5/OfffdF1gG69oksWOT2xyb++LY5DZAxGfjMv5p1aoVK1eupHPnzvVv7BNrkSSxaEygY/yhqsyYMYN58+YB0KNHD9LT0wPLYy2SJBaNCXSMP6qqqtiwYQO5ublfX4wXJCskSaxDTiYlNRSNSE6gYyJLVfnqq6/IzMzkmWeeIS0tLfAiAta1SWp+T6BjIu8Xv/gFI0eO5PDhw2RkZJCSEhu/wtYiSWLHBlT9vu2BiZwhQ4aQlZVFRkZG0FGO49vERn6xiY1MsqmsrGTr1q306dMn0Bx1TWwUG+0iY0ytpk2bxllnnUVJSeye42NdG2Ni3OTJk+nRowcdO8Zul9NaJMbEoMOHD/Pwww+jqnTo0IEJEyYEHalOVkiMiUGPPfYY48ePp6CgIOgonljXxpgYNH78ePr06cOQIUOCjuKJtUiMiREHDhzghhtuYNeuXYgI55xzTtCRPLNCYkyM+OCDD1i4cCGrV68OOkqDWdfGmIBVVlaSmprKoEGD+Pjjj2nZsmXQkRrMWiTGBOjzzz9nyJAh/OMf/wCIyyICVkiMCVSTJk3Izs6mefPmQUcJi3VtjAlAaWkpLVu2pFmzZrz66qsxcQVvOKxFYkyUHTp0iHPPPffrk8zivYiAtUiMibqsrCwmT57MoEGDgo4SMVZIjImSjz76iMOHD9OnTx9uuummoONElBUSY6JAVbn66qspLy9n3bp1MTMhUaT4WkhEZCRwH5AKPKSqs6q9ng08DnRxs9yjqn/zM5MxQRARHnvsMY4ePZpwRQR8HGwVkVRgPnAx0Ae4WkSqz8xyI7BJVQcAw4A/ikhsTf1kTBjWr1//9UzvJ598Mv369Qs4kT/8LI1nAh+q6jZVPQI8DVS/j6ACzcUZtm4G7AUqfMxkTFQ98MADzJ49m/379wcdxVd+FpKOwI6Q58XuulDzgFOAncAG4GZVrar+QSIyQUQKRaRw9+7dfuU1JuLmzp1LQUEB2dnZQUfxlZ+FpKaD49UniB0BrAU6AKcB80SkxbfepJqvqnmqmtemTZtI5zQmot555x1GjBjBl19+SXp6ekzPbBYpfhaSYiD0HoKdcFoeoX4MLFLHh8DHwMk+ZjLGd3v27KGkpISDBw8GHSVq/Cwkq4ETRaS7O4B6FbCk2jbbgQsAROQEoDewzcdMxvjmyy+/BGDMmDGsXbuWDh06BJwoenwrJKpaAdwELAM2A8+q6vsiMlFEJrqb3QmcLSIbgOXAraq6x69MxvjljTfeoHv37qxcuRKAtLTkOkXL1+9WVV8CXqq2bkHI8k7gIj8zGBMNffv2ZeTIkZx00klBRwlE4p0ZY0wUFRYWoqq0bduWJ554gtatWwcdKRBWSIxppHXr1jF48GDmzp0bdJTAJVdHzpgI6t+/P/PmzWPcuHFBRwmctUiMaaAnn3yS4uJiRISJEyfSrFmzoCMFzgqJMQ2wZ88eJk2axO9///ugo8QU69oY0wC5ubm8/fbbSXt0pjbWIjHGgz/84Q88/vjjAPTr148mTZoEnCi2WCExph4VFRW88sorLFu2LOgoMcu6NsbUQlWpqKggPT2dpUuXWiukDtYiMaYWt912G2PHjuXo0aNkZWWRmpoadKSYZS0SY2rRvXt3Dh06ZAXEAyskxoSoqqpix44ddO3alYkTJ9b/BgN46NqIY5yIzHCfdxGRM/2PZkz0TZs2jUGDBrFr166go8QVLy2S+4Eq4
HzgDuAAsBA4w8dcxgTihhtuIDc3lxNOOCHoKHHFy2DrWap6I/AVgKp+AdhM7yZhHD16lIULFwLQq1cvfv3rXyfEbTSjyUshOereWkIBRKQNTgvFmISQn5/P5ZdfzqpVq4KOEre8dG3mAv8E2orIXcDlwHRfUxkTRRMnTqRnz56ceaYN/TVWvS0SVX0CuAW4G/gMuFRVn/U7mDF+OnToEL/85S/Zt28fqampjBw5MuhIcc3LUZu/q+oWVZ2vqvNUdbOI/D0a4YzxS1FREQsWLODNN98MOkpC8NK16Rv6xB0vGeRPHGP8paqICOeccw7btm2jffv2QUdKCLW2SERkqogcAPqLyJcicsB9Xgo8H7WExkTI/v37Of/883n11VcBrIhEUK2FRFXvVtXmwBxVbaGqzd1Ha1WdGsWMxkREZWUlZWVllJWVBR0l4dTbtVHVqSLSEjgR+E7I+rf8DGZMpOzbt4/mzZvTqlUrCgoKSEmxa1Ujzctg63jgLZwbXd3ufp3pbyxjIqOsrIzvfve73HzzzQBWRHzi5ad6M87p8J+q6nBgILDb11TGREjTpk258sorGTt2bNBREpqXozZfqepXIoKINFHVLSLS2/dkxoRhx44dVFRU0L17d377298GHSfheSkkxSKSAywG/iUiXwA7/QxlTDhUlcsuu4yKigree+89685EgZfB1svcxZki8gaQDbziaypjwiAiPPjgg6iqFZEoqbOQiEgKsF5VTwVQVTsN0MSsLVu2sHr1aq699loGDbJzJqOpznKtqlXAOhHpEqU8xjTa3XffzS233MKBAweCjpJ0vIyRtAfeF5FVwNdn8qjq6PreKCIjgfuAVOAhVZ1VwzbDgD8D6cAeVR3qJbgx1S1YsICdO3fSvHnzoKMkHS+F5PbGfLB7Tc584EKgGFgtIktUdVPINjk4M7CNVNXtItK2MfsyyauwsJA5c+bw6KOPkpmZSc+ePYOOlJS8DLY2dlzkTOBDVd0GICJPA2OATSHb/BBYpKrb3X2VNnJfJklt2bKFwsJC9uzZQ6dOnYKOk7T8HNLuCOwIeV7srgt1EtBSRFaIyBoRua6mDxKRCSJSKCKFu3fbuXAGDh8+DMC4cePYuHGjFZGA+VlIapr0Uqs9T8OZkuAHwAhguoh86+7MqpqvqnmqmtemTZvIJzVx5e2336Znz56sXbsWgMzMzGADGW+FREQyG3E2azHQOeR5J759Ilsx8IqqlqnqHpxregY0cD8myXTt2pUBAwbQrl27oKMYl5eL9i4B1uKehCYip4nIEg+fvRo4UUS6i0gGcBVQ/X3PA98VkTQRyQLOAjY3IL9JIlu2bEFV6dKlCy+++KIVkhjipUUyE2fgdB+Aqq4FutX3JlWtAG7CuVp4M/Csqr4vIhNFZKK7zWacArUeWIVziHhjQ78Jk/iKioro378/+fn5QUcxNfBy+LdCVfc35j4fqvoS8FK1dQuqPZ8DzGnwh5ukMmDAAO644w6uvPLKoKOYGnhpkWwUkR8CqSJyooj8BXjX51zGAPD888+zZ88eUlJSuO2228jJyQk6kqmBl0Lyc5wJoA8DTwL7gV/4mMkYAEpLS7nmmmuYOXNm0FFMPbx0bXqr6jRgmt9hjAnVtm1bXnvtNfr37x90FFMPLy2Se0Vki4jcKSJ969/cmPDMnz+fpUuXAjB48GCysrICTmTq4+VOe8OBYTjTK+aLyAYRsSmnjC+OHj3Ko48+yuOPPx50FNMAolr9ZNM6Nhbph3P7zitVNcO3VHXIy8vTwsLCIHZtfFZVVUVKSgr79u2jadOmpKenBx3JhBCRNaqaV9NrXk5IO0VEZorIRmAezhEbu7DBRNTtt9/OddddR2VlJTk5OVZE4oyXwda/AU8BF6mqzdVqfJGenk5GRiCNXBMBDeraxALr2iQOVaW0tJQTTjjh6+eNOfHRREejujYi8qz7dYOIrA95bBCR9X6FNcljxowZDBo0iNJSZxoaKyLxq66uz
c3u11HRCGKSzxVXXEFKSgo2NUT8q+sm4p+5i5NU9dPQBzApOvFMoqmsrOS1114DoF+/ftx+++3WEkkAXk5Iu7CGdRdHOohJDvPnz+fCCy9kzZo1QUcxEVRr10ZE/gen5dGj2phIc+Adv4OZxPSzn/2Mdu3a2X1nEkxdLZIngUtwJiO6JOQxSFXHRSGbSRCHDx/md7/7HWVlZTRp0oQrrrgi6EgmwuoqJKqqnwA3AgdCHohIK/+jmUTx7rvvctddd/Hqq68GHcX4pK6jNk/iHLFZgzNpc+iImAI9fMxlEsjw4cPZsmULvXr1CjqK8UldR21GuV+7q2oP9+uxhxURU6eDBw8yZswYVq5cCWBFJMF5udbmHBFp6i6PE5F77V7Apj4HDhzggw8+4NNPPw06iokCL9faPAAMEJEBOFf+Pgz8HbB79JpvKSsrIysri/bt27Nu3Tq7fiZJeDmPpEKdC3LGAPep6n04h4CNOc7Bgwc577zzmDbNmUzPikjy8NIiOSAiU4Frce5BkwrYNd7mW5o2bcrQoUM599xzg45iosxLIbkS52bfP1HVXe74iN0+wnxt165dALRr145777034DQmCF6mWtwFPAFki8go4CtVfcz3ZCYuVFVVMWrUKMaMGUO8TUlhIqfeFomIXIHTAlmBcy7JX0Rkiqr+w+dsJg6kpKRwzz33kJaWZhffJTEvXZtpwBmqWgogIm2A1wArJEls27ZtbNy4kdGjRzNs2LCg45iAeTlqk3KsiLg+9/g+k8B+85vfMGHCBMrKyoKOYmKAlxbJKyKyDGfeVnAGX1+qY3uTBPLz8ykuLqZp06ZBRzExwMtg6xTgQaA/MADIV9Vb/Q5mYs/GjRuZOHEiFRUVtGjRgj59+gQdycQIr12Ud4E3gdeBlf7FMbHsrbfe4oUXXmDnTruZgDmel2ttxgOrgMuAy4ECEfmJ38FM7KisrARg0qRJvP/++3TpYpdameN5aZFMAQaq6o9U9XpgEOCpayMiI0Vkq4h8KCK31bHdGSJSKSKXe4ttoqWgoIC+ffuydetWAHJycoINZGKSl0JSjDuhkesAsKO+N7mn0s/Hmd+1D3C1iHyrU+1uNxtY5iWwia7s7Gxyc3PtRt6mTl6O2pQA/xaR53EmNBoDrBKRXwGoam3nRJ8JfKiq2wBE5Gn3vZuqbfdzYCFwRsPjG7/s2LGDzp07c8opp/D222/byWamTl5aJB8Bi3GKCMDzwGc4VwDXdRVwR45vuRS7674mIh1xxl4W1BVARCaISKGIFO7evdtDZBOOoqIievfuzWOPOVdCWBEx9am3RaKqtzfys2v631f9Yow/A7eqamVd/1lVNR/IB+eWnY3MYzw69dRTmTx5MhdfbHcdMd546do0VjHQOeR5J6D6ccM84Gm3iOQC3xeRClVd7GMuU4vly5dzxhln0KJFC2bNmhV0HBNH/DzVfTVwooh0F5EM4CqcW1t8zZ3/tZuqdsO5dmeSFZFg7Nq1i1GjRvHb3/426CgmDvnWIlHVChG5CedoTCrwV1V9X0Qmuq/XOS5ioqtdu3YsWbKEwYMHBx3FxCGpbw4JETkJZ97WE1T1VBHpD4xW1f+NRsDq8vLytLCwMIhdJ6RHHnmE7t27M3SoTcFr6iYia1Q1r6bXvHRt/i8wFTgKoKrrcbopJs4dOXKEe+65h7/85S9BRzFxzkvXJktVV1U7qlLhUx4TJapKRkYGr7/+Oi1atAg6jolzXloke0SkJ+6hW/c09s98TWV8dc899zB58mRUlbZt2/Kd73wn6EgmznlpkdyIcw7HySJSAnwM2E3E49iuXbsoLS2lsrKStDQ/zwAwyaLewdavN3Tutpeiqgfq3dhHNtjaOKrKl19+SXZ2NqpKVVUVqampQccycaSuwVYvkz/PqPYcAFW9IyLpTFTceeedPP7446xcuZLWrVtbETER5aVdGzop53eAUcBmf+IYv1x44YXs37+fli1bBh3FJCDPXZuv3yDSB
FiiqiP8iVQ369p4V1VVxerVqznrrLOCjmISQLjnkVSXBfQIL5KJhrlz53L22Wezdu3aoKOYBOdljGQD31y1mwq0AWx8JA789Kc/pVmzZgwYMCDoKCbBeRkjGRWyXAH8V1XthLQYVVFRwdy5c7npppto2rQp48ePDzqSSQJ1dm1EJAV4UVU/dR8lVkRi2/Lly/n1r3/Niy++GHQUk0TqbJGoapWIrBORLqq6PVqhTOONGDGCoqIiTjvttKCjmCTiZbC1PfC+iCwXkSXHHn4HM96Vl5czbtw4NmzYAGBFxESdlzGSxk61aKKktLSUt956i4suuoh+/foFHcckIS+F5PvVb9EpIrNx7rxnAnTkyBEyMjLo2rUrmzdvtvvwmsB46dpcWMM6mxU4YAcPHmT48OHMnj0bwIqICVStLRIR+R9gEtBDRNaHvNQceMfvYKZumZmZnHTSSfTq1SvoKMbU2bV5EngZuBsIvd3mAVXd62sqU6vPP/+clJQUWrZsyd/+9reg4xgD1FFIVHU/sB+4OnpxTF2qqqoYOXIkWVlZrFixwm5cZWJGws1qs7iohDnLtrJzXzkdcjKZMqI3lw7sWP8b40BKSgrTp08nKyvLioiJKQlVSBYXlTB10QbKj1YCULKvnKmLnHMr4rmYlJSU8J///Idhw4YxevTooOMY8y1+3iAr6uYs2/p1ETmm/Gglc5ZtDShRZEyePJmrrrqKQ4cOBR3FmBolVItk577yBq2PFwsWLGD79u1kZWUFHcWYGiVUi6RDTmaD1sey//znP0ydOpWqqiratGnDoEGDgo5kTK0SqpBMGdGbzPTj5yLNTE9lyojeASVqvH/+85889NBDFBcXBx3FmHo1eKrFoNU31WK8H7VRVUQEVeW///0v7dq1CzqSMUCYs8jHm0sHdoyrwhGqqKiICRMmsHDhQrp06WJFxMSNhOraxLuqqiqOHDnC0aNHg45iTIMkXIskHu3Zs4fc3FwGDRpEUVERKSlW30188fV/rIiMFJGtIvKhiNxWw+vXiMh69/GuiCTdLMVFRUX07NmT5557DsCKiIlLvv2vFZFUYD7OlAN9gKtFpE+1zT4Ghqpqf+BOnHsMJ5WTTz6ZH/7wh5x77rlBRzGm0fz883cm8KGqblPVI8DTwJjQDVT1XVX9wn1aAHTyMU9MWblyJV999RWZmZk88MADtG/fPuhIxjSan4WkI7Aj5Hmxu642N+BMW/AtIjJBRApFpHD37t0RjBiMnTt3cv755zNt2rSgoxgTEX4OttZ0eWqNJ62IyHCcQlJj+15V83G7PXl5efF14ksNOnTowBNPPMHw4cODjmJMRPjZIikGOoc87wTsrL6RiPQHHgLGqOrnPuYJ3LPPPsuaNWsAGDt2rN3Q2yQMPwvJauBEEekuIhnAVcBxt7EQkS7AIuBaVf3AxyyBO3z4MFOnTuWuu+4KOooxEedb10ZVK0TkJmAZzj2D/6qq74vIRPf1BcAMoDVwvztRT0Vtp+DGuyZNmvDGG2/QunXroKMYE3EJd61NrLn//vspLS1l5syZQUcxJixJda1NLFFV3nvvPUpLS6moqCAtzX7cJjHZ/2yflJeXk5mZyYMPPkhlZaUVEZPQ7HxsH8yaNYuzzz6b/fv3k5qaSkZGRtCRjPGV/Zn0wemnn862bdto1qxZ0FGMiQprkUSIqrJp0yYALrroIvLz80lNTa3nXcYkBiskEXLfffcxcOBANm7cGHQUY6LOujYRcv3111NVVUXfvn2DjmJM1FmLJAyVlZU8/PDDVFZW0rJlS371q1/ZHfBMUrJCEoaXX36Z8ePHs2TJkvo3NiaBWdcmDKNGjeLNN9/kvPPOCzqKMYGyFkkDHTlyhEmTJvHRRx8BWBExBiskDbZ9+3aee+45VqxYEXQUY2KGdW08qqqqIiUlhV69erF161ZatWoVdCRjYoa1SDw4dOgQI0aM4MEHHwSwImJMNVZIPEhLSyMrK4usrKygoxgTk6xrU
4f9+/eTlpZG06ZNWbx4sZ0jYkwtrEVSi8rKSkaOHMnll1/+9Y29jTE1sxZJLVJTU7nxxhvJzs62ImJMPayQVFNaWsr27dvJy8tj3LhxQccxJi5Y16aa8ePHM3r0aMrLy4OOYkzcsBZJNfPnz+fTTz8lMzMz6CjGxA1rkQCffPIJs2fPRlXp3Lmz3dDbmAayQgI88sgjzJo1i5KSkqCjGBOXrJAAM2bMoKioiE6dOgUdxZi4lLSFZNOmTXzve99j9+7dpKSk0K1bt6AjGRO3kraQ7N69m08++YS9e/cGHcWYuJd0R20OHjxIs2bNGDp0KJs3byY9PT3oSMbEvaRqkaxbt44ePXrw0ksvAVgRMSZCkqqQdO/enQsuuMBmejcmwpKikKxbt46KigpatGjBU089RdeuXYOOZExCSfhCsmPHDoYMGcL06dODjmJMwvJ1sFVERgL3AanAQ6o6q9rr4r7+feAQ8CNVfS+cfS4uKmHOsq3s3FdOh5xMpozozbx587jkkkvC+VhjTB18a5GISCowH7gY6ANcLSJ9qm12MXCi+5gAPBDOPhcXlTB10QZK9pVT9uEqPvnoA6Yu2kCrgSNo06ZNOB9tjKmDn12bM4EPVXWbqh4BngbGVNtmDPCYOgqAHBFp39gdzlm2lfKjlVQdPczeZfP54s1HKT9ayZxlWxv/XRhj6uVn16YjsCPkeTFwlodtOgKfhW4kIhNwWix06dKl1h3u3Odc+p+S3oQTrrqL1GatjltvjPGHny2SmqYV00Zsg6rmq2qequbV1UXpkPPNpf/prTuR0iTrW+uNMZHnZyEpBjqHPO8E7GzENp5NGdGbzPTU49ZlpqcyZUTvxn6kMcYDPwvJauBEEekuIhnAVUD1u20vAa4Tx2Bgv6p+Vv2DvLp0YEfuHtuPjjmZCNAxJ5O7x/bj0oEdG/1NGGPq59sYiapWiMhNwDKcw79/VdX3RWSi+/oC4CWcQ78f4hz+/XG4+710YEcrHMZEma/nkajqSzjFInTdgpBlBW70M4Mxxn8Jf2arMcZ/VkiMMWGzQmKMCZsVEmNM2MQZ74wfIrIb+NTDprnAHp/jhMsyhi/W80HsZ/Sar6uq1nhGaNwVEq9EpFBV84LOURfLGL5YzwexnzES+axrY4wJmxUSY0zYErmQ5AcdwAPLGL5YzwexnzHsfAk7RmKMiZ5EbpEYY6LECokxJmxxX0hEZKSIbBWRD0XkthpeFxGZ676+XkROj8GM17jZ1ovIuyIyIJbyhWx3hohUisjl0czn7rvejCIyTETWisj7IvJmLOUTkWwReUFE1rn5wr7SvYH5/ioipSKysZbXw/s9UdW4feBMT/AR0APIANYBfapt833gZZzZ2AYD/47BjGcDLd3li6OZ0Uu+kO1ex7ma+/IY/BnmAJuALu7ztjGW7zfAbHe5DbAXyIhixvOA04GNtbwe1u9JvLdIoj7BtB8ZVfVdVf3CfVqAM1NczORz/RxYCJRGMdsxXjL+EFikqtsBVDWaOb3kU6C5ewuWZjiFpCJaAVX1LXeftQnr9yTeC0ltk0c3dBs/NXT/N+D8ZYiWevOJSEfgMmABwfDyMzwJaCkiK0RkjYhcF7V03vLNA07BmUp0A3CzqlZFJ54nYf2e+DqxURREbIJpH3nev4gMxykk5/qaqNpua1hXPd+fgVtVtdL5gxp1XjKmAYOAC4BMYKWIFKjqB36Hw1u+EcBa4HygJ/AvEXlbVb/0OZtXYf2exHshifoE043gaf8i0h94CLhYVT+PUjbwli8PeNotIrnA90WkQlUXRyWh93/nPapaBpSJyFvAACAahcRLvh8Ds9QZkPhQRD4GTgZWRSGfF+H9nkRrsMenAaQ0YBvQnW8GufpW2+YHHD+ItCoGM3bBmbf27Fj8GVbb/hGiP9jq5Wd4CrDc3TYL2AicGkP5HgBmussnACVAbpR/jt2ofbA1rN+TuG6RaEATTPuQcQbQGrjf/atfoVG6WtRjvkB5y
aiqm0XkFWA9UIVzr+kaD3UGkQ+4E3hERDbg/LLeqqpRm1pARJ4ChgG5IlIM/A5ID8kX1u+JnSJvjAlbvB+1McbEACskxpiwWSExxoTNCokxJmxWSIwxYbNCEudEZLKIbBaRJ+rYZpiILI1mrtqIyOhjV8eKyKUi0ifktTtE5HtRzDJMRM6O1v4SWVyfR2IAmIRzNuzHQQfxQlWXAEvcp5cCS3Gu2kVVZ0R6fyKSpqq1XRw3DDgIvBvp/SYba5HEMRFZgHPp+hIR+aWInOnOZ1Lkfu1dw3uGunN2rHW3a+6unyIiq925KG6vZX8HReSPIvKeiCwXkTbu+tNEpMB97z9FpKW7frKIbHLXP+2u+5GIzHNbAqOBOW6WniLyiIhcLiIXi8izIfsdJiIvuMsXichKN8NzItKshpwrROT37pwkN4vIJSLyb/f7fU1EThCRbsBE4Jfu/r8rIm1EZKH7c1gtIueE8++TVKJ5iq49fDnt+RPcU62BFkCau/w9YKG7PAxY6i6/AJzjLjfDaZVehDMBsOD8cVkKnFfDvhS4xl2eAcxzl9cDQ93lO4A/u8s7gSbuco779Uch73uEkNPtjz13M20HmrrrHwDG4Vzn81bI+luBGTXkXAHcH/K8Jd+cfDke+KO7PBP4PyHbPQmc6y53ATYH/e8bLw/r2iSWbOBRETkR55c+vYZt3gHudcdUFqlqsYhchFNMitxtmgEn4vzShqoCnnGXHwcWiUg2TpE4NiPZo8Bz7vJ64AkRWQws9vpNqHPK+SvAJSLyD5zrQG4BhgJ9gHfcSwkygJW1fMwzIcudgGfc+TUygNq6gd8D+oRc4dxCRJqr6gGv2ZOVFZLEcifwhqpe5jbdV1TfQFVniciLONdVFLiDmwLcraoPNnB/9V1f8QOcmblGA9NFpG8DPvsZ4EacyXhWq+oBd1Kgf6nq1R7eXxay/BfgXlVdIiLDcFoiNUkBhqhqeQNyGmyMJNFk41xVCk4X4ltEpKeqblDV2UAhzqXsy4CfHBtvEJGOItK2hren4HQ9wJmR7P+p6n7gCxH5rrv+WuBNEUkBOqvqGzitiRyclk6oA0DzWr6XFThTA/6Ub1oXBcA5ItLLzZklIifV8v5QoT+X6+vY/6vATceeiMhpHj7bYIUk0fwBuFtE3sG5CrUmvxCRjSKyDigHXlbVV3HGB1a6V6f+g5p/wcuAviKyBmeCnjvc9dfjDJquB05z16cCj7ufVwT8SVX3Vfu8p4Ep7iBoz9AXVLUSZ6zmYvcrqrobp0A+5e6rAKcQ1mcm8JyIvM3xN8t+Abjs2GArMBnIcweHN+EMxhoP7Opf45mIHFTVbx0lMcZaJMaYsFmLxBgTNmuRGGPCZoXEGBM2KyTGmLBZITHGhM0KiTEmbP8fEYuHxSgxtMgAAAAASUVORK5CYII=\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "# ROC for probabilistic forecasts and bin_edges='continuous' default\n", - "roc = xs.roc(obs3 > 0.5, (fct3 > 0.5).mean(\"member\"), return_results='all_as_metric_dim')\n", - "\n", - "plt.figure(figsize=(4, 4))\n", - "plt.plot([0, 1], [0, 1], 'k:')\n", - "roc.to_dataset(dim='metric').plot.scatter(y='true positive rate', x='false positive rate')\n", - "roc.sel(metric='area under curve').values[0]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Contingency-Based\n", - "\n", - "To work with contingency-based scoring, first instantiate a `Contingency` object by passing in your observations, forecast, and observation/forecast bin edges. See https://www.cawcr.gov.au/projects/verification/#Contingency_table for more information." - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [], - "source": [ - "dichotomous_category_edges = np.array([0, 0.5, 1]) # \"dichotomous\" mean two-category\n", - "dichotomous_contingency = xs.Contingency(\n", - " obs, fct, dichotomous_category_edges, dichotomous_category_edges, dim=[\"lat\", \"lon\"]\n", - ")\n", - "dichotomous_contingency_table = dichotomous_contingency.table\n", - "print(dichotomous_contingency_table)" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
histogram_observations_forecasts
observations_category12
observations_category_bounds[0.0, 0.5)[0.5, 1.0]
forecasts_categoryforecasts_category_bounds
1[0.0, 0.5)5.334.67
2[0.5, 1.0]5.334.67
\n", - "
" - ], - "text/plain": [ - " histogram_observations_forecasts \\\n", - "observations_category 1 \n", - "observations_category_bounds [0.0, 0.5) \n", - "forecasts_category forecasts_category_bounds \n", - "1 [0.0, 0.5) 5.33 \n", - "2 [0.5, 1.0] 5.33 \n", - "\n", - " \n", - "observations_category 2 \n", - "observations_category_bounds [0.5, 1.0] \n", - "forecasts_category forecasts_category_bounds \n", - "1 [0.0, 0.5) 4.67 \n", - "2 [0.5, 1.0] 4.67 " - ] - }, - "execution_count": 26, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "print(\n", - " dichotomous_contingency_table.to_dataframe()\n", - " .pivot_table(\n", - " index=[\"forecasts_category\", \"forecasts_category_bounds\"],\n", - " columns=[\"observations_category\", \"observations_category_bounds\"],\n", - " )\n", - " .round(2)\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Scores based on the constructed contingency table can be called via class methods. The available methods are:\n", - "\n", - "* Accuracy (`accuracy`)\n", - "* Bias Score (`bias_score`)\n", - "* Equitable Threat Score (`equit_threat_score`)\n", - "* False Alarms / False Positives (`false_alarms`)\n", - "* False Alarm Ratio / False Discovery Rate (`false_alarm_ratio`)\n", - "* False Alarm Rate / False Positive Rate / Fall-out (`false_alarm_rate`)\n", - "* Gerrity Score (`gerrity_score`)\n", - "* Heidke Score / Cohen's Kappa (`heidke_score`)\n", - "* Hit Rate / Recall / Sensitivity / True Positive Rate (`hit_rate`)\n", - "* Hits / True Positives (`hits`)\n", - "* Misses / False Negatives (`misses`)\n", - "* Odds Ratio (`odds_ratio`)\n", - "* Odds Ratio Skill Score (`odds_ratio_skill_score`)\n", - "* Peirce Score (`peirce_score`)\n", - "* Receiver Operating Characteristic (`roc`)\n", - "* Success Ratio / Precision / Positive Predictive Value (`success_ratio`)\n", - "* Threat Score / Critical Success Index (`threat_score`)\n", - "\n", - "Below, we share a few examples of these 
in action:" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([1. , 1.11111111, 1.1 ])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" - ] - } - ], - "source": [ - "print(dichotomous_contingency.bias_score())" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([0.33333333, 0.55555556, 0.6 ])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" - ] - } - ], - "source": [ - "print(dichotomous_contingency.hit_rate())" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([0.54545455, 0.45454545, 0.5 ])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" - ] - } - ], - "source": [ - "print(dichotomous_contingency.false_alarm_rate())" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([-0.41176471, 0.2 , 0.2 ])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" - ] - } - ], - "source": [ - "print(dichotomous_contingency.odds_ratio_skill_score())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we can leverage multi-category edges to make use of some scores." 
- ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [], - "source": [ - "multi_category_edges = np.array([0, 0.25, 0.75, 1])\n", - "multicategory_contingency = xs.Contingency(\n", - " obs, fct, multi_category_edges, multi_category_edges, dim=[\"lat\", \"lon\"]\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([0.25, 0.25, 0.5 ])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" - ] - } - ], - "source": [ - "print(multicategory_contingency.accuracy())" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([-0.14503817, -0.25 , 0.2481203 ])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" - ] - } - ], - "source": [ - "print(multicategory_contingency.heidke_score())" - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([-0.1496063 , -0.24193548, 0.25 ])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" - ] - } - ], - "source": [ - "print(multicategory_contingency.peirce_score())" - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([-0.15212912, -0.11160714, 0.25 ])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 
2000-01-03 00:00:00\n" - ] - } - ], - "source": [ - "print(multicategory_contingency.gerrity_score())" - ] - }, - { - "cell_type": "code", - "execution_count": 36, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0.5035528250988777" - ] - }, - "execution_count": 36, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAARIAAAEGCAYAAACpcBquAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMywgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Il7ecAAAACXBIWXMAAAsTAAALEwEAmpwYAAAlKUlEQVR4nO3deXwV5dnw8d+VjSYsCRiQfRVRFBCJCqKCWgVbBOXlU9Gito+U8qjFtr6olMKD+lig2EUKinm1VetupYi4YEVRq6QQDZssFlEhQRoQwxohy/X+MQMewkkyOSdzJifn+n4++WRmzpy5L0LOlXvuuRdRVYwxJhpJQQdgjIl/lkiMMVGzRGKMiZolEmNM1CyRGGOilhJ0AHWVnZ2tXbt2DToMYxLOhx9+uFtVW4d7Le4SSdeuXcnPzw86DGMSjoh8Ud1rdmtjjImaJRJjTNQskRhjomaJxBgTNUskxpio+ZZIROTPIlIsIuureV1EZK6IbBGRtSJytl+xGGNOtKigiMGz3qLbXa8weNZbLCooivhaftZIHgOG1/D6FUBP92sC8JCPsRhjQiwqKGLKwnUUlZSiQFFJKVMWros4mfiWSFT1XWBPDaeMAp5QRx6QJSLt/IrHGPOtOUs3U1pWQcXBrzk6lUhpWQVzlm6O6HpBtpF0ALaH7Be6x04gIhNEJF9E8nft2hWT4IxpzHaUlKKq7Fp4H/tWLjzueCSCTCQS5ljYWZZUNVdVc1Q1p3XrsD10jTF10D4rHREhe+QdZPQceNzxSASZSAqBTiH7HYEdAcViTMJYuXIlXba9xndSkkjJbENqK+dGID01mcnDekV0zSATyWLgBvfpzUBgr6p+GWA8xiSE5557jn8tXci0y7vQISsdATpkpTNzdB+u6h+2daFW4tecrSLyDDAUyAb+A/wPkAqgqgtERIB5OE92DgE/VtVaR+Pl5OSoDdozpu5UFRGhsrKSPXv2kJ2dXaf3i8iHqpoT7jXfRv+q6rW1vK7ALX6Vb4z51rJly5g6dSpLliwhOzu7zkmkNtaz1ZgEoapUVlb6cm1LJMY0Yke7S1x66aXk5eXRpk0bX8qxRGJMI/Xmm2/StWtX3n77bQCcZkl/WCIxppE655xzuPHGGzn7bP+HsVkiMaaRWb58OeXl5WRmZvLggw+SmZnpe5mWSIxpRNavX88ll1zCH/7wh5iWG3eTPxtjqnfmmWfy9NNPM2rUqJiWazUSYxqB3NxcPvnkEwDGjh1LenpkY2YiZYnEmDi3Z88epk2bxgMPPBBYDHZrY0yca9WqFStWrKBTp061n+wTq5EYE4dUlenTpzNv3jwAunfvTmpqamDxWI3EmDhUWVnJunXryM7OPjYYL0iWSIyJI6rKN998Q3p6Os899xwpKSmBJxGwWxtj4srPf/5zhg8fzuHDh0lLSyMpqWF8hK1GYkwcGTRoEBkZGaSlpQUdynF8m9jILzaxkUk0FRUVbN68md69ewcaR00TGzWMepExplpTp07lvPPOo6go8gWs/Ga3NsY0cJMmTaJ79+506BDZfKqxYDUSYxqgw4cP8+ijj6KqtG/fngkTJtR7GfGyZKcxJkJPPPEE48ePJy8vz5frx82SncaYyI0
fP55//vOfDBo0yJfrH12yM1S8LtlpjAmxf/9+brrpJnbu3ImIMHjwYN/Kqm5pznhcstMYE+KTTz7hxRdfZNWqVb6XVd3SnPG4ZKcxBqefCMCAAQP47LPPuPLKK30vc/KwXqSnJh93LF6X7DQm4X311VcMGjSIv/3tbwC0bNkyJuVe1b8DM0f3qbclO60fiTEBatKkCZmZmTRv3jzmZV/Vv0PEiaMqSyTGBKC4uJiWLVvSrFkz3njjjQYxgjcadmtjTIwdOnSICy644Fgns3hPImA1EmNiLiMjg0mTJjFgwICgQ6k3lkiMiZFPP/2Uw4cP07t3b2699dagw6lXlkiMiQFV5dprr6W0tJQ1a9Y0mAmJ6ouviUREhgMPAMnAI6o6q8rrmcCTQGc3lvtV9S9+xmRMEESEJ554grKyskaXRMDHxlYRSQbmA1cAvYFrRaTqzCy3ABtUtR8wFPidiDSsqZ+MicLatWuPzfR+2mmn0adPn4Aj8oefqfFcYIuqblXVI8CzQNV1BBVoLk6zdTNgD1DuY0zGxNRDDz3E7Nmz2bt3b9Ch+MrPRNIB2B6yX+geCzUPOB3YAawDblPVyqoXEpEJIpIvIvm7du3yK15j6t3cuXPJy8sjMzMz6FB85WciCfdwvOoEscOA1UB74Cxgnoi0OOFNqrmqmqOqOa1bt67vOI2pV++//z7Dhg1j3759pKamNuiZzeqLn4mkEAhdQ7AjTs0j1I+BherYAnwGnOZjTMb4bvfu3RQVFXHgwIGgQ4kZPxPJKqCniHRzG1DHAournLMNuBRARE4GegFbfYzJGN/s27cPgFGjRrF69Wrat28fcESx41siUdVy4FZgKbAReF5VPxaRiSIy0T3tXuB8EVkHLAPuVNXdfsVkjF/uzX2ek9p2pN319zN41lssWfefoEOKKV/7kajqq8CrVY4tCNneAVzuZwzG+G1RQRFPbKokrdvZpLTqcGz+U6DeRtc2dI2vZ4wxMZSfn89vX99EWVpzWl85meR051lBNPOfxiNLJMZEaM2aNQwcOJDNy54L+3qk85/GI0skxkSob9++zJs3j54XVu1n6Yh0/tN4ZInEmDp6+umnKSwsRESYOHEid408q17nP41HlkiMqYPdu3dz880385vf/ObYsfqe/zQe2TQCxtRBdnY27733Hqeeeupxx+tz/tN4ZDUSYzz47W9/y5NPPglAnz59aNKkScARNSyWSIypRXl5Oa+//jpLly4NOpQGy25tjKmGqlJeXk5qaipLliyxWkgNrEZiTDXuuusuRo8eTVlZGRkZGSQnJ9f+pgRlNRJjqtGtWzcOHTpkCcQDSyTGhKisrGT79u106dKFiRMn1v4GA3i4tRHHOBGZ7u53FpFz/Q/NmNibOnUqAwYMYOfOnUGHEle81EgeBCqBS4B7gP3Ai8A5PsZlTCBuuukmsrOzOfnkk4MOJa54aWw9T1VvAb4BUNWvAZvp3TQaZWVlvPjiiwCccsop3H777Y1iGc1Y8pJIytylJRRARFrj1FCMaRRyc3MZM2YMK1euDDqUuOXl1mYu8HegjYjcB4wBpvkalTExNHHiRHr06MG551rTX6RqrZGo6lPAHcBM4EvgKlV93u/AjPHToUOH+MUvfkFJSQnJyckMHz486JDimpenNn9V1U2qOl9V56nqRhH5ayyCM8YvBQUFLFiwgHfeeSfoUBoFL7c2Z4TuuO0lA/wJxxh/qSoiwuDBg9m6dSvt2rWr8fxFBUXMWbqZHSWltM9KZ/KwXgk9yrc61dZIRGSKiOwH+orIPhHZ7+4XAy/FLEJj6snevXu55JJLeOONNwA8JZEpC9dRVFKKwrFJnRcVFMUg2vhSbSJR1Zmq2hyYo6otVLW5+3WSqk6JYYzG1IuKigoOHjzIwYMHPZ0/Z+lmSssqjjuWaJM6e1XrrY2qThGRlkBP4Dshx9/1MzBj6ktJSQnNmzenVatW5OXlkZTkbaxqdZM3J9Kkzl55aWwdD7y
Ls9DV3e73Gf6GZUz9OHjwIBdeeCG33XYbgOckAtVP3pxIkzp75eWnehtOd/gvVPVioD+wy9eojKknTZs25ZprrmH06NF1fu/kYb0SflJnr7w8tflGVb8REUSkiapuEhH7SZoGbfv27ZSXl9OtWzd+/etfR3SNo09n7KlN7bwkkkIRyQIWAf8Qka+BHX4GZUw0VJWrr76a8vJyPvroozrdzlSV6JM6e+WlsfVqd3OGiLwNZAKv+xqVMVEQER5++GFUNaokYryrMZGISBKwVlXPBFBV6wZoGqxNmzaxatUqrr/+egYMsD6TsVRjulbVSmCNiHSOUTzGRGzmzJnccccd7N+/P+hQEo6XNpJ2wMcishI41pNHVUfW9kYRGQ48ACQDj6jqrDDnDAX+CKQCu1V1iJfAjalqwYIF7Nixg+bNmwcdSsLxkkjujuTC7pic+cBlQCGwSkQWq+qGkHOycGZgG66q20SkTSRlmcSVn5/PnDlzePzxx0lPT6dHjx5Bh5SQvDS2Rtouci6wRVW3AojIs8AoYEPIOdcBC1V1m1tWcYRlmQS1adMm8vPz2b17Nx07dgw6nITlZ5N2B2B7yH6heyzUqUBLEVkuIh+KyA3hLiQiE0QkX0Tyd+2yvnAGDh8+DMC4ceNYv369JZGA+ZlIwk16qVX2U3CmJPg+MAyYJiKnnvAm1VxVzVHVnNatW9d/pCauvPfee/To0YPVq1cDkJ5uXdaD5imRiEh6BL1ZC4FOIfsdObEjWyHwuqoeVNXdOGN6+tWxHJNgunTpQr9+/Wjbtm3QoRiXl0F7VwKrcTuhichZIrLYw7VXAT1FpJuIpAFjgarvewm4UERSRCQDOA/YWIf4TQLZtGkTqkrnzp155ZVXLJE0IF5qJDNwGk5LAFR1NdC1tjepajlwK85o4Y3A86r6sYhMFJGJ7jkbcRLUWmAlziPi9XX9R5jGr6CggL59+5Kbmxt0KCYML49/y1V1byTrfKjqq8CrVY4tqLI/B5hT54ubhNKvXz/uuecerrnmmqBDMWF4qZGsF5HrgGQR6SkifwI+8DkuYwB46aWX2L17N0lJSdx1111kZWUFHZIJw0si+RnOBNCHgaeBvcDPfYzJGACKi4v54Q9/yIwZM4IOxdTCy61NL1WdCkz1OxhjQrVp04Y333yTvn37Bh2KqYWXGsnvRWSTiNwrImfUfrox0Zk/fz5LliwBYODAgWRkZAQckamNl5X2LgaG4kyvmCsi60QksimnjKlFWVkZjz/+OE8++WTQoZg6ENWqnU1rOFmkD87yndeoappvUdUgJydH8/Pzgyja+KyyspKkpCRKSkpo2rQpqampQYdkQojIh6qaE+41Lx3STheRGSKyHpiH88TGBjaYenX33Xdzww03UFFRQVZWliWROOOlsfUvwDPA5apqc7UaX6SmppKWFkgl19SDOt3aNAR2a9N4qCrFxcWcfPLJx/Yj6fhoYiOiWxsRed79vk5E1oZ8rRORtX4FaxLH9OnTGTBgAMXFzjQ0lkTiV023Nre530fEIhCTeH7wgx+QlJSETQ0R/6pNJKr6pbt5s6reGfqaiMwG7jzxXcaEt6igiDlLN1O05wBN92zmvluu46r+fejTp4/vZdriVv7z0iHtsjDHrqjvQEzjtaigiCkL11FUUsq+j15hw6N38Iv5C1lUUBSTMhUoKillysJ1vpaZyGpqI/lvEVkH9KrSRvIZzrB/YzyZs3QzpWUVADQ/6wqyR96JZndnztLNMSnzqNKyCl/LTGQ1tZE8DbwGzATuCjm+X1X3+BqVaVSKdu+jZMXztDjv/5CU9h2ann4hADtKSn0rs7pr+1lmIqvp1kZV9XPgFmB/yBci0sr/0Exj0Wzvp+xd8RzffP7RccfbZ/k312p11/azzERWW41kBPAhzqTNoc/mFOjuY1ymEfnfm8dye1pLKpqffOxYemoyk4fVdRpg7yYP68WUheuOu73xu8xEVm2NRFVHuN+7qWp39/v
RL0sipkYHDhxg1KhRrFixgqv6d+B344fRISsdATpkpTNzdB9fn6Bc1b8DM0f3iWmZiazWLvIiMhhYraoHRWQccDbwx6OLWhkTzv79+/nkk0/44osvGDRoEFf17xDzD3EQZSYqL2NtHgL6iUg/nJG/jwJ/BWyNXnOCgwcPkpGRQbt27VizZo2Nn0kQXvqRlKszIGcU8ICqPgDYKs3mBAcOHOCiiy5i6lRnMj1LIonDS41kv4hMAa7HWYMmGbAx3uYETZs2ZciQIVxwwQVBh2JizEsiuQZnse//UtWdItIZWz7ChNi5cycAbdu25fe//33A0Zgg1JpI3OTxFHCOiIwAVqrqE/6HZiIVyzEmlZWVjBgxguTkZPLy8mwEb4Ly8tTmBzg1kOU4fUn+JCKTVfVvPsdmInB0jMnR/hNHx5gAviSTpKQk7r//flJSUiyJJDAvtzZTgXNUtRhARFoDbwKWSBqgmsaY1Gci2bp1K+vXr2fkyJEMHTq03q5r4pOXpzZJR5OI6yuP7zMBiNUYk1/96ldMmDCBgwcP1ut1TXzyUiN5XUSW4szbCk7j66s1nG8C1D4rnaIwSaO+x5jk5uZSWFhI06ZN6/W6Jj55WddmMvAw0BfoB+RWnejINByTh/UiPTX5uGP1NcZk/fr1TJw4kfLyclq0aEHv3r2jvqZpHLzeonwAvAO8BazwLxwTLT/HmLz77ru8/PLL7NhhiwmY49U6i7yIjAem4yQRwekaf4+q/tn/8E5ks8jHXkVFBcnJTi2npKSErKysYAMygYhqgSxgMtBfVX+kqjcCA/A4X6uIDBeRzSKyRUTuquG8c0SkQkTGeLmuiZ28vDzOOOMMNm92ZhazJGLC8ZJICnEnNHLtB7bX9ia3K/18nPldewPXisgJN9XuebOBpV4CNrGVmZlJdna2LeRtauTlqU0R8C8ReQlnQqNRwEoR+SWAqlbXJ/pcYIuqbgUQkWfd926oct7PgBeBc+oevvHL9u3b6dSpE6effjrvvfeedTYzNfJSI/kUWISTRABeAr7EGQFc0yjgDhxfcyl0jx0jIh2Aq4EFNQUgIhNEJF9E8nft2uUhZBONgoICevXqxRNPOCMhLImY2ngZa3N3hNcO99tXtWX3j8CdqlpR0y+rquYCueA0tkYYj/HozDPPZNKkSVxxRfhVR2y9GFOVl1ubSBUCnUL2OwJVnxvmAM+6SSQb+J6IlKvqIh/jMtVYtmwZ55xzDi1atGDWrFlhz4n1WB4TH/zs6r4K6Cki3UQkDRgLLA49wZ3/tauqdsUZu3OzJZFg7Ny5kxEjRvDrX/+6xvNsvRgTjm81ElUtF5FbcZ7GJAN/VtWPRWSi+3qN7SImttq2bcvixYsZOHBgjefZejEmHC/TCJyKM2/ryap6poj0BUaq6v/W9l5VfZUq43KqSyCq+iNPEZt69dhjj9GtWzeGDBnCZZeFW531eLEay2Pii5dbm/8HTAHKAFR1Lc5tiolzR44c4f777+dPf/qT5/f4OZbHxC8vtzYZqrqyylOVcp/iMTGiqqSlpfHWW2/RokULz+872qBqT21MKC+JZLeI9MB9dOt2Y//S16iMr+6//36++OIL5s6dS5s2ber8flsvxlTlJZHcgtOH4zQRKQI+A8b5GpXx1c6dOykuLqaiooKUFD97AJhEUevo32MnijTFmS1tf60n+yheRv82tE5bqsq+ffvIzMxEVamsrDw2otcYL2oa/evlqc30KvsAqOo99RJdI9QQO23de++9PPnkk6xYsYKTTjrJkoipV17qtaGTcn4HGAFs9CecxiFWEzDXxWWXXcbevXtp2bJlIOWbxs3LWJvfhe6LyP1U6aFqjtdQOm1VVlayatUqzjvvPAYNGsSgQYNiWr5JHJF0kc8Autd3II1JdZ2zYt1pa+7cuZx//vmsXr06puWaxOOljWQd347aTQZaA9Y+UoPJw3od10YCwXTa+slPfkKzZs3o169fTMs1icdLG8mIkO1
y4D+qah3SahBkp63y8nLmzp3LrbfeStOmTRk/frzvZRpTYyIRkSTgFVU9M0bxNBpBddpatmwZt99+O926dePqq6+OefkmMdWYSFS1UkTWiEhnVd0Wq6BM5IYNG0ZBQQFnnXVW0KGYBOKlsbUd8LGILBORxUe//A7MeFdaWsq4ceNYt87pq2JJxMSalzaSSKdaNDFSXFzMu+++y+WXX06fPn2CDsckIC+J5HtVl+gUkdk4K++ZAB05coS0tDS6dOnCxo0bbR1eExgvtzbhZrsJPyuwiZkDBw5w8cUXM3v2bABLIiZQ1dZIROS/gZuB7iKyNuSl5sD7fgdmapaens6pp57KKaecEnQoxtR4a/M08BowEwhdbnO/qu7xNSpTra+++oqkpCRatmzJX/7yl6DDMQaoIZGo6l5gL3Bt7MIxNamsrGT48OFkZGSwfPlyW7jKNBg2q00cSUpKYtq0aWRkZFgSMQ2KJZI4UFRUxL///W+GDh3KyJEjgw7HmBP4uUCWqSeTJk1i7NixHDp0KOhQjAnLaiRxYMGCBWzbto2MjIygQzEmLKuRNFD//ve/mTJlCpWVlbRu3ZoBAwYEHZIx1bJE0kD9/e9/55FHHqGwsDDoUIypledZ5BuKeJlFPlKqioigqvznP/+hbdu2QYdkDFDzLPJWI2lACgoKOPfcc9m2bRsiYknExA1LJA1IZWUlR44coaysLOhQjKkTe2rTAOzevZvs7GwGDBhAQUEBSUmW30188fU3VkSGi8hmEdkiIneFef2HIrLW/fpARBJuluKCggJ69OjBCy+8AGBJxMQl335rRSQZmI8z5UBv4FoR6V3ltM+AIaraF7gXZ43hhHLaaadx3XXXccEFFwQdijER8/PP37nAFlXdqqpHgGeBUaEnqOoHqvq1u5sHdPQxngZlxYoVfPPNN6Snp/PQQw/Rrl27oEMyJmJ+JpIOwPaQ/UL3WHVuwpm24AQiMkFE8kUkf9euXfUYYjB27NjBJZdcwtSpU4MOxZh64Wdja7jhqWE7rYjIxTiJJGz9XlVzcW97cnJy4qvjSxjt27fnqaee4uKLLw46FGPqhZ81kkKgU8h+R2BH1ZNEpC/wCDBKVb/yMZ7APf/883z44YcAjB492hb0No2Gn4lkFdBTRLqJSBowliqLj4tIZ2AhcL2qfuJjLIE7fPgwU6ZM4b777gs6FGPqnW+3NqpaLiK3Aktx1gz+s6p+LCIT3dcXANOBk4AH3Yl6yqvrghvvmjRpwttvv81JJ50UdCjG1Dsba+OzBx98kOLiYmbMmBF0KMZEpaaxNtaz1UeqykcffURxcTHl5eWkpNiP2zRO9pvtk9LSUtLT03n44YepqKiwJGIaNeuP7YNZs2Zx/vnns3fvXpKTk0lLSws6JGN8ZX8mfXD22WezdetWmjVrFnQoxsSE1UjqiaqyYcMGAC6//HJyc3NJTk4OOCpjYsMSST154IEH6N+/P+vXrw86FGNiLuFubRYVFDFn6WZ2lJTSPiudycN6cVX/moYAeXPjjTdSWVnJGWecUQ9RGhNfEqpGsqigiCkL11FUUooCRSWlTFm4jkUFRRFdr6KigkcffZSKigpatmzJL3/5S1sBzySkhEokc5ZuprSs4rhjpWUVzFm6OaLrvfbaa4wfP57FixfXfrIxjVhC3drsKCmt0/HajBgxgnfeeYeLLroomrCMiXsJVSNpn5Vep+PhHDlyhJtvvplPP/0UwJKIMSRYIpk8rBfpqcc/kk1PTWbysF6er7Ft2zZeeOEFli9fXs/RGRO/EurW5ujTmUie2lRWVpKUlMQpp5zC5s2badWqld/hGhM3EiqRgJNM6vq499ChQ4waNYoxY8bw05/+1JKIMVUk1K1NpFJSUsjIyCAjIyPoUIxpkBKuRlIXe/fuJSUlhaZNm7Jo0SLrI2JMNaxGUo2KigqGDx/OmDFjji3sbYwJz2ok1UhOTuaWW24hMzPTkogxtbBEUkVxcTHbtm0jJye
HcePGBR2OMXHBbm2qGD9+PCNHjqS0NLLersYkIquRVDF//ny++OIL0tO993Y1JtFZjQT4/PPPmT17NqpKp06dbEFvY+rIEgnw2GOPMWvWLIqKIptOwJhEZ4kEmD59OgUFBXTs2DHoUIyJSwmbSDZs2MB3v/tddu3aRVJSEl27dg06JGPiVsImkl27dvH555+zZ8+eoEMxJu4l3FObAwcO0KxZM4YMGcLGjRtJTU0NOiRj4l5C1UjWrFlD9+7defXVVwEsiRhTTxIqkXTr1o1LL73UZno3pp4lRCJZs2YN5eXltGjRgmeeeYYuXboEHZIxjUqjTyTbt29n0KBBTJs2LehQjGm0fG1sFZHhwANAMvCIqs6q8rq4r38POAT8SFU/iqbMcAtgzZs3jyuvvDKayxpjauBbjUREkoH5wBVAb+BaEeld5bQrgJ7u1wTgoWjKDF0A6+CWlXz+6SdMWbiOVv2H0bp162gubYypgZ+3NucCW1R1q6oeAZ4FRlU5ZxTwhDrygCwRaRdpgUcXwKosO8yepfP5+p3Ho1oAyxjjjZ+3Nh2A7SH7hcB5Hs7pAHwZepKITMCpsdC5c+dqCzy60FVSahNOHnsfyc1aHXfcGOMPP2sk4aYV0wjOQVVzVTVHVXNqukUJXegq9aSOJDXJOOG4Mab++ZlICoFOIfsdgR0RnONZfSyAZYypOz8TySqgp4h0E5E0YCxQdbXtxcAN4hgI7FXVL6teyKur+ndg5ug+dMhKR4AOWenMHN2nzuvYGGPqxrc2ElUtF5FbgaU4j3//rKofi8hE9/UFwKs4j3634Dz+/XG05UayAJYxJjq+9iNR1VdxkkXosQUh2wrc4mcMxhj/NfqercYY/1kiMcZEzRKJMSZqlkiMMVETp70zfojILuALD6dmA7t9DidaFmP0Gnp80PBj9BpfF1UN2yM07hKJVyKSr6o5QcdRE4sxeg09Pmj4MdZHfHZrY4yJmiUSY0zUGnMiyQ06AA8sxug19Pig4ccYdXyNto3EGBM7jblGYoyJEUskxpioxX0iEZHhIrJZRLaIyF1hXhcRmeu+vlZEzm6AMf7QjW2tiHwgIv0aUnwh550jIhUiMiaW8bll1xqjiAwVkdUi8rGIvNOQ4hORTBF5WUTWuPFFPdK9jvH9WUSKRWR9Na9H9zlR1bj9wpme4FOgO5AGrAF6Vznne8BrOLOxDQT+1QBjPB9o6W5fEcsYvcQXct5bOKO5xzTAn2EWsAHo7O63aWDx/QqY7W63BvYAaTGM8SLgbGB9Na9H9TmJ9xpJzCeY9iNGVf1AVb92d/NwZoprMPG5fga8CBTHMLajvMR4HbBQVbcBqGos4/QSnwLN3SVYmuEkkvJYBaiq77plVieqz0m8J5LqJo+u6zl+qmv5N+H8ZYiVWuMTkQ7A1cACguHlZ3gq0FJElovIhyJyQ8yi8xbfPOB0nKlE1wG3qWplbMLzJKrPia8TG8VAvU0w7SPP5YvIxTiJ5AJfI6pSbJhjVeP7I3CnqlY4f1BjzkuMKcAA4FIgHVghInmq+onfweEtvmHAauASoAfwDxF5T1X3+RybV1F9TuI9kcR8gukIeCpfRPoCjwBXqOpXMYoNvMWXAzzrJpFs4HsiUq6qi2ISoff/592qehA4KCLvAv2AWCQSL/H9GJilToPEFhH5DDgNWBmD+LyI7nMSq8YenxqQUoCtQDe+beQ6o8o53+f4RqSVDTDGzjjz1p7fEH+GVc5/jNg3tnr5GZ4OLHPPzQDWA2c2oPgeAma42ycDRUB2jH+OXam+sTWqz0lc10g0oAmmfYhxOnAS8KD7V79cYzRa1GN8gfISo6puFJHXgbVAJc5a02EfdQYRH3Av8JiIrMP5sN6pqjGbWkBEngGGAtkiUgj8D5AaEl9UnxPrIm+MiVq8P7UxxjQAlkiMMVGzRGKMiZolEmNM1CyRGGOiZokkzonIJBHZKCJP1XDOUBFZEsu4qiMiI4+OjhWRq0Skd8h
r94jId2MYy1AROT9W5TVmcd2PxABwM05v2M+CDsQLVV0MLHZ3rwKW4IzaRVWn13d5IpKiqtUNjhsKHAA+qO9yE43VSOKYiCzAGbq+WER+ISLnuvOZFLjfe4V5zxB3zo7V7nnN3eOTRWSVOxfF3dWUd0BEficiH4nIMhFp7R4/S0Ty3Pf+XURauscnicgG9/iz7rEficg8tyYwEpjjxtJDRB4TkTEicoWIPB9S7lARedndvlxEVrgxvCAizcLEuVxEfuPOSXKbiFwpIv9y/71visjJItIVmAj8wi3/QhFpLSIvuj+HVSIyOJr/n4QSyy669uVLt+fPcbtaAy2AFHf7u8CL7vZQYIm7/TIw2N1uhlMrvRxnAmDB+eOyBLgoTFkK/NDdng7Mc7fXAkPc7XuAP7rbO4Am7naW+/1HIe97jJDu9kf33Zi2AU3d4w8B43DG+bwbcvxOYHqYOJcDD4bst+Tbzpfjgd+52zOA/xty3tPABe52Z2Bj0P+/8fJltzaNSybwuIj0xPnQp4Y5533g926bykJVLRSRy3GSSYF7TjOgJ86HNlQl8Jy7/SSwUEQycZLE0RnJHgdecLfXAk+JyCJgkdd/hDpdzl8HrhSRv+GMA7kDGAL0Bt53hxKkASuqucxzIdsdgefc+TXSgOpuA78L9A4Z4dxCRJqr6n6vsScqSySNy73A26p6tVt1X171BFWdJSKv4IyryHMbNwWYqaoP17G82sZXfB9nZq6RwDQROaMO134OuAVnMp5VqrrfnRToH6p6rYf3HwzZ/hPwe1VdLCJDcWoi4SQBg1S1tA5xGqyNpLHJxBlVCs4txAlEpIeqrlPV2UA+zlD2pcB/HW1vEJEOItImzNuTcG49wJmR7J+quhf4WkQudI9fD7wjIklAJ1V9G6c2kYVT0wm1H2hezb9lOc7UgD/h29pFHjBYRE5x48wQkVOreX+o0J/LjTWU/wZw69EdETnLw7UNlkgam98CM0XkfZxRqOH8XETWi8gaoBR4TVXfwGkfWOGOTv0b4T/gB4EzRORDnAl67nGP34jTaLoWOMs9ngw86V6vAPiDqpZUud6zwGS3EbRH6AuqWoHTVnOF+x1V3YWTIJ9xy8rDSYS1mQG8ICLvcfxi2S8DVx9tbAUmATlu4/AGnMZY44GN/jWeicgBVT3hKYkxViMxxkTNaiTGmKhZjcQYEzVLJMaYqFkiMcZEzRKJMSZqlkiMMVH7/0GdgVLmimpwAAAAAElFTkSuQmCC\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "# ROC for deterministic forecasts and bin_edges\n", - "roc = xs.roc(obs, fct, np.linspace(0, 1, 11), return_results='all_as_metric_dim')\n", - "\n", - "plt.figure(figsize=(4, 4))\n", - "plt.plot([0,1], [0,1], 'k:')\n", - "roc.to_dataset(dim='metric').plot.scatter(y='true positive rate', x='false positive rate')\n", - "roc.sel(metric='area under curve').values[0]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Comparative\n", - "\n", - "Tests to compare whether one forecast is significantly better than another one." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Sign test" - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "metadata": {}, - "outputs": [], - "source": [ - "length = 100\n", - "obs_1d = xr.DataArray(\n", - " np.random.rand(length),\n", - " coords=[\n", - " np.arange(length),\n", - " ],\n", - " dims=[\"time\"],\n", - " name='var'\n", - " )\n", - "fct_1d = obs_1d.copy()\n", - "fct_1d.values = np.random.rand(length)" - ] - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [], - "source": [ - "# given you want to test whether one forecast is better than another forecast\n", - "significantly_different, walk, confidence = xs.sign_test(\n", - " fct_1d, fct_1d + 0.2, obs_1d, time_dim=\"time\", metric=\"mae\", orientation=\"negative\"\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 39, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[]" - ] - }, - "execution_count": 39, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "walk.plot()\n", - "confidence.plot(c='gray')\n", - "(-1 * confidence).plot(c='gray')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### MAE test" - ] - }, - { - "cell_type": "code", - "execution_count": 40, - "metadata": {}, - "outputs": [], - "source": [ - "# create a worse forecast with high but different to perfect correlation\n", - "fct_1d_worse = fct_1d.copy()\n", - "step = 3\n", - "fct_1d_worse[::step] = fct_1d[::step].values + 0.1" - ] - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array(0.00966918)\n", - "\n", - "array(0.01083478)\n", - "MAEs significantly different at level 0.05 : False\n" - ] - } - ], - "source": [ - "# half-with of the confidence interval at level alpha is larger than the MAE differences,\n", - "# therefore not significant\n", - "alpha = 0.05\n", - "significantly_different, diff, hwci = xs.mae_test(\n", - " fct_1d, fct_1d_worse, obs_1d, time_dim=\"time\", dim=[], alpha=alpha\n", - ")\n", - "print(diff)\n", - "print(hwci)\n", - "print(\n", - " f\"MAEs significantly different at level {alpha} : {bool(significantly_different)}\"\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Accessors\n", - "\n", - "You can also use `xskillscore` as a method of your `xarray` Dataset." - ] - }, - { - "cell_type": "code", - "execution_count": 42, - "metadata": {}, - "outputs": [], - "source": [ - "ds = xr.Dataset()\n", - "ds[\"obs_var\"] = obs\n", - "ds[\"fct_var\"] = fct" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the case that your Dataset contains both your observation and forecast variable, just pass them as strings into the function." 
- ] - }, - { - "cell_type": "code", - "execution_count": 43, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[ 0.99509676, -0.88499394, 0.94083077, 0.96521259, -0.13696899],\n", - " [-0.90613709, 0.51585291, 0.72875703, 0.19331043, 0.79754067],\n", - " [-0.80112059, -0.95632624, -0.23640403, -0.57684283, 0.43389289],\n", - " [ 0.00230351, -0.58970109, -0.87332763, -0.99992557, -0.31404248]])\n", - "Coordinates:\n", - " * lat (lat) int64 0 1 2 3\n", - " * lon (lon) int64 0 1 2 3 4\n" - ] - } - ], - "source": [ - "print(ds.xs.pearson_r(\"obs_var\", \"fct_var\", dim=\"time\"))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also pass in a separate Dataset that contains your observations or forecast variable." - ] - }, - { - "cell_type": "code", - "execution_count": 44, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[ 0.99509676, -0.88499394, 0.94083077, 0.96521259, -0.13696899],\n", - " [-0.90613709, 0.51585291, 0.72875703, 0.19331043, 0.79754067],\n", - " [-0.80112059, -0.95632624, -0.23640403, -0.57684283, 0.43389289],\n", - " [ 0.00230351, -0.58970109, -0.87332763, -0.99992557, -0.31404248]])\n", - "Coordinates:\n", - " * lat (lat) int64 0 1 2 3\n", - " * lon (lon) int64 0 1 2 3 4\n" - ] - } - ], - "source": [ - "ds = ds.drop_vars(\"fct_var\")\n", - "print(ds.xs.pearson_r(\"obs_var\", fct, dim=\"time\"))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Resampling\n", - "- randomly resample the `time` dimension and then take mean over `time` to get resample threshold\n", - "- resample over `member` dimension to get uncertainty due to member sampling in hindcasts" - ] - }, - { - "cell_type": "code", - "execution_count": 45, - "metadata": {}, - "outputs": [], - "source": [ - "# create large one-dimensional array\n", - "s = 1000\n", - "f = xr.DataArray(\n", - " 
np.random.normal(size=s), dims=\"member\", coords={\"member\": np.arange(s)}, name=\"var\"\n", - ")" - ] - }, - { - "cell_type": "code", - "execution_count": 46, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "65.1 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n", - "1.44 ms ± 41.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" - ] - } - ], - "source": [ - "# resample with replacement in that one dimension\n", - "iterations = 100\n", - "%timeit f_r = xs.resampling.resample_iterations(f, iterations, 'member', replace=True)\n", - "# resample_iterations_idx is much (50x) faster because it involves no loops\n", - "%timeit f_r = xs.resampling.resample_iterations_idx(f, iterations, 'member', replace=True)\n", - "# but both do the same resampling" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "- use `resample_iterations` for very large data, because very robust, chunksize stays constant and only more tasks are added\n", - "- use `resample_iterations_idx` for small data always and very large data only, when chunked to small chunks in the other dimensions, because the function increases the input chunksize by factor `iterations`" - ] - }, - { - "cell_type": "code", - "execution_count": 47, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 47, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "f_r = xs.resampling.resample_iterations_idx(f, iterations, 'member', replace=True)\n", - "f.plot.hist(label='distribution')\n", - "f_r.mean('iteration').plot.hist(label='resampled mean distribution')\n", - "plt.axvline(x=f.mean('member'), c='k', label='distribution mean')\n", - "plt.title('Gaussian distribution mean')\n", - "plt.legend()" - ] - }, - { - "cell_type": "code", - "execution_count": 48, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 48, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "# we can calculate the distribution of the RMSE of 0 and f resampled over member\n", - "xs.rmse(f_r, xr.zeros_like(f_r), dim='iteration').plot.hist(label='resampled RMSE distribution')\n", - "# the gaussian distribution should have an RMSE with 0 of one\n", - "plt.axvline(x=xs.rmse(f, xr.zeros_like(f)), c='k', label='RMSE')\n", - "plt.title('RMSE between gaussian distribution and 0')\n", - "plt.legend()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.6" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} \ No newline at end of file + "nbformat": 4, + "nbformat_minor": 4, +} From 30d9b9ffc5678f1a115945296838e200425ee009 Mon Sep 17 00:00:00 2001 From: Ray Bell Date: Tue, 11 May 2021 00:25:45 -0400 Subject: [PATCH 8/9] lint2 --- docs/source/quick-start.ipynb | 2680 +++++++++++++++++---------------- 1 file changed, 1381 insertions(+), 1299 deletions(-) diff --git a/docs/source/quick-start.ipynb b/docs/source/quick-start.ipynb index ae227a3a..f37cd367 100644 --- a/docs/source/quick-start.ipynb +++ b/docs/source/quick-start.ipynb @@ -1,1301 +1,1383 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Quick Start\n", - "\n", - "See the [API](https://xskillscore.readthedocs.io/en/stable/api.html) for more detailed information, examples, formulas, and references for each function.", - ], - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import numpy as np\n", - "import xarray as xr\n", - "import xskillscore as xs\n", - "import matplotlib.pyplot as 
plt\n", - "np.random.seed(seed=42)", - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Here, we generate some sample gridded data. Our data has three time steps, and a 4x5 latitude/longitude grid. `obs` denotes some verification data (sometimes termed `y`) and `fct` some forecast data (e.g. from a statistical or dynamical model; sometimes termed `yhat`)." - ], - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "obs = xr.DataArray(\n", - " np.random.rand(3, 4, 5),\n", - " coords=[\n", - ' xr.cftime_range("2000-01-01", "2000-01-03", freq="D"),\n', - " np.arange(4),\n", - " np.arange(5),\n", - " ],\n", - ' dims=["time", "lat", "lon"],\n', - " name='var'\n", - " )\n", - "fct = obs.copy()\n", - "fct.values = np.random.rand(3, 4, 5)", - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deterministic Metrics\n", - "\n", - "`xskillscore` offers a suite of correlation-based and distance-based deterministic metrics:\n", - "\n", - "### Correlation-Based \n", - "\n", - "* Effective Sample Size (`effective_sample_size`)\n", - "* Pearson Correlation (`pearson_r`)\n", - "* Pearson Correlation effective p value (`pearson_r_eff_p_value`)\n", - "* Pearson Correlation p value (`pearson_r_p_value`)\n", - "* Slope of Linear Fit (`linslope`)\n", - "* Spearman Correlation (`spearman_r`)\n", - "* Spearman Correlation effective p value (`spearman_r_eff_p_value`)\n", - "* Spearman Correlation p value (`spearman_r_p_value`)\n", - "\n", - "### Distance-Based\n", - "\n", - "* Coefficient of Determination (`r2`)\n", - "* Mean Absolute Error (`mae`)\n", - "* Mean Absolute Percentage Error (`mape`)\n", - "* Mean Error (`me`)\n", - "* Mean Squared Error (`mse`)\n", - "* Median Absolute Error (`median_absolute_error`)\n", - "* Root Mean Squared Error (`rmse`)\n", - "* Symmetric Mean Absolute Percentage Error (`smape`)", - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ 
- "Calling the functions is very straight-forward. All deterministic functions take the form `func(a, b, dim=None, **kwargs)`. **Notice that the original dataset is reduced by the dimension passed.** I.e., since we passed `time` as the dimension here, we are returned an object with dimensions `(lat, lon)`. For correlation metrics `dim` cannot be `[]`." - ], - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[ 0.99509676, -0.88499394, 0.94083077, 0.96521259, -0.13696899],\n", - " [-0.90613709, 0.51585291, 0.72875703, 0.19331043, 0.79754067],\n", - " [-0.80112059, -0.95632624, -0.23640403, -0.57684283, 0.43389289],\n", - " [ 0.00230351, -0.58970109, -0.87332763, -0.99992557, -0.31404248]])\n", - "Coordinates:\n", - " * lat (lat) int64 0 1 2 3\n", - " * lon (lon) int64 0 1 2 3 4\n", - ], - } - ], - "source": ["r = xs.pearson_r(obs, fct, dim='time')\n", "print(r)"], - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[0.06306879, 0.30832471, 0.22009394, 0.1684121 , 0.91252786],\n", - " [0.2780348 , 0.6549502 , 0.48019675, 0.87615511, 0.41226788],\n", - " [0.40847506, 0.1888421 , 0.84806222, 0.60856901, 0.71427925],\n", - " [0.99853354, 0.59849112, 0.32391484, 0.00776728, 0.79663312]])\n", - "Coordinates:\n", - " * lat (lat) int64 0 1 2 3\n", - " * lon (lon) int64 0 1 2 3 4\n", - ], - } - ], - "source": ['p = xs.pearson_r_p_value(obs, fct, dim="time")\n', "print(p)"], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also specify multiple axes for deterministic metrics. Here, we apply it over the latitude and longitude dimension (a pattern correlation)." 
- ], - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([-0.16920304, -0.06326809, 0.18040449])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", - ], - } - ], - "source": ['r = xs.pearson_r(obs, fct, dim=["lat", "lon"])\n', "print(r)"], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "All deterministic metrics except for `effective_sample_size`, `pearson_r_eff_p_value` and `spearman_r_eff_p_value` can take the kwarg `weights=...`. `weights` should be a DataArray of the size of the reduced dimension (e.g., if time is being reduced it should be of length 3 in our example).\n", - "\n", - "Weighting is a common practice when working with observations and model simulations of the Earth system. When working with rectilinear grids, one can weight the data by the cosine of the latitude, which is maximum at the equator and minimum at the poles (as in the below example). 
More complicated model grids tend to be accompanied by a cell area variable, which could also be passed into this function.", - ], - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "obs2 = xr.DataArray(\n", - " np.random.rand(3, 180, 360),\n", - " coords=[\n", - ' xr.cftime_range("2000-01-01", "2000-01-03", freq="D"),\n', - " np.linspace(-89.5, 89.5, 180),\n", - " np.linspace(-179.5, 179.5, 360),\n", - " ],\n", - ' dims=["time", "lat", "lon"],\n', - " )\n", - "fct2 = obs2.copy()\n", - "fct2.values = np.random.rand(3, 180, 360)", - ], - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "# make weights as cosine of the latitude and broadcast\n", - "weights = np.cos(np.deg2rad(obs2.lat))\n", - "_, weights = xr.broadcast(obs2, weights)\n", - "\n", - "# Remove the time dimension from weights\n", - "weights = weights.isel(time=0)", - ], - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([-0.0020303 , -0.00498588, -0.00401522])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", - ], - } - ], - "source": [ - 'r_weighted = xs.pearson_r(obs2, fct2, dim=["lat", "lon"], weights=weights)\n', - "print(r_weighted)", - ], - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([ 5.72646719e-05, -4.32380560e-03, 4.17909845e-05])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", - ], - } - ], - "source": [ - 'r_unweighted = xs.pearson_r(obs2, fct2, dim=["lat", "lon"], weights=None)\n', - "print(r_unweighted)", - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also pass the optional boolean kwarg `skipna`. 
If `True`, ignore any NaNs (pairwise) in `obs` and `fct` when computing the result. If `False`, return NaNs anywhere there are pairwise NaNs." - ], - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[[ nan, nan, nan, nan, nan],\n", - " [ nan, nan, nan, nan, nan],\n", - " [0.02058449, 0.96990985, 0.83244264, 0.21233911, 0.18182497],\n", - " [0.18340451, 0.30424224, 0.52475643, 0.43194502, 0.29122914]],\n", - "\n", - " [[ nan, nan, nan, nan, nan],\n", - " [ nan, nan, nan, nan, nan],\n", - " [0.60754485, 0.17052412, 0.06505159, 0.94888554, 0.96563203],\n", - " [0.80839735, 0.30461377, 0.09767211, 0.68423303, 0.44015249]],\n", - "\n", - " [[ nan, nan, nan, nan, nan],\n", - " [ nan, nan, nan, nan, nan],\n", - " [0.96958463, 0.77513282, 0.93949894, 0.89482735, 0.59789998],\n", - " [0.92187424, 0.0884925 , 0.19598286, 0.04522729, 0.32533033]]])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", - " * lat (lat) int64 0 1 2 3\n", - " * lon (lon) int64 0 1 2 3 4\n", - ], - } - ], - "source": [ - "obs_with_nans = obs.where(obs.lat > 1)\n", - "fct_with_nans = fct.where(fct.lat > 1)\n", - "print(obs_with_nans)", - ], - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([0.51901116, 0.41623426, 0.32621064])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 
2000-01-03 00:00:00\n", - ], - } - ], - "source": [ - "mae_with_skipna = xs.mae(obs_with_nans, fct_with_nans, dim=['lat', 'lon'], skipna=True)\n", - "print(mae_with_skipna)", - ], - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([nan, nan, nan])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", - ], - } - ], - "source": [ - "mae_without_skipna = xs.mae(obs_with_nans, fct_with_nans, dim=['lat', 'lon'], skipna=False)\n", - "print(mae_without_skipna)", - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Probabilistic Metrics\n", - "\n", - "`xskillscore` offers a suite of probabilistic metrics:\n", - "\n", - "* Brier Score (`brier_score`)\n", - "* Brier scores of an ensemble for exceeding given thresholds (`threshold_brier_score`)\n", - "* Continuous Ranked Probability Score with a gaussian distribution (`crps_gaussian`)\n", - "* Continuous Ranked Probability Score with numerical integration of the normal distribution (`crps_quadrature`)\n", - "* Continuous Ranked Probability Score with the ensemble distribution (`crps_ensemble`)\n", - "* Discrimination (`discrimination`)\n", - "* Rank Histogram (`rank_histogram`)\n", - "* Ranked Probability Score (`rps`)\n", - "* Receiver Operating Characteristic (`roc`)\n", - "* Reliability (`reliability`)", - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We now create some data with an ensemble member dimension. 
In this case, we envision an ensemble forecast with multiple members to validate against our theoretical observations:" - ], - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [], - "source": [ - "obs3 = xr.DataArray(\n", - " np.random.rand(4, 5),\n", - " coords=[np.arange(4), np.arange(5)],\n", - ' dims=["lat", "lon"],\n', - " name='var'\n", - " )\n", - "fct3 = xr.DataArray(\n", - " np.random.rand(3, 4, 5),\n", - " coords=[np.arange(3), np.arange(4), np.arange(5)],\n", - ' dims=["member", "lat", "lon"],\n', - " name='var'\n", - " )", - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Continuous Ranked Probability Score with the ensemble distribution. Pass `dim=[]` to get the same behaviour as `properscoring.crps_ensemble` without any averaging over `dim`." - ], - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[0.19918258, 0.10670612, 0.11858151, 0.15974459, 0.26841063],\n", - " [0.08038415, 0.13237479, 0.23778382, 0.18009214, 0.08326884],\n", - " [0.08589149, 0.11666573, 0.21579228, 0.09646599, 0.12855359],\n", - " [0.19891371, 0.10470738, 0.05289158, 0.107965 , 0.11143681]])\n", - "Coordinates:\n", - " * lat (lat) int64 0 1 2 3\n", - " * lon (lon) int64 0 1 2 3 4\n", - ], - } - ], - "source": [ - "crps_ensemble = xs.crps_ensemble(obs3, fct3, dim=[])\n", - "print(crps_ensemble)", - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The CRPS with a Gaussian distribution requires two parameters: $\\mu$ and $\\sigma$ from the forecast distribution. Here, we just use the ensemble mean and ensemble spread." 
- ], - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[0.19821619, 0.11640329, 0.14219455, 0.15912935, 0.28104703],\n", - " [0.08953392, 0.11758925, 0.25156378, 0.095484 , 0.10679842],\n", - " [0.05069082, 0.07081479, 0.24529056, 0.08700853, 0.09535839],\n", - " [0.1931706 , 0.11233935, 0.0783092 , 0.09593862, 0.11037143]])\n", - "Coordinates:\n", - " * lat (lat) int64 0 1 2 3\n", - " * lon (lon) int64 0 1 2 3 4\n", - ], - } - ], - "source": [ - 'crps_gaussian = xs.crps_gaussian(obs3, fct3.mean("member"), fct3.std("member"), dim=[])\n', - "print(crps_gaussian)", - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The CRPS quadrature metric requires a callable distribution function. Here we use `norm` from `scipy.stats`." - ], - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[0.52852898, 0.58042038, 0.46945497, 0.25013942, 0.23370234],\n", - " [0.39109762, 0.24071855, 0.25557803, 0.28994381, 0.23764056],\n", - " [0.40236669, 0.33477031, 0.24063375, 0.45538915, 0.48236113],\n", - " [0.42011508, 0.4174865 , 0.24837346, 0.43954946, 0.44689198]])\n", - "Coordinates:\n", - " * lat (lat) int64 0 1 2 3\n", - " * lon (lon) int64 0 1 2 3 4\n", - ], - } - ], - "source": [ - "from scipy.stats import norm\n", - "crps_quadrature = xs.crps_quadrature(obs3, norm, dim=[])\n", - "print(crps_quadrature)", - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can also use a threshold Brier Score, to score hits over a certain threshold. Ranked Probability Score for two categories yields the same result." 
- ], - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array(0.15555556)\n", - "Coordinates:\n", - " threshold float64 0.5\n", - ], - } - ], - "source": [ - "threshold_brier_score = xs.threshold_brier_score(obs3, fct3, 0.5, dim=None)\n", - "print(threshold_brier_score)", - ], - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": ["\n", "array(0.15555556)\n"], - } - ], - "source": [ - "brier_score = xs.brier_score(obs3>.5, (fct3>.5).mean('member'))\n", - "print(brier_score)", - ], - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": ["\n", "array(0.15555556)\n"], - } - ], - "source": [ - "rps = xs.rps(obs3>.5, fct3>.5, category_edges=np.array([0.5]))\n", - "print(rps)", - ], - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([3, 8, 6, 3])\n", - "Coordinates:\n", - " * rank (rank) float64 1.0 2.0 3.0 4.0\n", - ], - } - ], - "source": [ - "rank_histogram = xs.rank_histogram(obs3, fct3)\n", - "print(rank_histogram)", - ], - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[0. , 0.08333333, 0. , 0.66666667, 0.25 ],\n", - " [0.125 , 0.5 , 0. , 0.375 , 0. 
]])\n", - "Coordinates:\n", - " * forecast_probability (forecast_probability) float64 0.1 0.3 0.5 0.7 0.9\n", - " * event (event) bool True False\n", - ], - } - ], - "source": [ - 'disc = xs.discrimination(obs3 > 0.5, (fct3 > 0.5).mean("member"))\n', - "print(disc)", - ], - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([0. , 0.2 , nan, 0.72727273, 1. ])\n", - "Coordinates:\n", - " * forecast_probability (forecast_probability) float64 0.1 0.3 0.5 0.7 0.9\n", - " samples (forecast_probability) float64 1.0 5.0 0.0 11.0 3.0\n", - ], - } - ], - "source": [ - 'rel = xs.reliability(obs3 > 0.5, (fct3 > 0.5).mean("member"))\n', - "print(rel)", - ], - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "data": {"text/plain": ["0.8229166666666666"]}, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result", - }, - { - "data": { - "image/png": "\n", - "text/plain": ["
"], - }, - "metadata": {"needs_background": "light"}, - "output_type": "display_data", - }, - ], - "source": [ - "# ROC for probabilistic forecasts and bin_edges='continuous' default\n", - "roc = xs.roc(obs3 > 0.5, (fct3 > 0.5).mean(\"member\"), return_results='all_as_metric_dim')\n", - "\n", - "plt.figure(figsize=(4, 4))\n", - "plt.plot([0, 1], [0, 1], 'k:')\n", - "roc.to_dataset(dim='metric').plot.scatter(y='true positive rate', x='false positive rate')\n", - "roc.sel(metric='area under curve').values[0]", - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Contingency-Based\n", - "\n", - "To work with contingency-based scoring, first instantiate a `Contingency` object by passing in your observations, forecast, and observation/forecast bin edges. See https://www.cawcr.gov.au/projects/verification/#Contingency_table for more information.", - ], - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [], - "source": [ - 'dichotomous_category_edges = np.array([0, 0.5, 1]) # "dichotomous" mean two-category\n', - "dichotomous_contingency = xs.Contingency(\n", - ' obs, fct, dichotomous_category_edges, dichotomous_category_edges, dim=["lat", "lon"]\n', - ")\n", - "dichotomous_contingency_table = dichotomous_contingency.table\n", - "print(dichotomous_contingency_table)", - ], - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - '\n', - " \n", - " \n", - " \n", - " \n", - ' \n', - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
histogram_observations_forecasts
observations_category12
observations_category_bounds[0.0, 0.5)[0.5, 1.0]
forecasts_categoryforecasts_category_bounds
1[0.0, 0.5)5.334.67
2[0.5, 1.0]5.334.67
\n", - "
", - ], - "text/plain": [ - " histogram_observations_forecasts \\\n", - "observations_category 1 \n", - "observations_category_bounds [0.0, 0.5) \n", - "forecasts_category forecasts_category_bounds \n", - "1 [0.0, 0.5) 5.33 \n", - "2 [0.5, 1.0] 5.33 \n", - "\n", - " \n", - "observations_category 2 \n", - "observations_category_bounds [0.5, 1.0] \n", - "forecasts_category forecasts_category_bounds \n", - "1 [0.0, 0.5) 4.67 \n", - "2 [0.5, 1.0] 4.67 ", - ], - }, - "execution_count": 26, - "metadata": {}, - "output_type": "execute_result", - } - ], - "source": [ - "print(\n", - " dichotomous_contingency_table.to_dataframe()\n", - " .pivot_table(\n", - ' index=["forecasts_category", "forecasts_category_bounds"],\n', - ' columns=["observations_category", "observations_category_bounds"],\n', - " )\n", - " .round(2)\n", - ")", - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Scores based on the constructed contingency table can be called via class methods. The available methods are:\n", - "\n", - "* Accuracy (`accuracy`)\n", - "* Bias Score (`bias_score`)\n", - "* Equitable Threat Score (`equit_threat_score`)\n", - "* False Alarms / False Positives (`false_alarms`)\n", - "* False Alarm Ratio / False Discovery Rate (`false_alarm_ratio`)\n", - "* False Alarm Rate / False Positive Rate / Fall-out (`false_alarm_rate`)\n", - "* Gerrity Score (`gerrity_score`)\n", - "* Heidke Score / Cohen's Kappa (`heidke_score`)\n", - "* Hit Rate / Recall / Sensitivity / True Positive Rate (`hit_rate`)\n", - "* Hits / True Positives (`hits`)\n", - "* Misses / False Negatives (`misses`)\n", - "* Odds Ratio (`odds_ratio`)\n", - "* Odds Ratio Skill Score (`odds_ratio_skill_score`)\n", - "* Peirce Score (`peirce_score`)\n", - "* Receiver Operating Characteristic (`roc`)\n", - "* Success Ratio / Precision / Positive Predictive Value (`success_ratio`)\n", - "* Threat Score / Critical Success Index (`threat_score`)\n", - "\n", - "Below, we share a few examples of these in 
action:", - ], - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([1. , 1.11111111, 1.1 ])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", - ], - } - ], - "source": ["print(dichotomous_contingency.bias_score())"], - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([0.33333333, 0.55555556, 0.6 ])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", - ], - } - ], - "source": ["print(dichotomous_contingency.hit_rate())"], - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([0.54545455, 0.45454545, 0.5 ])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", - ], - } - ], - "source": ["print(dichotomous_contingency.false_alarm_rate())"], - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([-0.41176471, 0.2 , 0.2 ])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", - ], - } - ], - "source": ["print(dichotomous_contingency.odds_ratio_skill_score())"], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we can leverage multi-category edges to make use of some scores." 
- ], - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [], - "source": [ - "multi_category_edges = np.array([0, 0.25, 0.75, 1])\n", - "multicategory_contingency = xs.Contingency(\n", - ' obs, fct, multi_category_edges, multi_category_edges, dim=["lat", "lon"]\n', - ")", - ], - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([0.25, 0.25, 0.5 ])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", - ], - } - ], - "source": ["print(multicategory_contingency.accuracy())"], - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([-0.14503817, -0.25 , 0.2481203 ])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", - ], - } - ], - "source": ["print(multicategory_contingency.heidke_score())"], - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([-0.1496063 , -0.24193548, 0.25 ])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n", - ], - } - ], - "source": ["print(multicategory_contingency.peirce_score())"], - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([-0.15212912, -0.11160714, 0.25 ])\n", - "Coordinates:\n", - " * time (time) object 2000-01-01 00:00:00 ... 
2000-01-03 00:00:00\n", - ], - } - ], - "source": ["print(multicategory_contingency.gerrity_score())"], - }, - { - "cell_type": "code", - "execution_count": 36, - "metadata": {}, - "outputs": [ - { - "data": {"text/plain": ["0.5035528250988777"]}, - "execution_count": 36, - "metadata": {}, - "output_type": "execute_result", - }, - { - "data": { - "image/png": "\n", - "text/plain": ["
"], - }, - "metadata": {"needs_background": "light"}, - "output_type": "display_data", - }, - ], - "source": [ - "# ROC for deterministic forecasts and bin_edges\n", - "roc = xs.roc(obs, fct, np.linspace(0, 1, 11), return_results='all_as_metric_dim')\n", - "\n", - "plt.figure(figsize=(4, 4))\n", - "plt.plot([0,1], [0,1], 'k:')\n", - "roc.to_dataset(dim='metric').plot.scatter(y='true positive rate', x='false positive rate')\n", - "roc.sel(metric='area under curve').values[0]", - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Comparative\n", - "\n", - "Tests to compare whether one forecast is significantly better than another one.", - ], - }, - {"cell_type": "markdown", "metadata": {}, "source": ["### Sign test"]}, - { - "cell_type": "code", - "execution_count": 37, - "metadata": {}, - "outputs": [], - "source": [ - "length = 100\n", - "obs_1d = xr.DataArray(\n", - " np.random.rand(length),\n", - " coords=[\n", - " np.arange(length),\n", - " ],\n", - ' dims=["time"],\n', - " name='var'\n", - " )\n", - "fct_1d = obs_1d.copy()\n", - "fct_1d.values = np.random.rand(length)", - ], - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [], - "source": [ - "# given you want to test whether one forecast is better than another forecast\n", - "significantly_different, walk, confidence = xs.sign_test(\n", - ' fct_1d, fct_1d + 0.2, obs_1d, time_dim="time", metric="mae", orientation="negative"\n', - ")", - ], - }, - { - "cell_type": "code", - "execution_count": 39, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": ["[]"] - }, - "execution_count": 39, - "metadata": {}, - "output_type": "execute_result", - }, - { - "data": { - "image/png": "\n", - "text/plain": ["
"], - }, - "metadata": {"needs_background": "light"}, - "output_type": "display_data", - }, - ], - "source": [ - "walk.plot()\n", - "confidence.plot(c='gray')\n", - "(-1 * confidence).plot(c='gray')", - ], - }, - {"cell_type": "markdown", "metadata": {}, "source": ["### MAE test"]}, - { - "cell_type": "code", - "execution_count": 40, - "metadata": {}, - "outputs": [], - "source": [ - "# create a worse forecast with high but different to perfect correlation\n", - "fct_1d_worse = fct_1d.copy()\n", - "step = 3\n", - "fct_1d_worse[::step] = fct_1d[::step].values + 0.1", - ], - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array(0.00966918)\n", - "\n", - "array(0.01083478)\n", - "MAEs significantly different at level 0.05 : False\n", - ], - } - ], - "source": [ - "# half-with of the confidence interval at level alpha is larger than the MAE differences,\n", - "# therefore not significant\n", - "alpha = 0.05\n", - "significantly_different, diff, hwci = xs.mae_test(\n", - ' fct_1d, fct_1d_worse, obs_1d, time_dim="time", dim=[], alpha=alpha\n', - ")\n", - "print(diff)\n", - "print(hwci)\n", - "print(\n", - ' f"MAEs significantly different at level {alpha} : {bool(significantly_different)}"\n', - ")", - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Accessors\n", - "\n", - "You can also use `xskillscore` as a method of your `xarray` Dataset.", - ], - }, - { - "cell_type": "code", - "execution_count": 42, - "metadata": {}, - "outputs": [], - "source": [ - "ds = xr.Dataset()\n", - 'ds["obs_var"] = obs\n', - 'ds["fct_var"] = fct', - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the case that your Dataset contains both your observation and forecast variable, just pass them as strings into the function." 
- ], - }, - { - "cell_type": "code", - "execution_count": 43, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[ 0.99509676, -0.88499394, 0.94083077, 0.96521259, -0.13696899],\n", - " [-0.90613709, 0.51585291, 0.72875703, 0.19331043, 0.79754067],\n", - " [-0.80112059, -0.95632624, -0.23640403, -0.57684283, 0.43389289],\n", - " [ 0.00230351, -0.58970109, -0.87332763, -0.99992557, -0.31404248]])\n", - "Coordinates:\n", - " * lat (lat) int64 0 1 2 3\n", - " * lon (lon) int64 0 1 2 3 4\n", - ], - } - ], - "source": ['print(ds.xs.pearson_r("obs_var", "fct_var", dim="time"))'], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also pass in a separate Dataset that contains your observations or forecast variable." - ], - }, - { - "cell_type": "code", - "execution_count": 44, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "array([[ 0.99509676, -0.88499394, 0.94083077, 0.96521259, -0.13696899],\n", - " [-0.90613709, 0.51585291, 0.72875703, 0.19331043, 0.79754067],\n", - " [-0.80112059, -0.95632624, -0.23640403, -0.57684283, 0.43389289],\n", - " [ 0.00230351, -0.58970109, -0.87332763, -0.99992557, -0.31404248]])\n", - "Coordinates:\n", - " * lat (lat) int64 0 1 2 3\n", - " * lon (lon) int64 0 1 2 3 4\n", - ], - } - ], - "source": [ - 'ds = ds.drop_vars("fct_var")\n', - 'print(ds.xs.pearson_r("obs_var", fct, dim="time"))', - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Resampling\n", - "- randomly resample the `time` dimension and then take mean over `time` to get resample threshold\n", - "- resample over `member` dimension to get uncertainty due to member sampling in hindcasts", - ], - }, - { - "cell_type": "code", - "execution_count": 45, - "metadata": {}, - "outputs": [], - "source": [ - "# create large one-dimensional array\n", - "s = 1000\n", - "f = xr.DataArray(\n", - ' 
np.random.normal(size=s), dims="member", coords={"member": np.arange(s)}, name="var"\n', - ")", - ], - }, - { - "cell_type": "code", - "execution_count": 46, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "65.1 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n", - "1.44 ms ± 41.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n", - ], - } - ], - "source": [ - "# resample with replacement in that one dimension\n", - "iterations = 100\n", - "%timeit f_r = xs.resampling.resample_iterations(f, iterations, 'member', replace=True)\n", - "# resample_iterations_idx is much (50x) faster because it involves no loops\n", - "%timeit f_r = xs.resampling.resample_iterations_idx(f, iterations, 'member', replace=True)\n", - "# but both do the same resampling", - ], - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "- use `resample_iterations` for very large data, because very robust, chunksize stays contants and only more tasks are added\n", - "- use `resample_iterations_idx` for small data always and very large data only, when chunked to small chunks in the other dimensions, because the function increases the input chunksize by factor `iterations`", - ], - }, - { - "cell_type": "code", - "execution_count": 47, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [""] - }, - "execution_count": 47, - "metadata": {}, - "output_type": "execute_result", - }, - { - "data": { - "image/png": "\n", - "text/plain": ["
"], - }, - "metadata": {"needs_background": "light"}, - "output_type": "display_data", - }, - ], - "source": [ - "f_r = xs.resampling.resample_iterations_idx(f, iterations, 'member', replace=True)\n", - "f.plot.hist(label='distribution')\n", - "f_r.mean('iteration').plot.hist(label='resampled mean distribution')\n", - "plt.axvline(x=f.mean('member'), c='k', label='distribution mean')\n", - "plt.title('Gaussian distribution mean')\n", - "plt.legend()", - ], - }, - { - "cell_type": "code", - "execution_count": 48, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [""] - }, - "execution_count": 48, - "metadata": {}, - "output_type": "execute_result", - }, - { - "data": { - "image/png": "\n", - "text/plain": ["
"], - }, - "metadata": {"needs_background": "light"}, - "output_type": "display_data", - }, - ], - "source": [ - "# we can calculate the distribution of the RMSE of 0 and f resampled over member\n", - "xs.rmse(f_r, xr.zeros_like(f_r), dim='iteration').plot.hist(label='resampled RMSE distribution')\n", - "# the gaussian distribution should have an RMSE with 0 of one\n", - "plt.axvline(x=xs.rmse(f, xr.zeros_like(f)), c='k', label='RMSE')\n", - "plt.title('RMSE between gaussian distribution and 0')\n", - "plt.legend()", - ], - }, - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3", - }, - "language_info": { - "codemirror_mode": {"name": "ipython", "version": 3}, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.6", - }, + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Quick Start\n", + "\n", + "See the [API](https://xskillscore.readthedocs.io/en/stable/api.html) for more detailed information, examples, formulas, and references for each function." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import xarray as xr\n", + "import xskillscore as xs\n", + "import matplotlib.pyplot as plt\n", + "np.random.seed(seed=42)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here, we generate some sample gridded data. Our data has three time steps, and a 4x5 latitude/longitude grid. `obs` denotes some verification data (sometimes termed `y`) and `fct` some forecast data (e.g. from a statistical or dynamical model; sometimes termed `yhat`)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "obs = xr.DataArray(\n", + " np.random.rand(3, 4, 5),\n", + " coords=[\n", + " xr.cftime_range(\"2000-01-01\", \"2000-01-03\", freq=\"D\"),\n", + " np.arange(4),\n", + " np.arange(5),\n", + " ],\n", + " dims=[\"time\", \"lat\", \"lon\"],\n", + " name='var'\n", + " )\n", + "fct = obs.copy()\n", + "fct.values = np.random.rand(3, 4, 5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deterministic Metrics\n", + "\n", + "`xskillscore` offers a suite of correlation-based and distance-based deterministic metrics:\n", + "\n", + "### Correlation-Based \n", + "\n", + "* Effective Sample Size (`effective_sample_size`)\n", + "* Pearson Correlation (`pearson_r`)\n", + "* Pearson Correlation effective p value (`pearson_r_eff_p_value`)\n", + "* Pearson Correlation p value (`pearson_r_p_value`)\n", + "* Slope of Linear Fit (`linslope`)\n", + "* Spearman Correlation (`spearman_r`)\n", + "* Spearman Correlation effective p value (`spearman_r_eff_p_value`)\n", + "* Spearman Correlation p value (`spearman_r_p_value`)\n", + "\n", + "### Distance-Based\n", + "\n", + "* Coefficient of Determination (`r2`)\n", + "* Mean Absolute Error (`mae`)\n", + "* Mean Absolute Percentage Error (`mape`)\n", + "* Mean Error (`me`)\n", + "* Mean Squared Error (`mse`)\n", + "* Median Absolute Error (`median_absolute_error`)\n", + "* Root Mean Squared Error (`rmse`)\n", + "* Symmetric Mean Absolute Percentage Error (`smape`)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Calling the functions is straightforward. All deterministic functions take the form `func(a, b, dim=None, **kwargs)`. **Notice that the original dataset is reduced by the dimension passed.** I.e., since we pass `time` as the dimension in the example below, the returned object has dimensions `(lat, lon)`. For correlation metrics `dim` cannot be `[]`."
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[ 0.99509676, -0.88499394, 0.94083077, 0.96521259, -0.13696899],\n", + " [-0.90613709, 0.51585291, 0.72875703, 0.19331043, 0.79754067],\n", + " [-0.80112059, -0.95632624, -0.23640403, -0.57684283, 0.43389289],\n", + " [ 0.00230351, -0.58970109, -0.87332763, -0.99992557, -0.31404248]])\n", + "Coordinates:\n", + " * lat (lat) int64 0 1 2 3\n", + " * lon (lon) int64 0 1 2 3 4\n" + ] + } + ], + "source": [ + "r = xs.pearson_r(obs, fct, dim='time')\n", + "print(r)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[0.06306879, 0.30832471, 0.22009394, 0.1684121 , 0.91252786],\n", + " [0.2780348 , 0.6549502 , 0.48019675, 0.87615511, 0.41226788],\n", + " [0.40847506, 0.1888421 , 0.84806222, 0.60856901, 0.71427925],\n", + " [0.99853354, 0.59849112, 0.32391484, 0.00776728, 0.79663312]])\n", + "Coordinates:\n", + " * lat (lat) int64 0 1 2 3\n", + " * lon (lon) int64 0 1 2 3 4\n" + ] + } + ], + "source": [ + "p = xs.pearson_r_p_value(obs, fct, dim=\"time\")\n", + "print(p)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also specify multiple axes for deterministic metrics. Here, we apply it over the latitude and longitude dimension (a pattern correlation)." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([-0.16920304, -0.06326809, 0.18040449])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 
2000-01-03 00:00:00\n" + ] + } + ], + "source": [ + "r = xs.pearson_r(obs, fct, dim=[\"lat\", \"lon\"])\n", + "print(r)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All deterministic metrics except for `effective_sample_size`, `pearson_r_eff_p_value` and `spearman_r_eff_p_value` can take the kwarg `weights=...`. `weights` should be a DataArray of the size of the reduced dimension (e.g., if time is being reduced it should be of length 3 in our example).\n", + "\n", + "Weighting is a common practice when working with observations and model simulations of the Earth system. When working with rectilinear grids, one can weight the data by the cosine of the latitude, which is maximum at the equator and minimum at the poles (as in the example below). More complicated model grids tend to be accompanied by a cell area variable, which could also be passed into this function." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "obs2 = xr.DataArray(\n", + " np.random.rand(3, 180, 360),\n", + " coords=[\n", + " xr.cftime_range(\"2000-01-01\", \"2000-01-03\", freq=\"D\"),\n", + " np.linspace(-89.5, 89.5, 180),\n", + " np.linspace(-179.5, 179.5, 360),\n", + " ],\n", + " dims=[\"time\", \"lat\", \"lon\"],\n", + " )\n", + "fct2 = obs2.copy()\n", + "fct2.values = np.random.rand(3, 180, 360)" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "# make weights as cosine of the latitude and broadcast\n", + "weights = np.cos(np.deg2rad(obs2.lat))\n", + "_, weights = xr.broadcast(obs2, weights)\n", + "\n", + "# Remove the time dimension from weights\n", + "weights = weights.isel(time=0)" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([-0.0020303 , -0.00498588, -0.00401522])\n", + "Coordinates:\n", + " * time (time) 
object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" + ] + } + ], + "source": [ + "r_weighted = xs.pearson_r(obs2, fct2, dim=[\"lat\", \"lon\"], weights=weights)\n", + "print(r_weighted)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([ 5.72646719e-05, -4.32380560e-03, 4.17909845e-05])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" + ] + } + ], + "source": [ + "r_unweighted = xs.pearson_r(obs2, fct2, dim=[\"lat\", \"lon\"], weights=None)\n", + "print(r_unweighted)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also pass the optional boolean kwarg `skipna`. If `True`, ignore any NaNs (pairwise) in `obs` and `fct` when computing the result. If `False`, return NaNs anywhere there are pairwise NaNs." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[[ nan, nan, nan, nan, nan],\n", + " [ nan, nan, nan, nan, nan],\n", + " [0.02058449, 0.96990985, 0.83244264, 0.21233911, 0.18182497],\n", + " [0.18340451, 0.30424224, 0.52475643, 0.43194502, 0.29122914]],\n", + "\n", + " [[ nan, nan, nan, nan, nan],\n", + " [ nan, nan, nan, nan, nan],\n", + " [0.60754485, 0.17052412, 0.06505159, 0.94888554, 0.96563203],\n", + " [0.80839735, 0.30461377, 0.09767211, 0.68423303, 0.44015249]],\n", + "\n", + " [[ nan, nan, nan, nan, nan],\n", + " [ nan, nan, nan, nan, nan],\n", + " [0.96958463, 0.77513282, 0.93949894, 0.89482735, 0.59789998],\n", + " [0.92187424, 0.0884925 , 0.19598286, 0.04522729, 0.32533033]]])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 
2000-01-03 00:00:00\n", + " * lat (lat) int64 0 1 2 3\n", + " * lon (lon) int64 0 1 2 3 4\n" + ] + } + ], + "source": [ + "obs_with_nans = obs.where(obs.lat > 1)\n", + "fct_with_nans = fct.where(fct.lat > 1)\n", + "print(obs_with_nans)" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([0.51901116, 0.41623426, 0.32621064])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" + ] + } + ], + "source": [ + "mae_with_skipna = xs.mae(obs_with_nans, fct_with_nans, dim=['lat', 'lon'], skipna=True)\n", + "print(mae_with_skipna)" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([nan, nan, nan])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" + ] + } + ], + "source": [ + "mae_without_skipna = xs.mae(obs_with_nans, fct_with_nans, dim=['lat', 'lon'], skipna=False)\n", + "print(mae_without_skipna)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Probabilistic Metrics\n", + "\n", + "`xskillscore` offers a suite of probabilistic metrics:\n", + "\n", + "* Brier Score (`brier_score`)\n", + "* Brier scores of an ensemble for exceeding given thresholds (`threshold_brier_score`)\n", + "* Continuous Ranked Probability Score with a gaussian distribution (`crps_gaussian`)\n", + "* Continuous Ranked Probability Score with numerical integration of the normal distribution (`crps_quadrature`)\n", + "* Continuous Ranked Probability Score with the ensemble distribution (`crps_ensemble`)\n", + "* Discrimination (`discrimination`)\n", + "* Rank Histogram (`rank_histogram`)\n", + "* Ranked Probability Score (`rps`)\n", + "* Receiver Operating Characteristic (`roc`)\n", + "* Reliability (`reliability`)" + ] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "We now create some data with an ensemble member dimension. In this case, we envision an ensemble forecast with multiple members to validate against our theoretical observations:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "obs3 = xr.DataArray(\n", + " np.random.rand(4, 5),\n", + " coords=[np.arange(4), np.arange(5)],\n", + " dims=[\"lat\", \"lon\"],\n", + " name='var'\n", + " )\n", + "fct3 = xr.DataArray(\n", + " np.random.rand(3, 4, 5),\n", + " coords=[np.arange(3), np.arange(4), np.arange(5)],\n", + " dims=[\"member\", \"lat\", \"lon\"],\n", + " name='var'\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Continuous Ranked Probability Score with the ensemble distribution. Pass `dim=[]` to get the same behaviour as `properscoring.crps_ensemble` without any averaging over `dim`." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[0.19918258, 0.10670612, 0.11858151, 0.15974459, 0.26841063],\n", + " [0.08038415, 0.13237479, 0.23778382, 0.18009214, 0.08326884],\n", + " [0.08589149, 0.11666573, 0.21579228, 0.09646599, 0.12855359],\n", + " [0.19891371, 0.10470738, 0.05289158, 0.107965 , 0.11143681]])\n", + "Coordinates:\n", + " * lat (lat) int64 0 1 2 3\n", + " * lon (lon) int64 0 1 2 3 4\n" + ] + } + ], + "source": [ + "crps_ensemble = xs.crps_ensemble(obs3, fct3, dim=[])\n", + "print(crps_ensemble)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The CRPS with a Gaussian distribution requires two parameters: $\\mu$ and $\\sigma$ from the forecast distribution. Here, we just use the ensemble mean and ensemble spread." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[0.19821619, 0.11640329, 0.14219455, 0.15912935, 0.28104703],\n", + " [0.08953392, 0.11758925, 0.25156378, 0.095484 , 0.10679842],\n", + " [0.05069082, 0.07081479, 0.24529056, 0.08700853, 0.09535839],\n", + " [0.1931706 , 0.11233935, 0.0783092 , 0.09593862, 0.11037143]])\n", + "Coordinates:\n", + " * lat (lat) int64 0 1 2 3\n", + " * lon (lon) int64 0 1 2 3 4\n" + ] + } + ], + "source": [ + "crps_gaussian = xs.crps_gaussian(obs3, fct3.mean(\"member\"), fct3.std(\"member\"), dim=[])\n", + "print(crps_gaussian)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The CRPS quadrature metric requires a callable distribution function. Here we use `norm` from `scipy.stats`." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[0.52852898, 0.58042038, 0.46945497, 0.25013942, 0.23370234],\n", + " [0.39109762, 0.24071855, 0.25557803, 0.28994381, 0.23764056],\n", + " [0.40236669, 0.33477031, 0.24063375, 0.45538915, 0.48236113],\n", + " [0.42011508, 0.4174865 , 0.24837346, 0.43954946, 0.44689198]])\n", + "Coordinates:\n", + " * lat (lat) int64 0 1 2 3\n", + " * lon (lon) int64 0 1 2 3 4\n" + ] + } + ], + "source": [ + "from scipy.stats import norm\n", + "crps_quadrature = xs.crps_quadrature(obs3, norm, dim=[])\n", + "print(crps_quadrature)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also use a threshold Brier Score, to score hits over a certain threshold. Ranked Probability Score for two categories yields the same result." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array(0.15555556)\n", + "Coordinates:\n", + " threshold float64 0.5\n" + ] + } + ], + "source": [ + "threshold_brier_score = xs.threshold_brier_score(obs3, fct3, 0.5, dim=None)\n", + "print(threshold_brier_score)" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array(0.15555556)\n" + ] + } + ], + "source": [ + "brier_score = xs.brier_score(obs3>.5, (fct3>.5).mean('member'))\n", + "print(brier_score)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array(0.15555556)\n" + ] + } + ], + "source": [ + "rps = xs.rps(obs3>.5, fct3>.5, category_edges=np.array([0.5]))\n", + "print(rps)" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([3, 8, 6, 3])\n", + "Coordinates:\n", + " * rank (rank) float64 1.0 2.0 3.0 4.0\n" + ] + } + ], + "source": [ + "rank_histogram = xs.rank_histogram(obs3, fct3)\n", + "print(rank_histogram)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[0. , 0.08333333, 0. , 0.66666667, 0.25 ],\n", + " [0.125 , 0.5 , 0. , 0.375 , 0. 
]])\n", + "Coordinates:\n", + " * forecast_probability (forecast_probability) float64 0.1 0.3 0.5 0.7 0.9\n", + " * event (event) bool True False\n" + ] + } + ], + "source": [ + "disc = xs.discrimination(obs3 > 0.5, (fct3 > 0.5).mean(\"member\"))\n", + "print(disc)" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([0. , 0.2 , nan, 0.72727273, 1. ])\n", + "Coordinates:\n", + " * forecast_probability (forecast_probability) float64 0.1 0.3 0.5 0.7 0.9\n", + " samples (forecast_probability) float64 1.0 5.0 0.0 11.0 3.0\n" + ] + } + ], + "source": [ + "rel = xs.reliability(obs3 > 0.5, (fct3 > 0.5).mean(\"member\"))\n", + "print(rel)" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.8229166666666666" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" }, - "nbformat": 4, - "nbformat_minor": 4, -} + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# ROC for probabilistic forecasts and bin_edges='continuous' default\n", + "roc = xs.roc(obs3 > 0.5, (fct3 > 0.5).mean(\"member\"), return_results='all_as_metric_dim')\n", + "\n", + "plt.figure(figsize=(4, 4))\n", + "plt.plot([0, 1], [0, 1], 'k:')\n", + "roc.to_dataset(dim='metric').plot.scatter(y='true positive rate', x='false positive rate')\n", + "roc.sel(metric='area under curve').values[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Contingency-Based\n", + "\n", + "To work with contingency-based scoring, first instantiate a `Contingency` object by passing in your observations, forecast, and observation/forecast bin edges. See https://www.cawcr.gov.au/projects/verification/#Contingency_table for more information." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "dichotomous_category_edges = np.array([0, 0.5, 1]) # \"dichotomous\" means two-category\n", + "dichotomous_contingency = xs.Contingency(\n", + " obs, fct, dichotomous_category_edges, dichotomous_category_edges, dim=[\"lat\", \"lon\"]\n", + ")\n", + "dichotomous_contingency_table = dichotomous_contingency.table\n", + "print(dichotomous_contingency_table)" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
[HTML table] histogram_observations_forecasts: rows forecasts_category 1 [0.0, 0.5) and 2 [0.5, 1.0]; columns observations_category 1 [0.0, 0.5) = 5.33, 5.33 and observations_category 2 [0.5, 1.0] = 4.67, 4.67
" + ], + "text/plain": [ + " histogram_observations_forecasts \\\n", + "observations_category 1 \n", + "observations_category_bounds [0.0, 0.5) \n", + "forecasts_category forecasts_category_bounds \n", + "1 [0.0, 0.5) 5.33 \n", + "2 [0.5, 1.0] 5.33 \n", + "\n", + " \n", + "observations_category 2 \n", + "observations_category_bounds [0.5, 1.0] \n", + "forecasts_category forecasts_category_bounds \n", + "1 [0.0, 0.5) 4.67 \n", + "2 [0.5, 1.0] 4.67 " + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "print(\n", + " dichotomous_contingency_table.to_dataframe()\n", + " .pivot_table(\n", + " index=[\"forecasts_category\", \"forecasts_category_bounds\"],\n", + " columns=[\"observations_category\", \"observations_category_bounds\"],\n", + " )\n", + " .round(2)\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Scores based on the constructed contingency table can be called via class methods. The available methods are:\n", + "\n", + "* Accuracy (`accuracy`)\n", + "* Bias Score (`bias_score`)\n", + "* Equitable Threat Score (`equit_threat_score`)\n", + "* False Alarms / False Positives (`false_alarms`)\n", + "* False Alarm Ratio / False Discovery Rate (`false_alarm_ratio`)\n", + "* False Alarm Rate / False Positive Rate / Fall-out (`false_alarm_rate`)\n", + "* Gerrity Score (`gerrity_score`)\n", + "* Heidke Score / Cohen's Kappa (`heidke_score`)\n", + "* Hit Rate / Recall / Sensitivity / True Positive Rate (`hit_rate`)\n", + "* Hits / True Positives (`hits`)\n", + "* Misses / False Negatives (`misses`)\n", + "* Odds Ratio (`odds_ratio`)\n", + "* Odds Ratio Skill Score (`odds_ratio_skill_score`)\n", + "* Peirce Score (`peirce_score`)\n", + "* Receiver Operating Characteristic (`roc`)\n", + "* Success Ratio / Precision / Positive Predictive Value (`success_ratio`)\n", + "* Threat Score / Critical Success Index (`threat_score`)\n", + "\n", + "Below, we share a few examples of these 
in action:" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([1. , 1.11111111, 1.1 ])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" + ] + } + ], + "source": [ + "print(dichotomous_contingency.bias_score())" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([0.33333333, 0.55555556, 0.6 ])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" + ] + } + ], + "source": [ + "print(dichotomous_contingency.hit_rate())" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([0.54545455, 0.45454545, 0.5 ])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" + ] + } + ], + "source": [ + "print(dichotomous_contingency.false_alarm_rate())" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([-0.41176471, 0.2 , 0.2 ])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" + ] + } + ], + "source": [ + "print(dichotomous_contingency.odds_ratio_skill_score())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, we define multi-category edges so that we can compute the multi-category scores." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [], + "source": [ + "multi_category_edges = np.array([0, 0.25, 0.75, 1])\n", + "multicategory_contingency = xs.Contingency(\n", + " obs, fct, multi_category_edges, multi_category_edges, dim=[\"lat\", \"lon\"]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([0.25, 0.25, 0.5 ])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" + ] + } + ], + "source": [ + "print(multicategory_contingency.accuracy())" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([-0.14503817, -0.25 , 0.2481203 ])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" + ] + } + ], + "source": [ + "print(multicategory_contingency.heidke_score())" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([-0.1496063 , -0.24193548, 0.25 ])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 2000-01-03 00:00:00\n" + ] + } + ], + "source": [ + "print(multicategory_contingency.peirce_score())" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([-0.15212912, -0.11160714, 0.25 ])\n", + "Coordinates:\n", + " * time (time) object 2000-01-01 00:00:00 ... 
2000-01-03 00:00:00\n" + ] + } + ], + "source": [ + "print(multicategory_contingency.gerrity_score())" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.5035528250988777" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# ROC for deterministic forecasts and bin_edges\n", + "roc = xs.roc(obs, fct, np.linspace(0, 1, 11), return_results='all_as_metric_dim')\n", + "\n", + "plt.figure(figsize=(4, 4))\n", + "plt.plot([0,1], [0,1], 'k:')\n", + "roc.to_dataset(dim='metric').plot.scatter(y='true positive rate', x='false positive rate')\n", + "roc.sel(metric='area under curve').values[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Comparative\n", + "\n", + "Tests to compare whether one forecast is significantly better than another one." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Sign test" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [], + "source": [ + "length = 100\n", + "obs_1d = xr.DataArray(\n", + " np.random.rand(length),\n", + " coords=[\n", + " np.arange(length),\n", + " ],\n", + " dims=[\"time\"],\n", + " name='var'\n", + " )\n", + "fct_1d = obs_1d.copy()\n", + "fct_1d.values = np.random.rand(length)" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [], + "source": [ + "# given you want to test whether one forecast is better than another forecast\n", + "significantly_different, walk, confidence = xs.sign_test(\n", + " fct_1d, fct_1d + 0.2, obs_1d, time_dim=\"time\", metric=\"mae\", orientation=\"negative\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[]" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "walk.plot()\n", + "confidence.plot(c='gray')\n", + "(-1 * confidence).plot(c='gray')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### MAE test" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [], + "source": [ + "# create a worse forecast with high but different to perfect correlation\n", + "fct_1d_worse = fct_1d.copy()\n", + "step = 3\n", + "fct_1d_worse[::step] = fct_1d[::step].values + 0.1" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array(0.00966918)\n", + "\n", + "array(0.01083478)\n", + "MAEs significantly different at level 0.05 : False\n" + ] + } + ], + "source": [ + "# half-with of the confidence interval at level alpha is larger than the MAE differences,\n", + "# therefore not significant\n", + "alpha = 0.05\n", + "significantly_different, diff, hwci = xs.mae_test(\n", + " fct_1d, fct_1d_worse, obs_1d, time_dim=\"time\", dim=[], alpha=alpha\n", + ")\n", + "print(diff)\n", + "print(hwci)\n", + "print(\n", + " f\"MAEs significantly different at level {alpha} : {bool(significantly_different)}\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Accessors\n", + "\n", + "You can also use `xskillscore` as a method of your `xarray` Dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [], + "source": [ + "ds = xr.Dataset()\n", + "ds[\"obs_var\"] = obs\n", + "ds[\"fct_var\"] = fct" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the case that your Dataset contains both your observation and forecast variable, just pass them as strings into the function." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[ 0.99509676, -0.88499394, 0.94083077, 0.96521259, -0.13696899],\n", + " [-0.90613709, 0.51585291, 0.72875703, 0.19331043, 0.79754067],\n", + " [-0.80112059, -0.95632624, -0.23640403, -0.57684283, 0.43389289],\n", + " [ 0.00230351, -0.58970109, -0.87332763, -0.99992557, -0.31404248]])\n", + "Coordinates:\n", + " * lat (lat) int64 0 1 2 3\n", + " * lon (lon) int64 0 1 2 3 4\n" + ] + } + ], + "source": [ + "print(ds.xs.pearson_r(\"obs_var\", \"fct_var\", dim=\"time\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also pass in a separate Dataset that contains your observations or forecast variable." + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "array([[ 0.99509676, -0.88499394, 0.94083077, 0.96521259, -0.13696899],\n", + " [-0.90613709, 0.51585291, 0.72875703, 0.19331043, 0.79754067],\n", + " [-0.80112059, -0.95632624, -0.23640403, -0.57684283, 0.43389289],\n", + " [ 0.00230351, -0.58970109, -0.87332763, -0.99992557, -0.31404248]])\n", + "Coordinates:\n", + " * lat (lat) int64 0 1 2 3\n", + " * lon (lon) int64 0 1 2 3 4\n" + ] + } + ], + "source": [ + "ds = ds.drop_vars(\"fct_var\")\n", + "print(ds.xs.pearson_r(\"obs_var\", fct, dim=\"time\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Resampling\n", + "- randomly resample the `time` dimension and then take mean over `time` to get resample threshold\n", + "- resample over `member` dimension to get uncertainty due to member sampling in hindcasts" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [], + "source": [ + "# create large one-dimensional array\n", + "s = 1000\n", + "f = xr.DataArray(\n", + " 
np.random.normal(size=s), dims=\"member\", coords={\"member\": np.arange(s)}, name=\"var\"\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "65.1 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n", + "1.44 ms ± 41.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" + ] + } + ], + "source": [ + "# resample with replacement in that one dimension\n", + "iterations = 100\n", + "%timeit f_r = xs.resampling.resample_iterations(f, iterations, 'member', replace=True)\n", + "# resample_iterations_idx is much faster (about 45x in this run) because it involves no loops\n", + "%timeit f_r = xs.resampling.resample_iterations_idx(f, iterations, 'member', replace=True)\n", + "# but both do the same resampling" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- use `resample_iterations` for very large data: it is robust because the chunksize stays constant and only more tasks are added\n", + "- use `resample_iterations_idx` for small data, and for very large data only when it is chunked into small chunks along the other dimensions, because the function increases the input chunksize by a factor of `iterations`" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "f_r = xs.resampling.resample_iterations_idx(f, iterations, 'member', replace=True)\n", + "f.plot.hist(label='distribution')\n", + "f_r.mean('iteration').plot.hist(label='resampled mean distribution')\n", + "plt.axvline(x=f.mean('member'), c='k', label='distribution mean')\n", + "plt.title('Gaussian distribution mean')\n", + "plt.legend()" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 48, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# we can calculate the distribution of the RMSE of 0 and f resampled over member\n", + "xs.rmse(f_r, xr.zeros_like(f_r), dim='iteration').plot.hist(label='resampled RMSE distribution')\n", + "# the gaussian distribution should have an RMSE with 0 of one\n", + "plt.axvline(x=xs.rmse(f, xr.zeros_like(f)), c='k', label='RMSE')\n", + "plt.title('RMSE between gaussian distribution and 0')\n", + "plt.legend()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file From 7f1368555b1de8f532fab717577a8d849f150ea6 Mon Sep 17 00:00:00 2001 From: Ray Bell Date: Tue, 11 May 2021 00:28:34 -0400 Subject: [PATCH 9/9] lint3 --- docs/source/quick-start.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/quick-start.ipynb b/docs/source/quick-start.ipynb index f37cd367..aef79bcb 100644 --- a/docs/source/quick-start.ipynb +++ b/docs/source/quick-start.ipynb @@ -1380,4 +1380,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} \ No newline at end of file +}