Merge branch 'sdh-similarity-docs' into 'main'
Update documentation for similarity functions

See merge request water/computational-tools/surface-water-work/hyswap!77
Scott Hamshaw committed May 22, 2024
2 parents 94bd738 + 2865f24 commit 7912e4a
Showing 3 changed files with 54 additions and 10 deletions.
11 changes: 5 additions & 6 deletions docs/source/examples/similarity_examples.rst
@@ -2,14 +2,13 @@
Similarity Measures
-------------------

These examples showcase the usage of the functions in the `similarity` module, with heatmap visualizations via the :obj:`hyswap.plots.plot_similarity_heatmap` function.
Sometimes it is helpful to compare the relationships between a set of stations and their respective measurements.
The `similarity` functions packaged in `hyswap` handle some of the data clean-up for you by ensuring the time-series of observations being compared at the same, and by removing any missing data.
This ensures that your results are not skewed by missing data or gaps in one of the time-series.
Sometimes it is helpful to compare the relationships between a set of streamgaging stations and their respective measurements. These examples showcase the usage of the functions in the `similarity` module to quantify how similar streamflow records are across multiple streamgages. Matrices of similarity measures (e.g., correlations) are calculated and visualized by generating heatmap visualizations via the :obj:`hyswap.plots.plot_similarity_heatmap` function.

The `similarity` functions packaged in `hyswap` handle some of the data clean-up for you by ensuring that the time-series of observations are compared across the same dates, and by removing any missing data. This ensures that your results are not skewed by missing data or gaps in one of the time-series.
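
The clean-up described above can be illustrated with a minimal sketch in plain `pandas`. This is not the `hyswap` implementation; the station names and flow values are made up for illustration, and the approach (an inner join on dates followed by dropping missing values) is an assumption about what "the same dates" and "removing any missing data" amount to.

```python
import numpy as np
import pandas as pd

# Two hypothetical daily streamflow records (illustrative values only).
# They cover different date ranges, and site_a has one missing value.
dates_a = pd.date_range("2020-01-01", periods=6, freq="D")
dates_b = pd.date_range("2020-01-03", periods=6, freq="D")
flow_a = pd.Series([10.0, 12.0, 11.0, np.nan, 15.0, 14.0],
                   index=dates_a, name="site_a")
flow_b = pd.Series([20.0, 21.0, 25.0, 24.0, 30.0, 29.0],
                   index=dates_b, name="site_b")

# Keep only the dates present in both records (inner join),
# then drop any date where either record is missing.
aligned = pd.concat([flow_a, flow_b], axis=1, join="inner").dropna()

# Number of paired observations left for the comparison.
n_obs = len(aligned)

# Pearson's r computed on the aligned, gap-free pairs.
pearson_r = aligned["site_a"].corr(aligned["site_b"])
```

Tracking the number of paired observations alongside the similarity value is useful because a correlation based on very few overlapping days is much less reliable than one based on a long common record.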

Correlations Between 5 Stations
*******************************

Pearson's *r* Correlations Between 5 Stations
*********************************************

The following example shows the correlations between streamflow at 5 stations (07374525, 07374000, 07289000, 07032000, 07024175) along the Mississippi River, listed from downstream to upstream.
First we fetch the streamflow data for these stations, using the `dataretrieval` package to access the NWIS database.
52 changes: 48 additions & 4 deletions docs/source/meta/calculations.rst
@@ -158,23 +158,67 @@ Figure 1. Example computation for computation of runoff for a selected HUC unit.

**Note:** Description of methods for area-based runoff computation is adapted from `USGS WaterWatch <https://pubs.usgs.gov/publication/fs20083031>`_.

Streamflow Record Similarity
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Identifying the streamgages that are most similar or correlated is a common step when selecting streamgages for estimating missing records or for other hydrological modeling tasks. ``hyswap`` can compute the similarity of streamflow records using different similarity measures and plot these as a matrix. The available measures are:

+---------------------------+-------------------------------------------+
| Similarity Measure        | Description                               |
+===========================+===========================================+
| Pearson's *r* correlation | Commonly used measure of correlation that |
|                           | measures the linear association between   |
|                           | two datasets `(Helsel and others, 2020)`_.|
|                           | Calculation of Pearson's *r* correlation  |
|                           | on daily streamflow records or            |
|                           | log-transformed daily streamflow records  |
|                           | is often used to identify potential       |
|                           | index or reference streamgages            |
|                           | `(Yuan, 2013)`_.                          |
+---------------------------+-------------------------------------------+
| Wasserstein Distance      | A metric that measures the distance       |
|                           | between two distributions and in a        |
|                           | hydrological context measures the “effort”|
|                           | required to rearrange one distribution of |
|                           | water into the other. The Wasserstein     |
|                           | distance can be used to compare how       |
|                           | similar two hydrographs are to each other |
|                           | `(Magyar & Sambridge, 2023)`_.            |
+---------------------------+-------------------------------------------+
| Energy Distance           | A metric that measures the distance       |
|                           | between two distributions. The energy     |
|                           | distance is experimental in hydrology but |
|                           | has been used to identify similarity      |
|                           | between time series such as electricity   |
|                           | demand `(Ziel, 2021)`_.                   |
+---------------------------+-------------------------------------------+
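
As a rough illustration of the three measures, the sketch below computes each one for a pair of synthetic flow records with SciPy. The use of `scipy.stats` here is an assumption made for the example and is not a statement of how ``hyswap`` implements these measures. The synthetic data and variable names are also hypothetical.

```python
import numpy as np
from scipy.stats import energy_distance, pearsonr, wasserstein_distance

# Synthetic "daily streamflow" for two hypothetical streamgages.
# flow_b is a scaled, slightly noisy copy of flow_a, so the two
# records are strongly correlated but have different distributions.
rng = np.random.default_rng(seed=0)
flow_a = rng.lognormal(mean=3.0, sigma=0.5, size=365)
flow_b = 1.5 * flow_a + rng.normal(scale=1.0, size=365)

# Pearson's r: linear association between paired observations,
# bounded between -1 and 1.
r, _ = pearsonr(flow_a, flow_b)

# Wasserstein distance: compares the two value distributions;
# 0 means identical distributions, larger means less similar.
w = wasserstein_distance(flow_a, flow_b)

# Energy distance: another distance between the two distributions,
# also 0 for identical distributions.
e = energy_distance(flow_a, flow_b)

print(f"Pearson's r: {r:.3f}")
print(f"Wasserstein distance: {w:.3f}")
print(f"Energy distance: {e:.3f}")
```

Note the difference in interpretation: Pearson's *r* is a similarity (higher means more alike) computed on paired values, whereas the Wasserstein and energy distances are dissimilarities (lower means more alike) computed on the distributions of values, so they carry the units and scale of the input data.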

References
----------

Brakebill, J.W., D.M. Wolock, and S.E. Terziotti, 2011. Digital Hydrologic Networks Supporting Applications Related to Spatially Referenced Regression Modeling. Journal of the American Water Resources Association(JAWRA) 47(5):916-932.
Brakebill, J.W., D.M. Wolock, and S.E. Terziotti, 2011. Digital Hydrologic Networks Supporting Applications Related to Spatially Referenced Regression Modeling. Journal of the American Water Resources Association (JAWRA) 47(5):916-932.

Helsel, D.R., Hirsch, R.M., Ryberg, K.R., Archfield, S.A., and Gilroy, E.J., 2020, Statistical methods in water resources: U.S. Geological Survey Techniques and Methods, book 4, chap. A3, 458 p., `doi.org/10.3133/tm4a3 <https://doi.org/10.3133/tm4a3>`_. [Supersedes USGS Techniques of Water-Resources Investigations, book 4, chap. A3, version 1.1.]
Helsel, D.R., Hirsch, R.M., Ryberg, K.R., Archfield, S.A., and Gilroy, E.J., 2020. Statistical methods in water resources: U.S. Geological Survey Techniques and Methods, book 4, chap. A3, 458 p., `doi.org/10.3133/tm4a3 <https://doi.org/10.3133/tm4a3>`_. [Supersedes USGS Techniques of Water-Resources Investigations, book 4, chap. A3, version 1.1.]

Jones, K.A., Niknami, L.S., Buto, S.G., and Decker, D., 2022, Federal standards and procedures for the national Watershed Boundary Dataset (WBD) (5 ed.): U.S. Geological Survey Techniques and Methods 11-A3, 54 p., `doi.org/10.3133/tm11A3 <https://doi.org/10.3133/tm11A3>`_.
Jones, K.A., Niknami, L.S., Buto, S.G., and Decker, D., 2022. Federal standards and procedures for the national Watershed Boundary Dataset (WBD) (5 ed.): U.S. Geological Survey Techniques and Methods 11-A3, 54 p., `doi.org/10.3133/tm11A3 <https://doi.org/10.3133/tm11A3>`_.

Magyar, J.C. & Sambridge, M., 2023. Hydrological objective functions and ensemble averaging with the Wasserstein distance, Hydrol. Earth Syst. Sci., 27, 991–1010, `doi.org/10.5194/hess-27-991-2023 <https://doi.org/10.5194/hess-27-991-2023>`_.

U.S. Geological Survey, 2011. USGS Streamgage NHDPlus Version 1 Basins 2011. Data Series [DS-719] `water.usgs.gov/lookup/getspatial?streamgagebasins <https://water.usgs.gov/lookup/getspatial?streamgagebasins>`_

U.S. Geological Survey, 2023. USGS water data for the Nation: U.S. Geological Survey National Water Information System database, accessed at `doi.org/10.5066/F7P55KJN <http://dx.doi.org/10.5066/F7P55KJN>`_

Weibull, W., 1939. A statistical theory of strength of materials: Ingeniors Vetenskaps Akademien Handlinga, no. 153, 9. 17
Weibull, W., 1939. A statistical theory of strength of materials, Ingeniors Vetenskaps Akademien Handlinga, no. 153, 9. 17

Yuan, L.L., 2013. Using correlation of daily flows to identify index gauges for ungauged streams, Water Resour. Res., 49, `doi:10.1002/wrcr.20070 <https://doi.org/10.1002/wrcr.20070>`_.

Ziel, F., 2021. The energy distance for ensemble and scenario reduction, Phil. Trans. R. Soc. A. 379: 20190431, `doi.org/10.1098/rsta.2019.0431 <https://doi.org/10.1098/rsta.2019.0431>`_.

.. _(Brakebill and others, 2011): https://doi.org/10.1111/j.1752-1688.2011.00578.x
.. _(Helsel and others, 2020): https://doi.org/10.3133/tm4A3
.. _(Jones and others, 2022): https://doi.org/10.3133/tm11A3
.. _(Magyar & Sambridge, 2023): https://doi.org/10.5194/hess-27-991-2023
.. _(U.S. Geological Survey, 2011): https://water.usgs.gov/lookup/getspatial?streamgagebasins
.. _(U.S. Geological Survey, 2023): http://dx.doi.org/10.5066/F7P55KJN
.. _(Yuan, 2013): https://doi.org/10.1002/wrcr.20070
.. _(Ziel, 2021): https://doi.org/10.1098/rsta.2019.0431
1 change: 1 addition & 0 deletions hyswap/plots.py
@@ -877,6 +877,7 @@ def plot_similarity_heatmap(sim_matrix, n_obs=None, cmap='inferno',
# set tick labels
ax.set_xticklabels(sim_matrix.columns)
ax.set_yticklabels(sim_matrix.index)
plt.xticks(rotation=45, ha='right')
# add colorbar
plt.colorbar(im, ax=ax)
# return