diff --git a/.codespellrc b/.codespellrc new file mode 100644 index 00000000..99ff9f68 --- /dev/null +++ b/.codespellrc @@ -0,0 +1,5 @@ +[codespell] +skip = *.nc,*.ipynb,./local_work,./float_source,./binder,./.github,*.log,./.git,./docs/_build,./docs/_static +count = +quiet-level = 3 +ignore-words-list = PRES, pres \ No newline at end of file diff --git a/.gitignore b/.gitignore index a092b15b..9a295ce2 100644 --- a/.gitignore +++ b/.gitignore @@ -189,10 +189,10 @@ fabric.properties # Android studio 3.1+ serialized cache file .idea/caches/build_file_checksums.ser -#pytest quai20 +#pytest and misc .vscode/ .pytest_cache knotebooks/ argopy/tests/cov.xml argopy/tests/dummy_fileA.txt -float_source +float_source/ diff --git a/HOW_TO_RELEASE.md b/HOW_TO_RELEASE.md index fab91385..a38955dd 100644 --- a/HOW_TO_RELEASE.md +++ b/HOW_TO_RELEASE.md @@ -1,28 +1,30 @@ -1. [ ] Make sure that all CI tests are passed with **free* environments +1. [ ] Run codespell ``codespell -q 3`` -2. [ ] Update ``./requirements.txt`` and ``./docs/requirements.txt`` with CI free environments dependencies versions +2. [ ] Make sure that all CI tests are passed with **free* environments -3. [ ] Update ``./ci/requirements/py*-dev.yml`` with last free environments dependencies versions +3. [ ] Update ``./requirements.txt`` and ``./docs/requirements.txt`` with CI free environments dependencies versions -4. [ ] Make sure that all CI tests are passed with **dev* environments +4. [ ] Update ``./ci/requirements/py*-dev.yml`` with last free environments dependencies versions -5. [ ] Increase release version in ``./setup.py`` file +5. [ ] Make sure that all CI tests are passed with **dev* environments -6. [ ] Update date and release version in ``./docs/whats-new.rst`` +6. [ ] Increase release version in ``./setup.py`` file -7. [ ] On the master branch, commit the release in git: +7. [ ] Update date and release version in ``./docs/whats-new.rst`` + +8. [ ] On the master branch, commit the release in git: ```git commit -a -m 'Release v0.X.Y'``` -8. [ ] Tag the release: +9. [ ] Tag the release: ```git tag -a v0.X.Y -m 'v0.X.Y'``` -9. [ ] Push it online: +10. [ ] Push it online: - ```git push origin v0.X.Y``` + ```git push origin v0.X.Y``` -10. [ ] Issue the release on GitHub. Click on "Draft a new release" at +11. [ ] Issue the release on GitHub. Click on "Draft a new release" at https://github.com/euroargodev/argopy/releases. Type in the version number, but don't bother to describe it -- we maintain that on the docs instead. 
diff --git a/README.md b/README.md index df7b2521..eede2e4c 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,11 @@ ## Install -Install the last release with pip: +Install the last release with conda: +```bash +conda install -c conda-forge argopy +``` +or pip: ```bash pip install argopy ``` @@ -113,21 +117,4 @@ See the [documentation page for more examples](https://argopy.readthedocs.io/en/ ## Development roadmap -Our next big steps: -- [ ] To provide Bio-geochemical variables ([#22](https://github.com/euroargodev/argopy/issues/22), [#77](https://github.com/euroargodev/argopy/issues/77), [#81](https://github.com/euroargodev/argopy/issues/81)) -- [ ] To develop expert methods related to Quality Control of the data with other python softwares like: - - [ ] [pyowc](https://github.com/euroargodev/argodmqc_owc): [#33](https://github.com/euroargodev/argodmqc_owc/issues/33), [#53](https://github.com/euroargodev/argodmqc_owc/issues/53) - - [ ] [bgcArgoDMQC](https://github.com/ArgoCanada/bgcArgoDMQC): [#37](https://github.com/ArgoCanada/bgcArgoDMQC/issues/37) - -We aim to provide high level helper methods to load Argo data and meta-data from: -- [x] Ifremer erddap -- [x] local copy of the GDAC ftp folder -- [x] Index files (local and online) -- [x] Argovis -- [ ] Online GDAC ftp - -We also aim to provide high level helper methods to visualise and plot Argo data and meta-data: -- [x] Map with trajectories -- [x] Histograms for meta-data -- [ ] Waterfall plots -- [ ] T/S diagram \ No newline at end of file +See milestone here: https://github.com/euroargodev/argopy/milestone/3 \ No newline at end of file diff --git a/argopy/options.py b/argopy/options.py index c8e8e6d6..8af9939d 100644 --- a/argopy/options.py +++ b/argopy/options.py @@ -65,32 +65,37 @@ def validate_ftp(this_path): class set_options: - """Set options for argopy. + """Set options for argopy List of options: - - `dataset`: Define the Dataset to work with. - Default: `phy`. Possible values: `phy`, `bgc` or `ref`. - - `src`: Source of fetched data. - Default: `erddap`. Possible values: `erddap`, `localftp`, `argovis` - - `local_ftp`: Absolute path to a local GDAC ftp copy. - Default: `.` - - `cachedir`: Absolute path to a local cache directory. - Default: `~/.cache/argopy` - - `mode`: User mode. - Default: `standard`. Possible values: `standard` or `expert`. - - `api_timeout`: Define the time out of internet requests to web API, in seconds. - Default: 60 - - `trust_env`: Allow for local environment variables to be used by fsspec to connect to the internet. Get - proxies information from HTTP_PROXY / HTTPS_PROXY environment variables if this option is True (False by - default). Also can get proxy credentials from ~/.netrc file if present. + - ``dataset``: Define the Dataset to work with. + Default: ``phy``. + Possible values: ``phy``, ``bgc`` or ``ref``. + - ``src``: Source of fetched data. + Default: ``erddap``. + Possible values: ``erddap``, ``localftp``, ``argovis`` + - ``local_ftp``: Absolute path to a local GDAC ftp copy. + Default: None + - ``cachedir``: Absolute path to a local cache directory. + Default: ``~/.cache/argopy`` + - ``mode``: User mode. + Default: ``standard``. + Possible values: ``standard`` or ``expert``. + - ``api_timeout``: Define the time out of internet requests to web API, in seconds. + Default: 60 + - ``trust_env``: Allow for local environment variables to be used by fsspec to connect to the internet. 
+ Get proxies information from HTTP_PROXY / HTTPS_PROXY environment variables if this option is True ( + False by default). Also can get proxy credentials from ~/.netrc file if present. You can use `set_options` either as a context manager: + >>> import argopy >>> with argopy.set_options(src='localftp'): >>> ds = argopy.DataFetcher().float(3901530).to_xarray() Or to set global options: + >>> argopy.set_options(src='localftp') """ diff --git a/argopy/tests/__init__.py b/argopy/tests/__init__.py index 14777826..285bf0c4 100644 --- a/argopy/tests/__init__.py +++ b/argopy/tests/__init__.py @@ -187,7 +187,7 @@ def test_wrapper(fix): pass except ServerDisconnectedError as e: # We can't do anything about this ! - warnings.warn("\nWe were disconnected from server !\n%s" % str(e.args)) + warnings.warn("\n We were disconnected from server !\n%s" % str(e.args)) pass except ClientResponseError as e: # The server is sending back an error when creating the response diff --git a/argopy/tests/test_utilities.py b/argopy/tests/test_utilities.py index 11b3d1f1..545374c1 100644 --- a/argopy/tests/test_utilities.py +++ b/argopy/tests/test_utilities.py @@ -23,6 +23,9 @@ format_oneline, is_indexbox, check_wmo, is_wmo, wmo2box, + modified_environ, + wrap_longitude, + toYearFraction, YearFraction_to_datetime, TopoFetcher ) from argopy.errors import InvalidFetcherAccessPoint, FtpPathError @@ -487,6 +490,14 @@ def test_check_wmo(): assert check_wmo(np.array((12345, 1234567), dtype='int')) == [12345, 1234567] +def test_modified_environ(): + os.environ["DUMMY_ENV_ARGOPY"] = 'initial' + with modified_environ(DUMMY_ENV_ARGOPY='toto'): + assert os.environ['DUMMY_ENV_ARGOPY'] == 'toto' + assert os.environ['DUMMY_ENV_ARGOPY'] == 'initial' + os.environ.pop('DUMMY_ENV_ARGOPY') + + def test_wmo2box(): with pytest.raises(ValueError): wmo2box(12) @@ -507,6 +518,23 @@ def complete_box(b): assert is_box(complete_box(wmo2box(7501))) +def test_wrap_longitude(): + assert wrap_longitude(np.array([-20])) == 340 + assert wrap_longitude(np.array([40])) == 40 + assert np.all(np.equal(wrap_longitude(np.array([340, 20])), np.array([340, 380]))) + + +def test_toYearFraction(): + assert toYearFraction(pd.to_datetime('202001010000')) == 2020 + assert toYearFraction(pd.to_datetime('202001010000', utc=True)) == 2020 + assert toYearFraction(pd.to_datetime('202001010000')+pd.offsets.DateOffset(years=1)) == 2021 + + +def test_YearFraction_to_datetime(): + assert YearFraction_to_datetime(2020) == pd.to_datetime('202001010000') + assert YearFraction_to_datetime(2020+1) == pd.to_datetime('202101010000') + + @requires_connection def test_TopoFetcher(): box = [81, 123, -67, -54] diff --git a/argopy/tests/test_xarray.py b/argopy/tests/test_xarray.py index ba31acd1..598c083c 100644 --- a/argopy/tests/test_xarray.py +++ b/argopy/tests/test_xarray.py @@ -1,9 +1,14 @@ +import os import pytest import warnings +import numpy as np +import tempfile +import xarray as xr +import argopy from argopy import DataFetcher as ArgoDataFetcher -from argopy.errors import InvalidDatasetStructure -from . import requires_connected_erddap_phy +from argopy.errors import InvalidDatasetStructure, OptionValueError +from . 
import requires_connected_erddap_phy, requires_localftp @pytest.fixture(scope="module") @@ -19,7 +24,8 @@ def ds_pts(): data[user_mode] = ( ArgoDataFetcher(src="erddap", mode=user_mode) .region([-75, -55, 30.0, 40.0, 0, 100.0, "2011-01-01", "2011-01-15"]) - .to_xarray() + .load() + .data ) except Exception as e: warnings.warn("Error when fetching tests data: %s" % str(e.args)) @@ -60,12 +66,6 @@ def test_interpolation_expert(self, ds_pts): ds = ds_pts["expert"].argo.point2profile() assert "PRES_INTERPOLATED" in ds.argo.interp_std_levels([20, 30, 40, 50]).dims - def test_points_error(self, ds_pts): - """Try to interpolate points, not profiles""" - ds = ds_pts["standard"] - with pytest.raises(InvalidDatasetStructure): - ds.argo.interp_std_levels([20, 30, 40, 50]) - def test_std_error(self, ds_pts): """Try to interpolate on a wrong axis""" ds = ds_pts["standard"].argo.point2profile() @@ -77,6 +77,48 @@ def test_std_error(self, ds_pts): ds.argo.interp_std_levels(12) +@requires_connected_erddap_phy +class Test_groupby_pressure_bins: + def test_groupby_ds_type(self, ds_pts): + """Run with success for standard/expert mode and point/profile""" + for user_mode, this in ds_pts.items(): + for format in ["point", "profile"]: + if format == 'profile': + that = this.argo.point2profile() + else: + that = this.copy() + bins = np.arange(0.0, np.max(that["PRES"]) + 10.0, 10.0) + assert "STD_PRES_BINS" in that.argo.groupby_pressure_bins(bins).coords + + def test_bins_error(self, ds_pts): + """Try to groupby over invalid bins """ + ds = ds_pts["standard"] + with pytest.raises(ValueError): + ds.argo.groupby_pressure_bins([100, 20, 30, 40, 50]) # un-sorted + with pytest.raises(ValueError): + ds.argo.groupby_pressure_bins([-20, 20, 30, 40, 50]) # Negative values + + def test_axis_error(self, ds_pts): + """Try to group by using invalid pressure axis """ + ds = ds_pts["standard"] + bins = np.arange(0.0, np.max(ds["PRES"]) + 10.0, 10.0) + with pytest.raises(ValueError): + ds.argo.groupby_pressure_bins(bins, axis='invalid') + + def test_empty_result(self, ds_pts): + """Try to groupby over bins without data""" + ds = ds_pts["standard"] + with pytest.warns(Warning): + out = ds.argo.groupby_pressure_bins([10000, 20000]) + assert out == None + + def test_all_select(self, ds_pts): + ds = ds_pts["standard"] + bins = np.arange(0.0, np.max(ds["PRES"]) + 10.0, 10.0) + for select in ["shallow", "deep", "middle", "random", "min", "max", "mean", "median"]: + assert "STD_PRES_BINS" in ds.argo.groupby_pressure_bins(bins).coords + + @requires_connected_erddap_phy class Test_teos10: def test_teos10_variables_default(self, ds_pts): @@ -126,3 +168,58 @@ def test_teos10_invalid_variable(self, ds_pts): that = that.argo.point2profile() with pytest.raises(ValueError): that.argo.teos10(["InvalidVariable"]) + + +@requires_localftp +class Test_create_float_source: + local_ftp = argopy.tutorial.open_dataset("localftp")[0] + + def is_valid_mdata(self, this_mdata): + """Validate structure of the output dataset """ + check = [] + # Check for dimensions: + check.append(argopy.utilities.is_list_equal(['m', 'n'], list(this_mdata.dims))) + # Check for coordinates: + check.append(argopy.utilities.is_list_equal(['m', 'n'], list(this_mdata.coords))) + # Check for data variables: + check.append(np.all( + [v in this_mdata.data_vars for v in ['PRES', 'TEMP', 'PTMP', 'SAL', 'DATES', 'LAT', 'LONG', 'PROFILE_NO']])) + check.append(np.all( + [argopy.utilities.is_list_equal(['n'], this_mdata[v].dims) for v in ['LONG', 'LAT', 'DATES', 'PROFILE_NO'] + if v in 
this_mdata.data_vars])) + check.append(np.all( + [argopy.utilities.is_list_equal(['m', 'n'], this_mdata[v].dims) for v in ['PRES', 'TEMP', 'SAL', 'PTMP'] if + v in this_mdata.data_vars])) + return np.all(check) + + def test_error_user_mode(self): + with argopy.set_options(local_ftp=self.local_ftp): + with pytest.raises(InvalidDatasetStructure): + ds = ArgoDataFetcher(src="localftp", mode='standard').float([6901929, 2901623]).load().data + ds.argo.create_float_source() + + def test_opt_force(self): + with argopy.set_options(local_ftp=self.local_ftp): + expert_ds = ArgoDataFetcher(src="localftp", mode='expert').float([2901623]).load().data + + with pytest.raises(OptionValueError): + expert_ds.argo.create_float_source(force='dummy') + + ds_float_source = expert_ds.argo.create_float_source(path=None, force='default') + assert np.all([k in np.unique(expert_ds['PLATFORM_NUMBER']) for k in ds_float_source.keys()]) + assert np.all([isinstance(ds_float_source[k], xr.Dataset) for k in ds_float_source.keys()]) + assert np.all([self.is_valid_mdata(ds_float_source[k]) for k in ds_float_source.keys()]) + + ds_float_source = expert_ds.argo.create_float_source(path=None, force='raw') + assert np.all([k in np.unique(expert_ds['PLATFORM_NUMBER']) for k in ds_float_source.keys()]) + assert np.all([isinstance(ds_float_source[k], xr.Dataset) for k in ds_float_source.keys()]) + assert np.all([self.is_valid_mdata(ds_float_source[k]) for k in ds_float_source.keys()]) + + def test_filecreate(self): + with argopy.set_options(local_ftp=self.local_ftp): + expert_ds = ArgoDataFetcher(src="localftp", mode='expert').float([6901929, 2901623]).load().data + + N_file = len(np.unique(expert_ds['PLATFORM_NUMBER'])) + with tempfile.TemporaryDirectory() as folder_output: + expert_ds.argo.create_float_source(path=folder_output) + assert len(os.listdir(folder_output)) == N_file diff --git a/argopy/utilities.py b/argopy/utilities.py index 2426b73d..c698ffe3 100644 --- a/argopy/utilities.py +++ b/argopy/utilities.py @@ -42,6 +42,7 @@ FtpPathError, InvalidFetcher, InvalidFetcherAccessPoint, + InvalidOption ) try: @@ -1258,6 +1259,11 @@ def is_list_of_datasets(lst): return all(isinstance(x, xr.Dataset) for x in lst) +def is_list_equal(lst1, lst2): + """ Return true if 2 lists contain same elements""" + return len(lst1) == len(lst2) and len(lst1) == sum([1 for i, j in zip(lst1, lst2) if i == j]) + + def check_wmo(lst): """ Validate a WMO option and returned it as a list of integers @@ -1403,6 +1409,93 @@ def modified_environ(*remove, **update): [env.pop(k) for k in remove_after] +def toYearFraction(this_date: pd._libs.tslibs.timestamps.Timestamp = pd.to_datetime('now')): + """ Compute decimal year, robust to leap years, precision to the second + + Compute the fraction of the year a given timestamp corresponds to. + The "fraction of the year" goes: + - from 0 on 01-01T00:00:00.000 of the year + - to 1 on the 01-01T00:00:00.000 of the following year + + 1 second corresponds to the number of days in the year times 86400. + The faction of the year is rounded to 10-digits in order to have a "second" precision. 
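+
+    For example (approximate value, for illustration): 2020-07-01T00:00 falls 182 days into
+    the 366-day year 2020, so its decimal year is 2020 + 182/366, i.e. about 2020.4973:
+
+    >>> toYearFraction(pd.to_datetime('2020-07-01'))  # ~ 2020.4973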
+ + See discussion here: https://github.com/euroargodev/argodmqc_owc/issues/35 + + Parameters + ---------- + pd._libs.tslibs.timestamps.Timestamp + + Returns + ------- + float + """ + if "UTC" in [this_date.tzname() if not this_date.tzinfo is None else ""]: + startOfThisYear = pd.to_datetime("%i-01-01T00:00:00.000" % this_date.year, utc=True) + else: + startOfThisYear = pd.to_datetime("%i-01-01T00:00:00.000" % this_date.year) + yearDuration_sec = (startOfThisYear + pd.offsets.DateOffset(years=1) - startOfThisYear).total_seconds() + + yearElapsed_sec = (this_date - startOfThisYear).total_seconds() + fraction = yearElapsed_sec / yearDuration_sec + fraction = np.round(fraction, 10) + return this_date.year + fraction + + +def YearFraction_to_datetime(yf: float): + """ Compute datetime from year fraction + + Inverse the toYearFraction() function + + Parameters + ---------- + float + + Returns + ------- + pd._libs.tslibs.timestamps.Timestamp + """ + year = np.int32(yf) + fraction = yf - year + fraction = np.round(fraction, 10) + + startOfThisYear = pd.to_datetime("%i-01-01T00:00:00" % year) + yearDuration_sec = (startOfThisYear + pd.offsets.DateOffset(years=1) - startOfThisYear).total_seconds() + yearElapsed_sec = pd.Timedelta(fraction * yearDuration_sec, unit='s') + return pd.to_datetime(startOfThisYear + yearElapsed_sec, unit='s') + + +def wrap_longitude(grid_long): + """ Allows longitude (0-360) to wrap beyond the 360 mark, for mapping purposes. + Makes sure that, if the longitude is near the boundary (0 or 360) that we + wrap the values beyond + 360 so it appears nicely on a map + This is a refactor between get_region_data and get_region_hist_locations to + avoid duplicate code + + source: https://github.com/euroargodev/argodmqc_owc/blob/e174f4538fdae1534c9740491398972b1ffec3ca/pyowc/utilities.py#L80 + + Parameters + ---------- + grid_long: array of longitude values + + Returns + ------- + array of longitude values that can extend past 360 + """ + neg_long = np.argwhere(grid_long < 0) + grid_long[neg_long] = grid_long[neg_long] + 360 + + # if we have data close to upper boundary (360), then wrap some of the data round + # so it appears on the map + top_long = np.argwhere(grid_long >= 320) + if top_long.__len__() != 0: + bottom_long = np.argwhere(grid_long <= 40) + grid_long[bottom_long] = 360 + grid_long[bottom_long] + + return grid_long + + def wmo2box(wmo_id: int): """ Convert WMO square box number into a latitude/longitude box @@ -1455,6 +1548,92 @@ def wmo2box(wmo_id: int): return box +def groupby_remap(z, data, z_regridded, z_dim=None, z_regridded_dim="regridded", output_dim="remapped", select='deep', right=False): + """ todo: Need a docstring here !""" + + # sub-sampling called in xarray ufunc + def _subsample_bins(x, y, target_values): + # remove all nans from input x and y + idx = np.logical_or(np.isnan(x), np.isnan(y)) + x = x[~idx] + y = y[~idx] + + ifound = np.digitize(x, target_values, right=right) # ``bins[i-1] <= x < bins[i]`` + ifound -= 1 # Because digitize returns a 1-based indexing, we need to remove 1 + y_binned = np.ones_like(target_values) * np.nan + + for ib, this_ibin in enumerate(np.unique(ifound)): + ix = np.where(ifound == this_ibin) + iselect = ix[-1] + + # Map to y value at specific x index in the bin: + if select == 'shallow': + iselect = iselect[0] # min/shallow + mapped_value = y[iselect] + elif select == 'deep': + iselect = iselect[-1] # max/deep + mapped_value = y[iselect] + elif select == 'middle': + iselect = iselect[np.where(x[iselect] >= 
np.median(x[iselect]))[0][0]] # median/middle + mapped_value = y[iselect] + elif select == 'random': + iselect = iselect[np.random.randint(len(iselect))] + mapped_value = y[iselect] + + # or Map to y statistics in the bin: + elif select == 'mean': + mapped_value = np.nanmean(y[iselect]) + elif select == 'min': + mapped_value = np.nanmin(y[iselect]) + elif select == 'max': + mapped_value = np.nanmax(y[iselect]) + elif select == 'median': + mapped_value = np.median(y[iselect]) + + else: + raise InvalidOption("`select` option has invalid value (%s)" % select) + + y_binned[this_ibin] = mapped_value + + return y_binned + + # infer dim from input + if z_dim is None: + if len(z.dims) != 1: + raise RuntimeError("if z_dim is not specified, x must be a 1D array.") + dim = z.dims[0] + else: + dim = z_dim + + # if dataset is passed drop all data_vars that dont contain dim + if isinstance(data, xr.Dataset): + raise ValueError("Dataset input is not supported yet") + # TODO: for a dataset input just apply the function for each appropriate array + + if version.parse(xr.__version__) > version.parse("0.15.0"): + kwargs = dict( + input_core_dims=[[dim], [dim], [z_regridded_dim]], + output_core_dims=[[output_dim]], + vectorize=True, + dask="parallelized", + output_dtypes=[data.dtype], + dask_gufunc_kwargs={'output_sizes': {output_dim: len(z_regridded[z_regridded_dim])}}, + ) + else: + kwargs = dict( + input_core_dims=[[dim], [dim], [z_regridded_dim]], + output_core_dims=[[output_dim]], + vectorize=True, + dask="parallelized", + output_dtypes=[data.dtype], + output_sizes={output_dim: len(z_regridded[z_regridded_dim])}, + ) + remapped = xr.apply_ufunc(_subsample_bins, z, data, z_regridded, **kwargs) + + remapped.coords[output_dim] = z_regridded.rename({z_regridded_dim: output_dim}).coords[output_dim] + return remapped + + class TopoFetcher(): """ Fetch topographic data through an ERDDAP server for an ocean rectangle @@ -1627,4 +1806,4 @@ def to_xarray(self, errors: str = 'ignore'): def load(self, errors: str = 'ignore'): """ Load Topographic data and return a xarray.DataSet """ - return self.to_xarray(errors=errors) \ No newline at end of file + return self.to_xarray(errors=errors) diff --git a/argopy/xarray.py b/argopy/xarray.py index 07347a5c..315861ae 100644 --- a/argopy/xarray.py +++ b/argopy/xarray.py @@ -1,3 +1,4 @@ +import os import sys import warnings @@ -5,6 +6,7 @@ import pandas as pd import xarray as xr from sklearn import preprocessing +import logging try: import gsw @@ -13,24 +15,56 @@ except ModuleNotFoundError: with_gsw = False -from argopy.utilities import linear_interpolation_remap -from argopy.errors import InvalidDatasetStructure +from argopy.utilities import ( + linear_interpolation_remap, + is_list_equal, + is_list_of_strings, + toYearFraction, + groupby_remap +) +from argopy.errors import InvalidDatasetStructure, DataNotFound, OptionValueError + + +log = logging.getLogger("argopy.xarray") @xr.register_dataset_accessor("argo") class ArgoAccessor: - """Class registered under scope ``argo`` to access a :class:`xarray.Dataset` object. + """ - Methods - ------- - cast_types: - Ensure all variables are of the Argo required dtype with: - point2profile: - Convert a collection of points into a collection of profiles - profile2point: - Convert a collection of profiles to a collection of points: + Class registered under scope ``argo`` to access a :class:`xarray.Dataset` object. 
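+
+    For example (usage sketch, the float number is only illustrative):
+
+    >>> import argopy
+    >>> ds = argopy.DataFetcher().float(6902746).load().data
+    >>> ds.argo  # the accessor gives access to the methods listed below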
- """ + - Ensure all variables are of the Argo required dtype with: + >>> ds.argo.cast_types() + + - Convert a collection of points into a collection of profiles: + >>> ds.argo.point2profile() + + - Convert a collection of profiles to a collection of points: + >>> ds.argo.profile2point() + + - Filter measurements according to data mode: + >>> ds.argo.filter_data_mode() + + - Filter measurements according to QC flag values: + >>> ds.argo.filter_qc(QC_list=[1, 2], QC_fields='all') + + - Filter variables according OWC salinity calibration requirements: + >>> ds.argo.filter_scalib_pres(force='default') + + - Interpolate measurements on pressure levels: + >>> ds.argo.inter_std_levels(std_lev=[10., 500., 1000.]) + + - Group and reduce measurements by pressure bins: + >>> ds.argo.groupby_pressure_bins(bins=[0, 200., 500., 1000.]) +` + - Compute and add additional variables to the dataset: + >>> ds.argo.teos10(vlist='PV') + + - Preprocess data for OWC salinity calibration: + >>> ds.argo.create_float_source("output_folder") + + """ def __init__(self, xarray_obj): """ Init """ @@ -49,7 +83,7 @@ def __init__(self, xarray_obj): elif "N_POINTS" in self._dims: self._type = "point" else: - raise InvalidDatasetStructure("Argo dataset structure not recognised") + raise InvalidDatasetStructure("Argo dataset structure not recognised (dimensions N_PROF or N_POINTS not found)") if "PRES_ADJUSTED" in self._vars: self._mode = "expert" @@ -58,17 +92,121 @@ def __init__(self, xarray_obj): else: raise InvalidDatasetStructure("Argo dataset structure not recognised") + def __repr__(self): + # import xarray.core.formatting as xrf + # col_width = xrf._calculate_col_width(xrf._get_col_items(self._obj.variables)) + # max_rows = xr.core.options.OPTIONS["display_max_rows"] + + summary = ["".format(type(self._obj).__name__)] + if self._type == "profile": + summary.append("This is a collection of Argo profiles") + summary.append( + "N_PROF(%i) x N_LEVELS(%i) ~ N_POINTS(%i)" + % (self.N_PROF, self.N_LEVELS, self.N_POINTS) + ) + + elif self._type == "point": + summary.append("This is a collection of Argo points") + summary.append( + "N_POINTS(%i) ~ N_PROF(%i) x N_LEVELS(%i)" + % (self.N_POINTS, self.N_PROF, self.N_LEVELS) + ) + + # dims_start = xrf.pretty_print("Dimensions:", col_width) + # summary.append("{}({})".format(dims_start, xrf.dim_summary(self._obj))) + + return "\n".join(summary) + + @property + def N_PROF(self): + """Number of profiles""" + if self._type == "point": + dummy_argo_uid = xr.DataArray( + self.uid( + self._obj["PLATFORM_NUMBER"].values, + self._obj["CYCLE_NUMBER"].values, + self._obj["DIRECTION"].values, + ), + dims="N_POINTS", + coords={"N_POINTS": self._obj["N_POINTS"]}, + name="dummy_argo_uid", + ) + N_PROF = len(np.unique(dummy_argo_uid)) + else: + N_PROF = len(np.unique(self._obj["N_PROF"])) + return N_PROF + + @property + def N_LEVELS(self): + """Number of vertical levels""" + if self._type == "point": + dummy_argo_uid = xr.DataArray( + self.uid( + self._obj["PLATFORM_NUMBER"].values, + self._obj["CYCLE_NUMBER"].values, + self._obj["DIRECTION"].values, + ), + dims="N_POINTS", + coords={"N_POINTS": self._obj["N_POINTS"]}, + name="dummy_argo_uid", + ) + N_LEVELS = int( + xr.DataArray( + np.ones_like(self._obj["N_POINTS"].values), + dims="N_POINTS", + coords={"N_POINTS": self._obj["N_POINTS"]}, + ) + .groupby(dummy_argo_uid) + .sum() + .max() + .values + ) + else: + N_LEVELS = len(np.unique(self._obj["N_LEVELS"])) + return N_LEVELS + + @property + def N_POINTS(self): + """Number of measurement 
points""" + if self._type == "profile": + N_POINTS = self.N_PROF * self.N_LEVELS + else: + N_POINTS = len(np.unique(self._obj["N_POINTS"])) + return N_POINTS + def _add_history(self, txt): if "history" in self._obj.attrs: self._obj.attrs["history"] += "; %s" % txt else: self._obj.attrs["history"] = txt + def _where(self, cond, other=xr.core.dtypes.NA, drop: bool = False): + """ where that preserve dtypes of Argo fields + + Parameters + ---------- + cond : DataArray, Dataset, or callable + Locations at which to preserve this object's values. dtype must be `bool`. + If a callable, it must expect this object as its only parameter. + other : scalar, DataArray or Dataset, optional + Value to use for locations in this object where ``cond`` is False. + By default, these locations filled with NA. + drop : bool, optional + If True, coordinate labels that only correspond to False values of + the condition are dropped from the result. Mutually exclusive with + ``other``. + """ + this = self._obj.copy(deep=True) + this = this.where(cond, other=other, drop=drop) + this = this.argo.cast_types() + # this.argo._add_history("Modified with 'where' statement") + return this + def cast_types(self): # noqa: C901 - """ Make sure variables are of the appropriate types + """ Make sure variables are of the appropriate types according to Argo - This is hard coded, but should be retrieved from an API somewhere - Should be able to handle all possible variables encountered in the Argo dataset + This is hard coded, but should be retrieved from an API somewhere. + Should be able to handle all possible variables encountered in the Argo dataset. """ ds = self._obj @@ -132,8 +270,7 @@ def cast_this(da, type): da.attrs["casted"] = 1 except Exception: print("Oops!", sys.exc_info()[0], "occurred.") - print("Fail to cast: ", da.dtype, - "into:", type, "for: ", da.name) + print("Fail to cast: ", da.dtype, "into:", type, "for: ", da.name) print("Encountered unique values:", np.unique(da)) return da @@ -231,253 +368,6 @@ def cast_this_da(da): return ds - def filter_data_mode(self, keep_error: bool = True, errors: str = "raise"): # noqa: C901 - """ Filter variables according to their data mode - - This applies to and - - For data mode 'R' and 'A': keep (eg: 'PRES', 'TEMP' and 'PSAL') - For data mode 'D': keep (eg: 'PRES_ADJUSTED', 'TEMP_ADJUSTED' and 'PSAL_ADJUSTED') - - Parameters - ---------- - keep_error: bool, optional - If true (default) keep the measurements error fields or not. - - errors: {'raise','ignore'}, optional - If 'raise' (default), raises a InvalidDatasetStructure error if any of the expected dataset variables is - not found. If 'ignore', fails silently and return unmodified dataset. - - Returns - ------- - :class:`xarray.Dataset` - """ - if self._type != "point": - raise InvalidDatasetStructure( - "Method only available to a collection of points" - ) - - ######### - # Sub-functions - ######### - def safe_where_eq(xds, key, value): - # xds.where(xds[key] == value, drop=True) is not safe to empty time variables, cf issue #64 - try: - return xds.where(xds[key] == value, drop=True) - except ValueError as v: - if v.args[0] == ("zero-size array to reduction operation " - "minimum which has no identity"): - # A bug in xarray will cause a ValueError if trying to - # decode the times in a NetCDF file with length 0. 
- # See: - # https://github.com/pydata/xarray/issues/1329 - # https://github.com/euroargodev/argopy/issues/64 - # Here, we just need to return an empty array - TIME = xds['TIME'] - xds = xds.drop_vars('TIME') - xds = xds.where(xds[key] == value, drop=True) - xds['TIME'] = xr.DataArray(np.arange(len(xds['N_POINTS'])), dims='N_POINTS', - attrs=TIME.attrs).astype(np.datetime64) - xds = xds.set_coords('TIME') - return xds - - def ds_split_datamode(xds): - """ Create one dataset for each of the data_mode - - Split full dataset into 3 datasets - """ - # Real-time: - argo_r = safe_where_eq(xds, 'DATA_MODE', 'R') - for v in plist: - vname = v.upper() + "_ADJUSTED" - if vname in argo_r: - argo_r = argo_r.drop_vars(vname) - vname = v.upper() + "_ADJUSTED_QC" - if vname in argo_r: - argo_r = argo_r.drop_vars(vname) - vname = v.upper() + "_ADJUSTED_ERROR" - if vname in argo_r: - argo_r = argo_r.drop_vars(vname) - # Real-time adjusted: - argo_a = safe_where_eq(xds, 'DATA_MODE', 'A') - for v in plist: - vname = v.upper() - if vname in argo_a: - argo_a = argo_a.drop_vars(vname) - vname = v.upper() + "_QC" - if vname in argo_a: - argo_a = argo_a.drop_vars(vname) - # Delayed mode: - argo_d = safe_where_eq(xds, 'DATA_MODE', 'D') - - return argo_r, argo_a, argo_d - - def fill_adjusted_nan(ds, vname): - """Fill in the adjusted field with the non-adjusted wherever it is NaN - - Ensure to have values even for bad QC data in delayed mode - """ - ii = ds.where(np.isnan(ds[vname + "_ADJUSTED"]), drop=1)["N_POINTS"] - ds[vname + "_ADJUSTED"].loc[dict(N_POINTS=ii)] = ds[vname].loc[ - dict(N_POINTS=ii) - ] - return ds - - def merge_arrays(this_argo_r, this_argo_a, this_argo_d, this_vname): - """ Merge one variable from 3 DataArrays - - Based on xarray merge function with ’no_conflicts’: only values - which are not null in all datasets must be equal. The returned - dataset then contains the combination of all non-null values. - - Return a xarray.DataArray - """ - - def merge_this(a1, a2, a3): - return xr.merge((xr.merge((a1, a2)), a3)) - - DA = merge_this( - this_argo_r[this_vname], - this_argo_a[this_vname + "_ADJUSTED"].rename(this_vname), - this_argo_d[this_vname + "_ADJUSTED"].rename(this_vname), - ) - DA_QC = merge_this( - this_argo_r[this_vname + "_QC"], - this_argo_a[this_vname + "_ADJUSTED_QC"].rename(this_vname + "_QC"), - this_argo_d[this_vname + "_ADJUSTED_QC"].rename(this_vname + "_QC"), - ) - - if keep_error: - DA_ERROR = xr.merge(( - this_argo_a[this_vname + "_ADJUSTED_ERROR"].rename(this_vname + "_ERROR"), - this_argo_d[this_vname + "_ADJUSTED_ERROR"].rename(this_vname + "_ERROR"), - )) - DA = merge_this(DA, DA_QC, DA_ERROR) - else: - DA = xr.merge((DA, DA_QC)) - return DA - - ######### - # filter - ######### - ds = self._obj - if "DATA_MODE" not in ds: - if errors: - raise InvalidDatasetStructure( - "Method only available for dataset with a 'DATA_MODE' variable " - ) - else: - # todo should raise a warning instead ? 
- return ds - - # Define variables to filter: - possible_list = [ - "PRES", - "TEMP", - "PSAL", - "DOXY", - "CHLA", - "BBP532", - "BBP700", - "DOWNWELLING_PAR", - "DOWN_IRRADIANCE380", - "DOWN_IRRADIANCE412", - "DOWN_IRRADIANCE490", - ] - plist = [p for p in possible_list if p in ds.data_vars] - - # Create one dataset for each of the data_mode: - argo_r, argo_a, argo_d = ds_split_datamode(ds) - - # Fill in the adjusted field with the non-adjusted wherever it is NaN - for v in plist: - argo_d = fill_adjusted_nan(argo_d, v.upper()) - - # Drop QC fields in delayed mode dataset: - for v in plist: - vname = v.upper() - if vname in argo_d: - argo_d = argo_d.drop_vars(vname) - vname = v.upper() + "_QC" - if vname in argo_d: - argo_d = argo_d.drop_vars(vname) - - # Create new arrays with the appropriate variables: - vlist = [merge_arrays(argo_r, argo_a, argo_d, v) for v in plist] - - # Create final dataset by merging all available variables - final = xr.merge(vlist) - - # Merge with all other variables: - other_variables = list( - set([v for v in list(ds.data_vars) if "ADJUSTED" not in v]) - - set(list(final.data_vars)) - ) - # other_variables.remove('DATA_MODE') # Not necessary anymore - for p in other_variables: - final = xr.merge((final, ds[p])) - - final.attrs = ds.attrs - final.argo._add_history("Variables filtered according to DATA_MODE") - final = final[np.sort(final.data_vars)] - - # Cast data types and add attributes: - final = final.argo.cast_types() - - return final - - def filter_qc(self, QC_list=[1, 2], drop=True, mode="all", mask=False): # noqa: C901 - """ Filter data set according to QC values - - Mask the dataset for points where 'all' or 'any' of the QC fields has a value in the list of - integer QC flags. - - This method can return the filtered dataset or the filter mask. 
- """ - if self._type != "point": - raise InvalidDatasetStructure( - "Method only available to a collection of points" - ) - - if mode not in ["all", "any"]: - raise ValueError("Mode must 'all' or 'any'") - - this = self._obj - - # Extract QC fields: - QC_fields = [] - for v in this.data_vars: - if "QC" in v and "PROFILE" not in v: - QC_fields.append(v) - QC_fields = this[QC_fields] - for v in QC_fields.data_vars: - QC_fields[v] = QC_fields[v].astype(int) - - # Now apply filter - this_mask = xr.DataArray( - np.zeros_like(QC_fields["N_POINTS"]), - dims=["N_POINTS"], - coords={"N_POINTS": QC_fields["N_POINTS"]}, - ) - for v in QC_fields.data_vars: - for qc in QC_list: - this_mask += QC_fields[v] == qc - if mode == "all": - this_mask = this_mask == len(QC_fields) # all - else: - this_mask = this_mask >= 1 # any - - if not mask: - this = this.where(this_mask, drop=drop) - for v in this.data_vars: - if "QC" in v and "PROFILE" not in v: - this[v] = this[v].astype(int) - this.argo._add_history("Variables selected according to QC") - this = this.argo.cast_types() - return this - else: - return this_mask - def uid(self, wmo_or_uid, cyc=None, direction=None): """ UID encoder/decoder @@ -496,8 +386,9 @@ def uid(self, wmo_or_uid, cyc=None, direction=None): Examples -------- - unique_float_profile_id = uid(690024,13,'A') # Encode - wmo, cyc, drc = uid(unique_float_profile_id) # Decode + >>> unique_float_profile_id = uid(690024,13,'A') # Encode + >>> wmo, cyc, drc = uid(unique_float_profile_id) # Decode + """ le = preprocessing.LabelEncoder() le.fit(["A", "D"]) @@ -563,7 +454,6 @@ def fillvalue(da): name="dummy_argo_uid", ) N_PROF = len(np.unique(dummy_argo_uid)) - # that = this.groupby(dummy_argo_uid) N_LEVELS = int( xr.DataArray( @@ -648,7 +538,7 @@ def fillvalue(da): # Restore coordinate variables: new_ds = new_ds.set_coords([c for c in coords_list if c in new_ds]) - # Misc formating + # Misc formatting new_ds = new_ds.sortby("TIME") new_ds = new_ds.argo.cast_types() new_ds = new_ds[np.sort(new_ds.data_vars)] @@ -706,18 +596,429 @@ def profile2point(self): ds.argo._type = "point" return ds - def interp_std_levels(self, std_lev): - """ Returns a new dataset interpolated to new inputs levels + def filter_data_mode( # noqa: C901 + self, keep_error: bool = True, errors: str = "raise" + ): + """ Filter variables according to their data mode + + This filter applies to and + + For data mode 'R' and 'A': keep (eg: 'PRES', 'TEMP' and 'PSAL') + + For data mode 'D': keep (eg: 'PRES_ADJUSTED', 'TEMP_ADJUSTED' and 'PSAL_ADJUSTED') + + Since ADJUSTED variables are not required anymore after the filter, all *ADJUSTED* variables are dropped in + order to avoid confusion wrt variable content. DATA_MODE is preserved for the record. Parameters ---------- - list or np.array - Standard levels used for interpolation + keep_error: bool, optional + If true (default) keep the measurements error fields or not. + + errors: {'raise','ignore'}, optional + If 'raise' (default), raises a InvalidDatasetStructure error if any of the expected dataset variables is + not found. If 'ignore', fails silently and return unmodified dataset. 
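+
+        For example (usage sketch, assuming ``ds`` is an 'expert' mode collection of points
+        with a 'DATA_MODE' variable):
+
+        >>> ds = ds.argo.filter_data_mode(keep_error=False)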
+ + Returns + ------- + :class:`xarray.Dataset` + """ + if self._type != "point": + raise InvalidDatasetStructure( + "Method only available to a collection of points" + ) + + ######### + # Sub-functions + ######### + def safe_where_eq(xds, key, value): + # xds.where(xds[key] == value, drop=True) is not safe to empty time variables, cf issue #64 + try: + return xds.where(xds[key] == value, drop=True) + except ValueError as v: + if v.args[0] == ( + "zero-size array to reduction operation " + "minimum which has no identity" + ): + # A bug in xarray will cause a ValueError if trying to + # decode the times in a NetCDF file with length 0. + # See: + # https://github.com/pydata/xarray/issues/1329 + # https://github.com/euroargodev/argopy/issues/64 + # Here, we just need to return an empty array + TIME = xds["TIME"] + xds = xds.drop_vars("TIME") + xds = xds.where(xds[key] == value, drop=True) + xds["TIME"] = xr.DataArray( + np.arange(len(xds["N_POINTS"])), + dims="N_POINTS", + attrs=TIME.attrs, + ).astype(np.datetime64) + xds = xds.set_coords("TIME") + return xds + + def ds_split_datamode(xds): + """ Create one dataset for each of the data_mode + + Split full dataset into 3 datasets + """ + # Real-time: + argo_r = safe_where_eq(xds, "DATA_MODE", "R") + for v in plist: + vname = v.upper() + "_ADJUSTED" + if vname in argo_r: + argo_r = argo_r.drop_vars(vname) + vname = v.upper() + "_ADJUSTED_QC" + if vname in argo_r: + argo_r = argo_r.drop_vars(vname) + vname = v.upper() + "_ADJUSTED_ERROR" + if vname in argo_r: + argo_r = argo_r.drop_vars(vname) + # Real-time adjusted: + argo_a = safe_where_eq(xds, "DATA_MODE", "A") + for v in plist: + vname = v.upper() + if vname in argo_a: + argo_a = argo_a.drop_vars(vname) + vname = v.upper() + "_QC" + if vname in argo_a: + argo_a = argo_a.drop_vars(vname) + # Delayed mode: + argo_d = safe_where_eq(xds, "DATA_MODE", "D") + + return argo_r, argo_a, argo_d + + def fill_adjusted_nan(this_ds, vname): + """Fill in the adjusted field with the non-adjusted wherever it is NaN + + Ensure to have values even for bad QC data in delayed mode + """ + ii = this_ds.where(np.isnan(this_ds[vname + "_ADJUSTED"]), drop=1)[ + "N_POINTS" + ] + this_ds[vname + "_ADJUSTED"].loc[dict(N_POINTS=ii)] = this_ds[vname].loc[ + dict(N_POINTS=ii) + ] + return this_ds + + def merge_arrays(this_argo_r, this_argo_a, this_argo_d, this_vname): + """ Merge one variable from 3 DataArrays + + Based on xarray merge function with ’no_conflicts’: only values + which are not null in all datasets must be equal. The returned + dataset then contains the combination of all non-null values. 
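+
+            For illustration (hypothetical values): merging [1, nan, nan],
+            [nan, 2, nan] and [nan, nan, 3] over a common N_POINTS coordinate
+            yields [1, 2, 3].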
+ + Return a xarray.DataArray + """ + + def merge_this(a1, a2, a3): + return xr.merge((xr.merge((a1, a2)), a3)) + + DA = merge_this( + this_argo_r[this_vname], + this_argo_a[this_vname + "_ADJUSTED"].rename(this_vname), + this_argo_d[this_vname + "_ADJUSTED"].rename(this_vname), + ) + DA_QC = merge_this( + this_argo_r[this_vname + "_QC"], + this_argo_a[this_vname + "_ADJUSTED_QC"].rename(this_vname + "_QC"), + this_argo_d[this_vname + "_ADJUSTED_QC"].rename(this_vname + "_QC"), + ) + + if keep_error: + DA_ERROR = xr.merge( + ( + this_argo_a[this_vname + "_ADJUSTED_ERROR"].rename( + this_vname + "_ERROR" + ), + this_argo_d[this_vname + "_ADJUSTED_ERROR"].rename( + this_vname + "_ERROR" + ), + ) + ) + DA = merge_this(DA, DA_QC, DA_ERROR) + else: + DA = xr.merge((DA, DA_QC)) + return DA + + ######### + # filter + ######### + ds = self._obj + if "DATA_MODE" not in ds: + if errors: + raise InvalidDatasetStructure( + "Method only available for dataset with a 'DATA_MODE' variable " + ) + else: + # todo should raise a warning instead ? + return ds + + # Define variables to filter: + possible_list = [ + "PRES", + "TEMP", + "PSAL", + "DOXY", + "CHLA", + "BBP532", + "BBP700", + "DOWNWELLING_PAR", + "DOWN_IRRADIANCE380", + "DOWN_IRRADIANCE412", + "DOWN_IRRADIANCE490", + ] + plist = [p for p in possible_list if p in ds.data_vars] + + # Create one dataset for each of the data_mode: + argo_r, argo_a, argo_d = ds_split_datamode(ds) + + # Fill in the adjusted field with the non-adjusted wherever it is NaN + for v in plist: + argo_d = fill_adjusted_nan(argo_d, v.upper()) + + # Drop QC fields in delayed mode dataset: + for v in plist: + vname = v.upper() + if vname in argo_d: + argo_d = argo_d.drop_vars(vname) + vname = v.upper() + "_QC" + if vname in argo_d: + argo_d = argo_d.drop_vars(vname) + + # Create new arrays with the appropriate variables: + vlist = [merge_arrays(argo_r, argo_a, argo_d, v) for v in plist] + + # Create final dataset by merging all available variables + final = xr.merge(vlist) + + # Merge with all other variables: + other_variables = list( + set([v for v in list(ds.data_vars) if "ADJUSTED" not in v]) + - set(list(final.data_vars)) + ) + # other_variables.remove('DATA_MODE') # Not necessary anymore + for p in other_variables: + final = xr.merge((final, ds[p])) + + final.attrs = ds.attrs + final.argo._add_history("Variables filtered according to DATA_MODE") + final = final[np.sort(final.data_vars)] + + # Cast data types and add attributes: + final = final.argo.cast_types() + + return final + + def filter_qc( # noqa: C901 + self, QC_list=[1, 2], QC_fields="all", drop=True, mode="all", mask=False + ): + """ Filter data set according to QC values + + Filter the dataset to keep points where ``all`` or ``any`` of the QC fields has a value in the list of + integer QC flags. + + This method can return the filtered dataset or the filter mask. + + Parameters + ---------- + QC_list: list(int) + List of QC flag values (integers) to keep + QC_fields: 'all' or list(str) + List of QC fields to consider to apply the filter. By default we use all available QC fields + drop: bool + Drop values not matching the QC filter, default is True + mode: str + Must be ``all`` (default) or ``any``. Boolean operator on QC values: should we keep points + matching ``all`` QC fields or 'any' one of them. + mask: bool + ``False`` by default. Determine if we should return the QC mask or the filtered dataset. 
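+
+        For example (usage sketch, assuming 'PSAL_QC' is available in the dataset):
+
+        >>> ds_good = ds.argo.filter_qc(QC_list=[1, 2])  # keep points where all QC fields are good or probably good
+        >>> bad_psal = ds.argo.filter_qc(QC_list=[4], QC_fields=['PSAL_QC'], mode='any', mask=True)  # mask of points flagged bad for salinity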
+ + Returns + ------- + :class:`xarray.Dataset` + """ + if self._type != "point": + raise InvalidDatasetStructure( + "Method only available to a collection of points" + ) + + if mode not in ["all", "any"]: + raise ValueError("Mode must be 'all' or 'any'") + + # Make sure we deal with a list of integers: + if not isinstance(QC_list, list): + if isinstance(QC_list, np.ndarray): + QC_list = list(QC_list) + else: + QC_list = [QC_list] + QC_list = [abs(int(qc)) for qc in QC_list] + + this = self._obj + + # Extract QC fields: + if isinstance(QC_fields, str) and QC_fields == "all": + QC_fields = [] + for v in this.data_vars: + if "QC" in v and "PROFILE" not in v: + QC_fields.append(v) + elif is_list_of_strings(QC_fields): + for v in QC_fields: + if v not in this.data_vars: + raise ValueError( + "%s not found in this dataset while trying to apply QC filter" + % v + ) + else: + raise ValueError( + "Invalid content for parameter 'QC_fields'. Use 'all' or a list of strings" + ) + + log.debug( + "filter_qc: Filtering dataset to keep points with QC in %s for '%s' fields in %s" + % (QC_list, mode, ",".join(QC_fields)) + ) + # log.debug("filter_qc: Filter applied to '%s' of the fields: %s" % (mode, ",".join(QC_fields))) + + QC_fields = this[QC_fields] + for v in QC_fields.data_vars: + QC_fields[v] = QC_fields[v].astype(int) + + # Now apply filter + this_mask = xr.DataArray( + np.zeros_like(QC_fields["N_POINTS"]), + dims=["N_POINTS"], + coords={"N_POINTS": QC_fields["N_POINTS"]}, + ) + for v in QC_fields.data_vars: + for qc_value in QC_list: + this_mask += QC_fields[v] == qc_value + if mode == "all": + this_mask = this_mask == len(QC_fields) # all + else: + this_mask = this_mask >= 1 # any + + if not mask: + this = this.argo._where(this_mask, drop=drop) + this.argo._add_history("Variables selected according to QC") + # this = this.argo.cast_types() + return this + else: + return this_mask + + def filter_scalib_pres(self, force: str = "default", inplace: bool = True): + """ Filter variables according to OWC salinity calibration software requirements + + By default: this filter will return a dataset with raw PRES, PSAL and TEMP; and if PRES is adjusted, + PRES variable will be replaced by PRES_ADJUSTED. + + With option force='raw', you can force the filter to return a dataset with raw PRES, PSAL and TEMP whether + PRES is adjusted or not. + + With option force='adjusted', you can force the filter to return a dataset where PRES/PSAL and TEMP replaced + with adjusted variables: PRES_ADJUSTED, PSAL_ADJUSTED, TEMP_ADJUSTED. + + Since ADJUSTED variables are not required anymore after the filter, all *ADJUSTED* variables are dropped in + order to avoid confusion wrt variable content. + + Parameters + ---------- + force: str + Use force='default' to load PRES/PSAL/TEMP or PRES_ADJUSTED/PSAL/TEMP according to PRES_ADJUSTED + filled or not. 
+ + Use force='raw' to force load of PRES/PSAL/TEMP + + Use force='adjusted' to force load of PRES_ADJUSTED/PSAL_ADJUSTED/TEMP_ADJUSTED + inplace: boolean, True by default + If True, return the filtered input :class:`xarray.Dataset` + + If False, return a new :class:`xarray.Dataset` + + Returns + ------- + :class:`xarray.Dataset` + """ + if not with_gsw: + raise ModuleNotFoundError("This functionality requires the gsw library") + + this = self._obj + + # Will work with a collection of points + to_profile = False + if this.argo._type == "profile": + to_profile = True + this = this.argo.profile2point() + + if force == "raw": + # PRES/PSAL/TEMP are not changed + # All ADJUSTED variables are removed (not required anymore, avoid confusion with variable content): + this = this.drop_vars([v for v in this.data_vars if "ADJUSTED" in v]) + elif force == "adjusted": + # PRES/PSAL/TEMP are replaced by PRES_ADJUSTED/PSAL_ADJUSTED/TEMP_ADJUSTED + for v in ["PRES", "PSAL", "TEMP"]: + if "%s_ADJUSTED" % v in this.data_vars: + this[v] = this["%s_ADJUSTED" % v] + this["%s_ERROR" % v] = this["%s_ADJUSTED_ERROR" % v] + this["%s_QC" % v] = this["%s_ADJUSTED_QC" % v] + else: + raise InvalidDatasetStructure( + "%s_ADJUSTED not in this dataset. Tip: fetch data in 'expert' mode" + % v + ) + # All ADJUSTED variables are removed (not required anymore, avoid confusion with variable content): + this = this.drop_vars([v for v in this.data_vars if "ADJUSTED" in v]) + else: + # In default mode, we just need to do something if PRES_ADJUSTED is different from PRES, meaning + # pressure was adjusted: + if np.any(this["PRES_ADJUSTED"] == this["PRES"]): # Yes + # We need to recompute salinity with adjusted pressur, so + # Compute raw conductivity from raw salinity and raw pressure: + cndc = gsw.C_from_SP( + this["PSAL"].values, this["TEMP"].values, this["PRES"].values + ) + # Then recompute salinity with adjusted pressure: + sp = gsw.SP_from_C( + cndc, this["TEMP"].values, this["PRES_ADJUSTED"].values + ) + # Now fill in filtered variables (no need to change TEMP): + this["PRES"] = this["PRES_ADJUSTED"] + this["PRES_QC"] = this["PRES_ADJUSTED_QC"] + this["PSAL"].values = sp + + # Finally drop everything not required anymore: + this = this.drop_vars([v for v in this.data_vars if "ADJUSTED" in v]) + + # Manage output: + this.argo._add_history("Variables filtered according to OWC methodology") + this = this[np.sort(this.data_vars)] + if to_profile: + this = this.argo.point2profile() + + # Manage output: + if inplace: + self._obj = this + return self._obj + else: + return this + + def interp_std_levels(self, + std_lev: list or np.array, + axis: str = 'PRES'): + """ Interpolate measurements to standard pressure levels + + Parameters + ---------- + std_lev: list or np.array + Standard pressure levels used for interpolation. It has to be 1-dimensional and monotonic. + axis: str, default: ``PRES`` + The dataset variable to use as pressure axis. This could be ``PRES`` or ``PRES_ADJUSTED``. 
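+
+        For example (usage sketch, profiles not reaching the deepest requested level are dropped):
+
+        >>> ds.argo.point2profile().argo.interp_std_levels([0., 250., 500., 750., 1000.])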
Returns ------- :class:`xarray.Dataset` """ + this_dsp = self._obj if (type(std_lev) is np.ndarray) | (type(std_lev) is list): std_lev = np.array(std_lev) @@ -730,27 +1031,36 @@ def interp_std_levels(self, std_lev): "Standard levels must be a list or a numpy array of positive and sorted values" ) + if axis not in ['PRES', 'PRES_ADJUSTED']: + raise ValueError("'axis' option must be 'PRES' or 'PRES_ADJUSTED'") + if self._type != "profile": raise InvalidDatasetStructure( "Method only available for a collection of profiles" ) - ds = self._obj + # Will work with a collection of profiles: + # to_point = False + # if this_ds.argo._type == "point": + # to_point = True + # this_dsp = this_ds.argo.point2profile() + # else: + # this_dsp = this_ds.copy(deep=True) # Selecting profiles that have a max(pressure) > max(std_lev) to avoid extrapolation in that direction # For levels < min(pressure), first level values of the profile are extended to surface. - i1 = ds["PRES"].max("N_LEVELS") >= std_lev[-1] - dsp = ds.where(i1, drop=True) + i1 = this_dsp[axis].max("N_LEVELS") >= std_lev[-1] + this_dsp = this_dsp.where(i1, drop=True) # check if any profile is left, ie if any profile match the requested depth - if len(dsp["N_PROF"]) == 0: - raise Warning( + if len(this_dsp["N_PROF"]) == 0: + warnings.warn( "None of the profiles can be interpolated (not reaching the requested depth range)." ) return None # add new vertical dimensions, this has to be in the datasets to apply ufunc later - dsp["Z_LEVELS"] = xr.DataArray(std_lev, dims={"Z_LEVELS": std_lev}) + this_dsp["Z_LEVELS"] = xr.DataArray(std_lev, dims={"Z_LEVELS": std_lev}) # init ds_out = xr.Dataset() @@ -758,17 +1068,17 @@ def interp_std_levels(self, std_lev): # vars to interpolate datavars = [ dv - for dv in list(dsp.variables) - if set(["N_LEVELS", "N_PROF"]) == set(dsp[dv].dims) + for dv in list(this_dsp.variables) + if set(["N_LEVELS", "N_PROF"]) == set(this_dsp[dv].dims) and "QC" not in dv and "ERROR" not in dv ] # coords - coords = [dv for dv in list(dsp.coords)] + coords = [dv for dv in list(this_dsp.coords)] # vars depending on N_PROF only solovars = [ dv - for dv in list(dsp.variables) + for dv in list(this_dsp.variables) if dv not in datavars and dv not in coords and "QC" not in dv @@ -777,32 +1087,300 @@ def interp_std_levels(self, std_lev): for dv in datavars: ds_out[dv] = linear_interpolation_remap( - dsp.PRES, - dsp[dv], - dsp["Z_LEVELS"], + this_dsp[axis], + this_dsp[dv], + this_dsp["Z_LEVELS"], z_dim="N_LEVELS", z_regridded_dim="Z_LEVELS", ) - ds_out = ds_out.rename({"remapped": "PRES_INTERPOLATED"}) + ds_out = ds_out.rename({"remapped": "%s_INTERPOLATED" % axis}) for sv in solovars: - ds_out[sv] = dsp[sv] + ds_out[sv] = this_dsp[sv] for co in coords: - ds_out.coords[co] = dsp[co] + ds_out.coords[co] = this_dsp[co] ds_out = ds_out.drop_vars(["N_LEVELS", "Z_LEVELS"]) ds_out = ds_out[np.sort(ds_out.data_vars)] + ds_out = ds_out.argo.cast_types() ds_out.attrs = self.attrs # Preserve original attributes - ds_out.argo._add_history("Interpolated on standard levels") + ds_out.argo._add_history("Interpolated on standard %s levels" % axis) + + # if to_point: + # ds_out = ds_out.argo.profile2point() return ds_out + def groupby_pressure_bins(self, # noqa: C901 + bins: list or np.array, + axis: str = 'PRES', + right: bool = False, + select: str = 'deep', + squeeze: bool = True, + merge: bool = True): + """ Group measurements by pressure bins + + This method can be used to subsample and align an irregular dataset (pressure not being similar in all 
profiles) + on a set of pressure bins. The output dataset could then be used to perform statistics along the ``N_PROF`` dimension + because ``N_LEVELS`` will corresponds to similar pressure bins, while avoiding to interpolate data. + + Parameters + ---------- + bins: list or np.array, + Array of bins. It has to be 1-dimensional and monotonic. Bins of data are localised using values from + options `axis` (default: ``PRES``) and `right` (default: ``False``), see below. + axis: str, default: ``PRES`` + The dataset variable to use as pressure axis. This could be ``PRES`` or ``PRES_ADJUSTED`` + right: bool, default: False + Indicating whether the bin intervals include the right or the left bin edge. Default behavior is + (right==False) indicating that the interval does not include the right edge. The left bin end is open + in this case, i.e., bins[i-1] <= x < bins[i] is the default behavior for monotonically increasing bins. + Note the ``merge`` option is intended to work only for the default ``right=False``. + select: {'deep','shallow','middle','random','min','max','mean','median'}, default: 'deep' + The value selection method for bins. + + This selection can be based on values at the pressure axis level with: ``deep`` (default), ``shallow``, + ``middle``, ``random``. For instance, ``select='deep'`` will lead to the value + returned for a bin to be taken at the deepest pressure level in the bin. + + Or this selection can be based on statistics of measurements in a bin. Stats available are: ``min``, ``max``, + ``mean``, ``median``. For instance ``select='mean'`` will lead to the value returned for a bin to be the mean of + all measurements in the bin. + squeeze: bool, default: True + Squeeze from the output bin levels without measurements. + merge: bool, default: True + Optimize the output bins axis size by merging levels with/without data. The pressure bins axis is modified + accordingly. This means that the return ``STD_PRES_BINS`` axis has not necessarily the same size as + the input ``bins``. + + Returns + ------- + :class:`xarray.Dataset` + + See Also + -------- + :class:`numpy.digitize`, :class:`argopy.utilities.groupby_remap` + """ + this_ds = self._obj + + if (type(bins) is np.ndarray) | (type(bins) is list): + bins = np.array(bins) + if (np.any(sorted(bins) != bins)) | (np.any(bins < 0)): + raise ValueError( + "Standard bins must be a list or a numpy array of positive and sorted values" + ) + else: + raise ValueError( + "Standard bins must be a list or a numpy array of positive and sorted values" + ) + + if axis not in ['PRES', 'PRES_ADJUSTED']: + raise ValueError("'axis' option must be 'PRES' or 'PRES_ADJUSTED'") + + # Will work with a collection of profiles: + to_point = False + if this_ds.argo._type == "point": + to_point = True + this_dsp = this_ds.argo.point2profile() + else: + this_dsp = this_ds.copy(deep=True) + + # Adjust bins axis if we possibly have to squeeze empty bins: + h, bin_edges = np.histogram(np.unique(np.round(this_dsp[axis], 1)), bins) + N_bins_empty = len(np.where(h == 0)[0]) + # check if any profile is left, ie if any profile match the requested bins + if N_bins_empty == len(h): + warnings.warn( + "None of the profiles can be aligned (pressure values out of bins range)." 
+ ) + return None + if N_bins_empty > 0 and squeeze: + log.debug( + "bins axis was squeezed to full bins only (%i bins found empty out of %i)" % (N_bins_empty, len(bins))) + bins = bins[np.where(h > 0)] + + def replace_i_level_values(this_da, this_i_level, new_values_along_profiles): + """ Convenience fct to update only one level of a ["N_PROF", "N_LEVELS"] xr.DataArray""" + if this_da.dims == ("N_PROF", "N_LEVELS"): + values = this_da.values + values[:, this_i_level] = new_values_along_profiles + this_da.values = values + # else: + # raise ValueError("Array not with expected ['N_PROF', 'N_LEVELS'] shape") + return this_da + + def nanmerge(x, y): + """ Merge two 1D array + + Given 2 arrays x, y of 1 dimension, return a new array with: + - x values where x is not NaN + - y values where x is NaN + """ + z = x.copy() + for i, v in enumerate(x): + if np.isnan(v): + z[i] = y[i] + return z + + merged_is_nan = lambda l1, l2: len(np.unique(np.where(np.isnan(l1.values + l2.values)))) == len(l1) # noqa: E731 + + def merge_bin_matching_levels(this_ds: xr.Dataset) -> xr.Dataset: + """ Levels merger of type 'bins' value + + Merge pair of lines with the following pattern: + nan, VAL, VAL, nan, VAL, VAL + BINVAL, nan, nan, BINVAL, nan, nan + + This pattern is due to the bins definition: bins[i] <= x < bins[i+1] + + Parameters + ---------- + :class:`xarray.Dataset` + + Returns + ------- + :class:`xarray.Dataset` + """ + new_ds = this_ds.copy(deep=True) + N_LEVELS = new_ds.argo.N_LEVELS + idel = [] + for i_level in range(0, N_LEVELS - 1 - 1): + this_ds_level = this_ds[axis].isel(N_LEVELS=i_level) + this_ds_dw = this_ds[axis].isel(N_LEVELS=i_level + 1) + pres_dw = np.unique(this_ds_dw[~np.isnan(this_ds_dw)]) + if len(pres_dw) == 1 \ + and pres_dw[0] in this_ds["STD_%s_BINS" % axis] \ + and merged_is_nan(this_ds_level, this_ds_dw): + new_values = nanmerge(this_ds_dw.values, this_ds_level.values) + replace_i_level_values(new_ds[axis], i_level, new_values) + idel.append(i_level + 1) + + ikeep = [i for i in np.arange(0, new_ds.argo.N_LEVELS - 1) if i not in idel] + new_ds = new_ds.isel(N_LEVELS=ikeep) + new_ds = new_ds.assign_coords({'N_LEVELS': np.arange(0, len(new_ds['N_LEVELS']))}) + val = new_ds[axis].values + new_ds[axis].values = np.where(val == 0, np.nan, val) + return new_ds + + def merge_all_matching_levels(this_ds: xr.Dataset) -> xr.Dataset: + """ Levels merger + + Merge any pair of levels with a "matching" pattern like this: + VAL, VAL, VAL, nan, nan, VAL, nan, nan, + nan, nan, nan, VAL, VAL, nan, VAL, nan + + This pattern is due to a strict application of the bins definition. + But when bins are small (eg: 10db), many bins can have no data. + This has the consequence to change the size and number of the bins. 
+ + Parameters + ---------- + :class:`xarray.Dataset` + + Returns + ------- + :class:`xarray.Dataset` + """ + new_ds = this_ds.copy(deep=True) + N_LEVELS = new_ds.argo.N_LEVELS + idel = [] + for i_level in range(0, N_LEVELS): + if i_level + 1 < N_LEVELS: + this_ds_level = this_ds[axis].isel(N_LEVELS=i_level) + this_ds_dw = this_ds[axis].isel(N_LEVELS=i_level + 1) + if merged_is_nan(this_ds_level, this_ds_dw): + new_values = nanmerge(this_ds_level.values, this_ds_dw.values) + replace_i_level_values(new_ds[axis], i_level, new_values) + idel.append(i_level + 1) + + ikeep = [i for i in np.arange(0, new_ds.argo.N_LEVELS - 1) if i not in idel] + new_ds = new_ds.isel(N_LEVELS=ikeep) + new_ds = new_ds.assign_coords({'N_LEVELS': np.arange(0, len(new_ds['N_LEVELS']))}) + val = new_ds[axis].values + new_ds[axis].values = np.where(val == 0, np.nan, val) + return new_ds + + # init + new_ds = [] + + # add new vertical dimensions, this has to be in the datasets to apply ufunc later + this_dsp["Z_LEVELS"] = xr.DataArray(bins, dims={"Z_LEVELS": bins}) + + # vars to align + if select in ["shallow", "deep", "middle", "random"]: + datavars = [ + dv + for dv in list(this_dsp.data_vars) + if set(["N_LEVELS", "N_PROF"]) == set(this_dsp[dv].dims) + ] + else: + datavars = [ + dv + for dv in list(this_dsp.data_vars) + if set(["N_LEVELS", "N_PROF"]) == set(this_dsp[dv].dims) + and "QC" not in dv + and "ERROR" not in dv + ] + + # All other variables: + othervars = [ + dv + for dv in list(this_dsp.variables) + if dv not in datavars + and dv not in this_dsp.coords + ] + + # Sub-sample and align: + for dv in datavars: + v = groupby_remap( + this_dsp[axis], + this_dsp[dv], + this_dsp["Z_LEVELS"], + z_dim="N_LEVELS", + z_regridded_dim="Z_LEVELS", + select=select, + right=right + ) + v.name = this_dsp[dv].name + v.attrs = this_dsp[dv].attrs + new_ds.append(v) + + # Finish + new_ds = xr.merge(new_ds) + new_ds = new_ds.rename({"remapped": "N_LEVELS"}) + new_ds = new_ds.assign_coords({'N_LEVELS': range(0, len(new_ds['N_LEVELS']))}) + # new_ds["STD_%s_BINS" % axis] = new_ds['N_LEVELS'] + new_ds["STD_%s_BINS" % axis] = xr.DataArray(bins, + dims=['N_LEVELS'], + attrs={'Comment': + "Range of bins is: bins[i] <= x < bins[i+1] for i=[0,N_LEVELS-2]\n" + "Last bins is bins[N_LEVELS-1] <= x"} + ) + new_ds = new_ds.set_coords("STD_%s_BINS" % axis) + new_ds.attrs = this_ds.attrs + + for dv in othervars: + new_ds[dv] = this_dsp[dv] + + new_ds = new_ds.argo.cast_types() + new_ds = new_ds[np.sort(new_ds.data_vars)] + new_ds.attrs = this_dsp.attrs # Preserve original attributes + new_ds.argo._add_history("Sub-sampled and re-aligned on standard bins") + + if merge: + new_ds = merge_bin_matching_levels(new_ds) + new_ds = merge_all_matching_levels(new_ds) + + if to_point: + new_ds = new_ds.argo.profile2point() + + return new_ds + def teos10( # noqa: C901 self, vlist: list = ["SA", "CT", "SIG0", "N2", "PV", "PTEMP"], - inplace: bool = True, - ): + inplace: bool = True): """ Add TEOS10 variables to the dataset By default, adds: 'SA', 'CT' @@ -817,29 +1395,31 @@ def teos10( # noqa: C901 List with the name of variables to add. Must be a list containing one or more of the following string values: - * `"SA"` + * ``SA`` Adds an absolute salinity variable - * `"CT"` + * ``CT`` Adds a conservative temperature variable - * `"SIG0"` + * ``SIG0`` Adds a potential density anomaly variable referenced to 0 dbar - * `"N2"` + * ``N2`` Adds a buoyancy (Brunt-Vaisala) frequency squared variable. 
This variable has been regridded to the original pressure levels in the Dataset using a linear interpolation. - * `"PV"` + * ``PV`` Adds a planetary vorticity variable calculated from :math:`\\frac{f N^2}{\\text{gravity}}`. This is not a TEOS-10 variable from the gsw toolbox, but is provided for convenience. This variable has been regridded to the original pressure levels in the Dataset using a linear interpolation. - * `"PTEMP"` - Adds a potential temperature variable - * `"SOUND_SPEED"` - Adds a sound speed variable + * ``PTEMP`` + Add potential temperature + * ``SOUND_SPEED`` + Add sound speed + * ``CNDC`` + Add Electrical Conductivity - inplace: boolean, True by default - If True, return the input :class:`xarray.Dataset` with new TEOS10 variables + inplace: boolean, True by default + * If True, return the input :class:`xarray.Dataset` with new TEOS10 variables added as a new :class:`xarray.DataArray`. - If False, return a :class:`xarray.Dataset` with new TEOS10 variables + * If False, return a :class:`xarray.Dataset` with new TEOS10 variables Returns ------- @@ -848,11 +1428,17 @@ def teos10( # noqa: C901 if not with_gsw: raise ModuleNotFoundError("This functionality requires the gsw library") - allowed = ['SA', 'CT', 'SIG0', 'N2', 'PV', 'PTEMP', 'SOUND_SPEED'] + allowed = ["SA", "CT", "SIG0", "N2", "PV", "PTEMP", "SOUND_SPEED", "CNDC"] if any(var not in allowed for var in vlist): - raise ValueError(f"vlist must be a subset of {allowed}, instead found {vlist}") + raise ValueError( + f"vlist must be a subset of {allowed}, instead found {vlist}" + ) - warnings.warn("Default variables will be reduced to 'SA' and 'CT' in 0.1.9", category=FutureWarning) + if is_list_equal(vlist, ["SA", "CT", "SIG0", "N2", "PV", "PTEMP"]): + warnings.warn( + "Default variables will be reduced to 'SA' and 'CT' in 0.1.9", + category=FutureWarning, + ) this = self._obj @@ -862,11 +1448,11 @@ def teos10( # noqa: C901 this = this.argo.profile2point() # Get base variables as numpy arrays: - psal = this['PSAL'].values - temp = this['TEMP'].values - pres = this['PRES'].values - lon = this['LONGITUDE'].values - lat = this['LATITUDE'].values + psal = this["PSAL"].values + temp = this["TEMP"].values + pres = this["PRES"].values + lon = this["LONGITUDE"].values + lat = this["LATITUDE"].values # Coriolis f = gsw.f(lat) @@ -885,6 +1471,10 @@ def teos10( # noqa: C901 if "SIG0" in vlist: sig0 = gsw.sigma0(sa, ct) + # Electrical conductivity + if "CNDC" in vlist: + cndc = gsw.C_from_SP(psal, temp, pres) + # N2 if "N2" in vlist or "PV" in vlist: n2_mid, p_mid = gsw.Nsquared(sa, ct, pres, lat) @@ -903,56 +1493,65 @@ def mid(x): pv = f * n2 / gsw.grav(lat, pres) # Sound Speed: - if 'SOUND_SPEED' in vlist: + if "SOUND_SPEED" in vlist: cs = gsw.sound_speed(sa, ct, pres) # Back to the dataset: that = [] - if 'SA' in vlist: - SA = xr.DataArray(sa, coords=this['PSAL'].coords, name='SA') - SA.attrs['long_name'] = 'Absolute Salinity' - SA.attrs['standard_name'] = 'sea_water_absolute_salinity' - SA.attrs['unit'] = 'g/kg' + if "SA" in vlist: + SA = xr.DataArray(sa, coords=this["PSAL"].coords, name="SA") + SA.attrs["long_name"] = "Absolute Salinity" + SA.attrs["standard_name"] = "sea_water_absolute_salinity" + SA.attrs["unit"] = "g/kg" that.append(SA) - if 'CT' in vlist: - CT = xr.DataArray(ct, coords=this['TEMP'].coords, name='CT') - CT.attrs['long_name'] = 'Conservative Temperature' - CT.attrs['standard_name'] = 'sea_water_conservative_temperature' - CT.attrs['unit'] = 'degC' + if "CT" in vlist: + CT = xr.DataArray(ct, 
coords=this["TEMP"].coords, name="CT") + CT.attrs["long_name"] = "Conservative Temperature" + CT.attrs["standard_name"] = "sea_water_conservative_temperature" + CT.attrs["unit"] = "degC" that.append(CT) - if 'SIG0' in vlist: - SIG0 = xr.DataArray(sig0, coords=this['TEMP'].coords, name='SIG0') - SIG0.attrs['long_name'] = 'Potential density anomaly with reference pressure of 0 dbar' - SIG0.attrs['standard_name'] = 'sea_water_sigma_theta' - SIG0.attrs['unit'] = 'kg/m^3' + if "SIG0" in vlist: + SIG0 = xr.DataArray(sig0, coords=this["TEMP"].coords, name="SIG0") + SIG0.attrs[ + "long_name" + ] = "Potential density anomaly with reference pressure of 0 dbar" + SIG0.attrs["standard_name"] = "sea_water_sigma_theta" + SIG0.attrs["unit"] = "kg/m^3" that.append(SIG0) - if 'N2' in vlist: - N2 = xr.DataArray(n2, coords=this['TEMP'].coords, name='N2') - N2.attrs['long_name'] = 'Squared buoyancy frequency' - N2.attrs['unit'] = '1/s^2' + if "CNDC" in vlist: + CNDC = xr.DataArray(cndc, coords=this["TEMP"].coords, name="CNDC") + CNDC.attrs["long_name"] = "Electrical Conductivity" + CNDC.attrs["standard_name"] = "sea_water_electrical_conductivity" + CNDC.attrs["unit"] = "mS/cm" + that.append(CNDC) + + if "N2" in vlist: + N2 = xr.DataArray(n2, coords=this["TEMP"].coords, name="N2") + N2.attrs["long_name"] = "Squared buoyancy frequency" + N2.attrs["unit"] = "1/s^2" that.append(N2) - if 'PV' in vlist: - PV = xr.DataArray(pv, coords=this['TEMP'].coords, name='PV') - PV.attrs['long_name'] = 'Planetary Potential Vorticity' - PV.attrs['unit'] = '1/m/s' + if "PV" in vlist: + PV = xr.DataArray(pv, coords=this["TEMP"].coords, name="PV") + PV.attrs["long_name"] = "Planetary Potential Vorticity" + PV.attrs["unit"] = "1/m/s" that.append(PV) - if 'PTEMP' in vlist: - PTEMP = xr.DataArray(pt, coords=this['TEMP'].coords, name='PTEMP') - PTEMP.attrs['long_name'] = 'Potential Temperature' - PTEMP.attrs['standard_name'] = 'sea_water_potential_temperature' - PTEMP.attrs['unit'] = 'degC' + if "PTEMP" in vlist: + PTEMP = xr.DataArray(pt, coords=this["TEMP"].coords, name="PTEMP") + PTEMP.attrs["long_name"] = "Potential Temperature" + PTEMP.attrs["standard_name"] = "sea_water_potential_temperature" + PTEMP.attrs["unit"] = "degC" that.append(PTEMP) - if 'SOUND_SPEED' in vlist: - CS = xr.DataArray(cs, coords=this['TEMP'].coords, name='SOUND_SPEED') - CS.attrs['long_name'] = 'Speed of sound' - CS.attrs['standard_name'] = 'speed_of_sound_in_sea_water' - CS.attrs['unit'] = 'm/s' + if "SOUND_SPEED" in vlist: + CS = xr.DataArray(cs, coords=this["TEMP"].coords, name="SOUND_SPEED") + CS.attrs["long_name"] = "Speed of sound" + CS.attrs["standard_name"] = "speed_of_sound_in_sea_water" + CS.attrs["unit"] = "m/s" that.append(CS) # Create a dataset with all new variables: @@ -991,11 +1590,333 @@ def mid(x): else: return that - # @property - # def plot(self): - # """Access plotting functions""" - # # Create a mutable instance on 1st call so that later changes will be reflected in future calls - # # https://stackoverflow.com/a/8140747 - # if "plot" not in self._register: - # self._register["plot"] = [_PlotMethods(self)] - # return self._register["plot"][0] + def create_float_source(self, # noqa: C901 + path: str or os.PathLike = None, + force: str = "default", + select: str = 'deep', + file_pref: str = '', + file_suff: str = '', + format: str = '5', + do_compression: bool = True, + debug_output: bool = False): + """ Preprocess data for OWC software calibration + + This method can create a FLOAT SOURCE file (i.e. 
the .mat file that usually goes into /float_source/) for OWC software. + The FLOAT SOURCE file is saved as: + + ``<path>/<WMO>.mat`` + + where ``<WMO>`` is automatically extracted from the dataset variable PLATFORM_NUMBER (in order to avoid mismatch + between user input and data content). So if this dataset has measurements from more than one float, more than one + Matlab file will be created. + + By default, variables loaded are raw PRES, PSAL and TEMP. + If PRES is adjusted, variables loaded are PRES_ADJUSTED, raw PSAL calibrated in pressure and raw TEMP. + + You can force the program to load raw PRES, PSAL and TEMP whether PRES is adjusted or not: + + >>> ds.argo.create_float_source(force='raw') + + or you can force the program to load adjusted variables: PRES_ADJUSTED, PSAL_ADJUSTED, TEMP_ADJUSTED + + >>> ds.argo.create_float_source(force='adjusted') + + **Pre-processing details**: + + #. select only ascending profiles + + #. subsample vertical levels to keep the deepest pressure level in each 10db bin from the surface down + to the deepest level. + + #. align pressure values, i.e. make sure that a pressure index corresponds to measurements from the same + binned pressure values. This can modify the number of levels in the dataset. + + #. filter variables according to the ``force`` option (see below) + + #. filter variables according to QC flags: + + * Remove measurements where timestamp QC is >= 3 + * Keep measurements where pressure QC is anything but 3 + * Keep measurements where pressure, temperature or salinity QC are anything but 4 + + + #. remove dummy values: salinity not in [0/50], potential temperature not in [-10/50] and pressure not + in [0/6000]. Bounds inclusive. + + #. convert timestamp to fractional year + + #. convert longitudes to 0-360 + + + + Parameters + ---------- + path: str or path-like, optional + Path or folder name to which to save this Matlab file. If no path is provided, this function returns the + preprocessed data as :class:`xarray.Dataset` (one per float, in a dictionary indexed by WMO). + force: {"default", "raw", "adjusted"}, default: "default" + If force='default', load PRES/PSAL/TEMP or PRES_ADJUSTED/PSAL/TEMP depending on whether PRES_ADJUSTED is filled or not. + + If force='raw', load PRES/PSAL/TEMP + + If force='adjusted', load PRES_ADJUSTED/PSAL_ADJUSTED/TEMP_ADJUSTED + select: {'deep','shallow','middle','random','min','max','mean','median'}, default: 'deep' + The bin value selection method used by ``groupby_pressure_bins``. + file_pref: str, optional + Prefix to add at the beginning of output file(s). + file_suff: str, optional + Suffix to add at the end of output file(s). + do_compression: bool, optional + Whether or not to compress matrices on write. Default is True. + format: {'5', '4'}, string, optional + Matlab file format version. '5' (the default) for MATLAB 5 and up (to 7.2). Use '4' for MATLAB 4 .mat files. + + Returns + ------- + :class:`xarray.Dataset` + The output dataset, or Matlab file, will have the following variables (``n`` is the number of profiles, ``m`` + is the number of vertical levels): + + - ``DATES`` (1xn): decimal year, e.g. 10 Dec 2000 = 2000.939726 + - ``LAT`` (1xn): decimal degrees, -ve means south of the equator, e.g. 20.5S = -20.5 + - ``LONG`` (1xn): decimal degrees, from 0 to 360, e.g. 98.5W in the eastern Pacific = 261.5E + - ``PROFILE_NO`` (1xn): this goes from 1 to n. PROFILE_NO is the same as CYCLE_NO in the Argo files + - ``PRES`` (mxn): dbar, from shallow to deep, e.g. 10, 20, 30 ... These have to line up along a fixed nominal depth axis.
+ - ``TEMP`` (mxn): in-situ IPTS-90 + - ``SAL`` (mxn): PSS-78 + - ``PTMP`` (mxn): potential temperature referenced to zero pressure, use SAL in PSS-78 and in-situ TEMP in IPTS-90 for calculation. + + """ + this = self._obj + + if ( + "history" in this.attrs + and "DATA_MODE" in this.attrs["history"] + and "QC" in this.attrs["history"] + ): + # This is surely a dataset fetch with 'standard' mode, we can't deal with this, we need 'expert' file + raise InvalidDatasetStructure( + "Need a full Argo dataset to create OWC float source. " + "This dataset was probably loaded with a 'standard' user mode. " + "Try to fetch float data in 'expert' mode" + ) + + if force not in ["default", "raw", "adjusted"]: + raise OptionValueError( + "force option must be 'default', 'raw' or 'adjusted'." + ) + + log.debug("===================== START create_float_source in '%s' mode" % force) + + if len(np.unique(this['PLATFORM_NUMBER'])) > 1: + log.debug("Found more than one 1 float in this dataset, will split processing") + + def ds2mat(this_dsp): + # Return a Matlab dictionary with dataset data to be used by savemat: + mdata = {} + mdata["PROFILE_NO"] = ( + this_dsp["PROFILE_NO"].astype("uint8").values.T[np.newaxis, :] + ) # 1-based index in Matlab + mdata["DATES"] = this_dsp["DATES"].values.T[np.newaxis, :] + mdata["LAT"] = this_dsp["LAT"].values.T[np.newaxis, :] + mdata["LONG"] = this_dsp["LONG"].values.T[np.newaxis, :] + mdata["PRES"] = this_dsp["PRES"].values + mdata["TEMP"] = this_dsp["TEMP"].values + mdata["PTMP"] = this_dsp["PTMP"].values + mdata["SAL"] = this_dsp["SAL"].values + return mdata + + def pretty_print_count(dd, txt): + # if dd.argo._type == "point": + # np = len(dd['N_POINTS'].values) + # nc = len(dd.argo.point2profile()['N_PROF'].values) + # else: + # np = len(dd.argo.profile2point()['N_POINTS'].values) + # nc = len(dd['N_PROF'].values) + out = [] + np, nc = dd.argo.N_POINTS, dd.argo.N_PROF + out.append("%i points / %i profiles in dataset %s" % (np, nc, txt)) + # np.unique(this['PSAL_QC'].values)) + # out.append(pd.to_datetime(dd['TIME'][0].values).strftime('%Y/%m/%d %H:%M:%S')) + return "\n".join(out) + + def getfilled_bins(pressure, bins): + ip = np.digitize(np.unique(pressure), bins, right=False) + ii, ij = np.unique(ip, return_index=True) + ii = ii[np.where(ii - 1 > 0)] - 1 + return bins[ii] + + def preprocess_one_float(this_one: xr.Dataset, + this_path: str or os.PathLike = None, + select: str = 'deep', + debug_output: bool = False): + """ Run the entire preprocessing on a given dataset with one float data """ + + # Add potential temperature: + if "PTEMP" not in this_one: + this_one = this_one.argo.teos10(vlist=["PTEMP"], inplace=True) + + # Only use Ascending profiles: + # https://github.com/euroargodev/dm_floats/blob/c580b15202facaa0848ebe109103abe508d0dd5b/src/ow_source/create_float_source.m#L143 + this_one = this_one.argo._where(this_one["DIRECTION"] == "A", drop=True) + log.debug(pretty_print_count(this_one, "after direction selection")) + + # Todo: ensure we load only the primary profile of cycles with multiple sampling schemes: + # https://github.com/euroargodev/dm_floats/blob/c580b15202facaa0848ebe109103abe508d0dd5b/src/ow_source/create_float_source.m#L194 + + # # Subsample and align vertical levels (max 1 level every 10db): + # https://github.com/euroargodev/dm_floats/blob/c580b15202facaa0848ebe109103abe508d0dd5b/src/ow_source/create_float_source.m#L208 + # this_one = this_one.argo.align_std_bins(inplace=False) + # log.debug(pretty_print_count(this_one, "after vertical levels 
subsampling")) + + # Filter variables according to OWC workflow + # (I don't understand why this_one come at the end of the Matlab routine ...) + # https://github.com/euroargodev/dm_floats/blob/c580b15202facaa0848ebe109103abe508d0dd5b/src/ow_source/create_float_source.m#L258 + this_one = this_one.argo.filter_scalib_pres(force=force, inplace=False) + log.debug(pretty_print_count(this_one, "after pressure fields selection")) + + # Filter along some QC: + # https://github.com/euroargodev/dm_floats/blob/c580b15202facaa0848ebe109103abe508d0dd5b/src/ow_source/create_float_source.m#L372 + this_one = this_one.argo.filter_qc( + QC_list=[0, 1, 2], QC_fields=["TIME_QC"], drop=True + ) # Matlab says to reject > 3 + # https://github.com/euroargodev/dm_floats/blob/c580b15202facaa0848ebe109103abe508d0dd5b/src/ow_source/create_float_source.m#L420 + this_one = this_one.argo.filter_qc( + QC_list=[v for v in range(10) if v != 3], QC_fields=["PRES_QC"], drop=True + ) # Matlab says to keep != 3 + this_one = this_one.argo.filter_qc( + QC_list=[v for v in range(10) if v != 4], + QC_fields=["PRES_QC", "TEMP_QC", "PSAL_QC"], + drop=True, + mode="any", + ) # Matlab says to keep != 4 + if len(this_one["N_POINTS"]) == 0: + raise DataNotFound( + "All data have been discarded because either PSAL_QC or TEMP_QC is filled with 4 or" + " PRES_QC is filled with 3 or 4\n" + "NO SOURCE FILE WILL BE GENERATED !!!" + ) + log.debug(pretty_print_count(this_one, "after QC filter")) + + # Exclude dummies + # https://github.com/euroargodev/dm_floats/blob/c580b15202facaa0848ebe109103abe508d0dd5b/src/ow_source/create_float_source.m#L427 + this_one = ( + this_one + .argo._where(this_one["PSAL"] <= 50, drop=True) + .argo._where(this_one["PSAL"] >= 0, drop=True) + .argo._where(this_one["PTEMP"] <= 50, drop=True) + .argo._where(this_one["PTEMP"] >= -10, drop=True) + .argo._where(this_one["PRES"] <= 6000, drop=True) + .argo._where(this_one["PRES"] >= 0, drop=True) + ) + if len(this_one["N_POINTS"]) == 0: + raise DataNotFound( + "All data have been discarded because they are filled with values out of range\n" + "NO SOURCE FILE WILL BE GENERATED !!!" 
+ ) + log.debug(pretty_print_count(this_one, "after dummy values exclusion")) + + # Transform measurements to a collection of profiles for Matlab-like formation: + this_one = this_one.argo.point2profile() + + # Subsample and align vertical levels (max 1 level every 10db): + # https://github.com/euroargodev/dm_floats/blob/c580b15202facaa0848ebe109103abe508d0dd5b/src/ow_source/create_float_source.m#L208 + # https://github.com/euroargodev/dm_floats/blob/c580b15202facaa0848ebe109103abe508d0dd5b/src/ow_source/create_float_source.m#L451 + bins = np.arange(0.0, np.max(this_one["PRES"]) + 10.0, 10.0) + this_one = this_one.argo.groupby_pressure_bins(bins=bins, select=select, axis='PRES') + log.debug(pretty_print_count(this_one, "after vertical levels subsampling and re-alignment")) + + # Compute fractional year: + # https://github.com/euroargodev/dm_floats/blob/c580b15202facaa0848ebe109103abe508d0dd5b/src/ow_source/create_float_source.m#L334 + DATES = np.array( + [toYearFraction(d) for d in pd.to_datetime(this_one["TIME"].values)] + )[np.newaxis, :] + + # Read measurements: + PRES = this_one["PRES"].values.T # (mxn) + TEMP = this_one["TEMP"].values.T # (mxn) + PTMP = this_one["PTEMP"].values.T # (mxn) + SAL = this_one["PSAL"].values.T # (mxn) + LAT = this_one["LATITUDE"].values[np.newaxis, :] + LONG = this_one["LONGITUDE"].values[np.newaxis, :] + LONG[0][np.argwhere(LONG[0] < 0)] = LONG[0][np.argwhere(LONG[0] < 0)] + 360 + PROFILE_NO = this_one["CYCLE_NUMBER"].values[np.newaxis, :] + + # Create dataset with preprocessed data: + this_one_dsp_processed = xr.DataArray( + PRES, + dims=["m", "n"], + coords={"m": np.arange(0, PRES.shape[0]), "n": np.arange(0, PRES.shape[1])}, + name="PRES", + ).to_dataset(promote_attrs=False) + this_one_dsp_processed["TEMP"] = xr.DataArray( + TEMP, + dims=["m", "n"], + coords={"m": np.arange(0, TEMP.shape[0]), "n": np.arange(0, TEMP.shape[1])}, + name="TEMP", + ) + this_one_dsp_processed["PTMP"] = xr.DataArray( + PTMP, + dims=["m", "n"], + coords={"m": np.arange(0, PTMP.shape[0]), "n": np.arange(0, PTMP.shape[1])}, + name="PTMP", + ) + this_one_dsp_processed["SAL"] = xr.DataArray( + SAL, + dims=["m", "n"], + coords={"m": np.arange(0, SAL.shape[0]), "n": np.arange(0, SAL.shape[1])}, + name="SAL", + ) + this_one_dsp_processed["PROFILE_NO"] = xr.DataArray( + PROFILE_NO[0, :], + dims=["n"], + coords={"n": np.arange(0, PROFILE_NO.shape[1])}, + name="PROFILE_NO", + ) + this_one_dsp_processed["DATES"] = xr.DataArray( + DATES[0, :], + dims=["n"], + coords={"n": np.arange(0, DATES.shape[1])}, + name="DATES", + ) + this_one_dsp_processed["LAT"] = xr.DataArray( + LAT[0, :], dims=["n"], coords={"n": np.arange(0, LAT.shape[1])}, name="LAT" + ) + this_one_dsp_processed["LONG"] = xr.DataArray( + LONG[0, :], + dims=["n"], + coords={"n": np.arange(0, LONG.shape[1])}, + name="LONG", + ) + this_one_dsp_processed["m"].attrs = {"long_name": "vertical levels"} + this_one_dsp_processed["n"].attrs = {"long_name": "profiles"} + + # Create Matlab dictionary with preprocessed data (to be used by savemat): + mdata = ds2mat(this_one_dsp_processed) + + # Output + log.debug("float source data saved in: %s" % this_path) + if this_path is None: + if debug_output: + return mdata, this_one_dsp_processed, this_one # For debug/devel + else: + return this_one_dsp_processed + else: + from scipy.io import savemat + # Validity check of the path type is delegated to savemat + return savemat(this_path, mdata, appendmat=False, format=format, do_compression=do_compression) + + # Run pre-processing for each float 
data + output = {} + for WMO in np.unique(this['PLATFORM_NUMBER']): + log.debug("> Preprocessing data for float WMO %i" % WMO) + this_float = this.argo._where(this['PLATFORM_NUMBER'] == WMO, drop=True) + if path is None: + output[WMO] = preprocess_one_float(this_float, this_path=path, select=select, debug_output=debug_output) + else: + os.makedirs(path, exist_ok=True) # Make path exists + float_path = os.path.join(path, "%s%i%s.mat" % (file_pref, WMO, file_suff)) + preprocess_one_float(this_float, this_path=float_path, select=select, debug_output=debug_output) + output[WMO] = float_path + if path is None: + log.debug("===================== END create_float_source") + return output diff --git a/docs/_static/groupby_pressure_bins_select_deep.png b/docs/_static/groupby_pressure_bins_select_deep.png new file mode 100644 index 00000000..c7cfb881 Binary files /dev/null and b/docs/_static/groupby_pressure_bins_select_deep.png differ diff --git a/docs/_static/groupby_pressure_bins_select_random.png b/docs/_static/groupby_pressure_bins_select_random.png new file mode 100644 index 00000000..4489a066 Binary files /dev/null and b/docs/_static/groupby_pressure_bins_select_random.png differ diff --git a/docs/api-hidden.rst b/docs/api-hidden.rst index 40f7fd2d..136d02d9 100644 --- a/docs/api-hidden.rst +++ b/docs/api-hidden.rst @@ -58,8 +58,10 @@ argopy.utilities.list_available_data_src argopy.utilities.list_available_index_src argopy.utilities.Chunker + + argopy.utilities.groupby_remap + argopy.utilities.linear_interpolation_remap - argopy.utilities.TopoFetcher argopy.utilities.TopoFetcher.cname argopy.utilities.TopoFetcher.define_constraints argopy.utilities.TopoFetcher.get_url @@ -125,12 +127,14 @@ argopy.stores.argo_index.indexstore argopy.stores.argo_index.indexfilter_wmo argopy.stores.argo_index.indexfilter_box - + argopy.xarray.ArgoAccessor.point2profile argopy.xarray.ArgoAccessor.profile2point - argopy.xarray.ArgoAccessor.cast_types - argopy.xarray.ArgoAccessor.uid - argopy.xarray.ArgoAccessor.filter_qc - argopy.xarray.ArgoAccessor.filter_data_mode argopy.xarray.ArgoAccessor.interp_std_levels + argopy.xarray.ArgoAccessor.groupby_pressure_bins argopy.xarray.ArgoAccessor.teos10 + argopy.xarray.ArgoAccessor.create_float_source + argopy.xarray.ArgoAccessor.filter_qc + argopy.xarray.ArgoAccessor.filter_data_mode + argopy.xarray.ArgoAccessor.filter_scalib_pres + argopy.xarray.ArgoAccessor.cast_types diff --git a/docs/api.rst b/docs/api.rst index c15d682b..d94679b6 100644 --- a/docs/api.rst +++ b/docs/api.rst @@ -4,6 +4,9 @@ API reference This page provides an auto-generated summary of argopy's API. For more details and examples, refer to the relevant chapters in the main part of the documentation. +.. contents:: + :local: + Top-levels functions ==================== @@ -35,8 +38,8 @@ Fetcher access points IndexFetcher.float IndexFetcher.profile -Fetching methods ----------------- +Fetcher methods +--------------- .. autosummary:: :toctree: generated/ @@ -54,8 +57,8 @@ Fetching methods IndexFetcher.to_dataframe IndexFetcher.to_csv -Fetched data visualisation --------------------------- +Data visualisation +------------------ .. autosummary:: :toctree: generated/ @@ -120,10 +123,10 @@ This accessor extends :py:class:`xarray.Dataset`. 
Proper use of this accessor sh >>> import xarray as xr # first import xarray >>> import argopy # import argopy (the dataset 'argo' accessor is registered) - >>> from argopy import DataFetcher as ArgoDataFetcher - >>> ds = ArgoIndexFetcher().float([6902766, 6902772, 6902914, 6902746]).load().data + >>> from argopy import DataFetcher + >>> ds = DataFetcher().float([6902766, 6902772, 6902914, 6902746]).load().data >>> ds.argo - >>> ds.argo.filter_qc + >>> ds.argo.filter_qc() Data Transformation @@ -136,6 +139,7 @@ Data Transformation Dataset.argo.point2profile Dataset.argo.profile2point Dataset.argo.interp_std_levels + Dataset.argo.groupby_pressure_bins Data Filters ------------ @@ -146,15 +150,17 @@ Data Filters Dataset.argo.filter_qc Dataset.argo.filter_data_mode + Dataset.argo.filter_scalib_pres -Complementing -------------- +Processing +---------- .. autosummary:: :toctree: generated/ :template: autosummary/accessor_method.rst Dataset.argo.teos10 + Dataset.argo.create_float_source Misc ---- diff --git a/docs/conf.py b/docs/conf.py index 12034250..32f12b53 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -85,7 +85,6 @@ 'sphinx.ext.doctest', 'sphinx.ext.viewcode', 'sphinx.ext.inheritance_diagram', - 'matplotlib.sphinxext.plot_directive', 'nbsphinx', 'numpydoc', 'sphinx_issues', @@ -191,7 +190,9 @@ # # html_theme = 'sphinx_rtd_theme' html_theme = 'sphinx_book_theme' -# html_theme = 'bootstrap' +# html_theme = 'bootstrap' # pip install sphinx-bootstrap-theme +# html_theme = 'sphinx_redactor_theme' # pip install sphinx-redactor-theme +# html_theme = 'pydata_sphinx_theme' # pip install pydata-sphinx-theme # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the @@ -218,6 +219,7 @@ 'logo_only': True, 'display_version': False, 'prev_next_buttons_location': 'bottom', + 'show_navbar_depth': 1, # 'style_external_links': False, # 'vcs_pageview_mode': '', # 'style_nav_header_background': 'white', @@ -227,6 +229,7 @@ 'navigation_depth': 4, # 'includehidden': True, # 'titles_only': False +# 'launch_buttons': { "thebe": True} } # Sometimes the savefig directory doesn't exist and needs to be created diff --git a/docs/data_manipulation.rst b/docs/data_manipulation.rst index 33c10a3c..13192fd8 100644 --- a/docs/data_manipulation.rst +++ b/docs/data_manipulation.rst @@ -1,6 +1,9 @@ Manipulating data ================= +.. contents:: + :local: + .. currentmodule:: xarray Once you fetched data, **argopy** comes with a handy :class:`xarray.Dataset` accessor ``argo`` to perform specific manipulation of the data. This means that if your dataset is named `ds`, then you can use `ds.argo` to access more **argopy** functions. The full list is available in the API documentation page :ref:`Dataset.argo (xarray accessor)`. @@ -16,9 +19,9 @@ Transformation -------------- Points vs profiles -~~~~~~~~~~~~~~~~~~ +^^^^^^^^^^^^^^^^^^ -Fetched data are returned as a 1D array collection of measurements: +By default, fetched data are returned as a 1D array collection of measurements: .. 
ipython:: python :okwarning: @@ -43,10 +46,10 @@ You can simply reverse this transformation with the :meth:`Dataset.argo.profile2 ds = ds_profiles.argo.profile2point() ds -Interpolation to standard levels -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Pressure levels: Interpolation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Once your dataset is a collection of vertical **profiles**, you can interpolate variables on standard pressure levels using :meth:`Dataset.argo.interp_std_levels` with your levels as input :meth:`Dataset.argo.interp_std_levels`: +Once your dataset is a collection of vertical **profiles**, you can interpolate variables on standard pressure levels using :meth:`Dataset.argo.interp_std_levels` with your levels as input: .. ipython:: python :okwarning: @@ -59,13 +62,68 @@ Note on the linear interpolation process : - Remaining profiles must have at least five data points to allow interpolation. - For each profile, shallowest data point is repeated to the surface to allow a 0 standard level while avoiding extrapolation. +Pressure levels: Group-by bins +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If you prefer to avoid interpolation, you can opt for a pressure bins grouping reduction using :meth:`Dataset.argo.groupby_pressure_bins`. This method can be used to subsample and align an irregular dataset (pressure not being similar in all profiles) on a set of pressure bins. The output dataset can then be used to perform statistics along the N_PROF dimension because N_LEVELS will correspond to similar pressure bins. + +To illustrate this method, let's start by fetching some data from a low vertical resolution float: + +.. ipython:: python + :okwarning: + + loader = ArgoDataFetcher(src='erddap', mode='expert').float(2901623) # Low res float + ds = loader.load().data + +Let's now sub-sample these measurements along 250db bins, selecting values from the **deepest** pressure levels for each bin: + +.. ipython:: python + :okwarning: + + bins = np.arange(0.0, np.max(ds["PRES"]), 250.0) + ds_binned = ds.argo.groupby_pressure_bins(bins=bins, select='deep') + ds_binned + +See the new ``STD_PRES_BINS`` variable that holds the pressure bins definition. + +The figure below shows the sub-sampling effect: + +.. code-block:: python + + import matplotlib as mpl + import matplotlib.pyplot as plt + import cmocean + + fig, ax = plt.subplots(figsize=(18,6)) + ds.plot.scatter(x='CYCLE_NUMBER', y='PRES', hue='PSAL', ax=ax, cmap=cmocean.cm.haline) + plt.plot(ds_binned['CYCLE_NUMBER'], ds_binned['PRES'], 'r+') + plt.hlines(bins, ds['CYCLE_NUMBER'].min(), ds['CYCLE_NUMBER'].max(), color='k') + plt.hlines(ds_binned['STD_PRES_BINS'], ds_binned['CYCLE_NUMBER'].min(), ds_binned['CYCLE_NUMBER'].max(), color='r') + plt.title(ds.attrs['Fetched_constraints']) + plt.gca().invert_yaxis() + +.. image:: _static/groupby_pressure_bins_select_deep.png + +The bin limits are shown with horizontal lines (input ``bins`` in black, output ``STD_PRES_BINS`` in red), the original data are shown in the background colored scatter, and the group-by pressure bins values are highlighted with red markers. + +The ``select`` option can take many different values, see the full documentation of :meth:`Dataset.argo.groupby_pressure_bins` for all the details. Let's show here results from the ``random`` sampling: + +.. code-block:: python + + ds_binned = ds.argo.groupby_pressure_bins(bins=bins, select='random') + +.. image:: _static/groupby_pressure_bins_select_random.png + +
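+Since all profiles now share the same set of pressure bins, statistics along the ``N_PROF`` dimension become straightforward. The snippet below is a minimal sketch (not part of the original example) that assumes the ``ds_binned`` dataset created above:
+
+.. code-block:: python
+
+    # Mean and standard deviation of salinity for each standard pressure bin,
+    # computed across all profiles of the float:
+    psal_mean = ds_binned['PSAL'].mean(dim='N_PROF')
+    psal_std = ds_binned['PSAL'].std(dim='N_PROF')
+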
Filters -~~~~~~~ +^^^^^^^ + +If you fetched data with the ``expert`` mode, you may want to use *filters* to help you curate the data. -If you fetched data with the ``expert`` mode, you may want to use -*filters* to help you curate the data. +- **QC flag filter**: :meth:`Dataset.argo.filter_qc`. This method allows you to filter measurements according to QC flag values. This filter modifies all variables of the dataset. +- **Data mode filter**: :meth:`Dataset.argo.filter_data_mode`. This method allows you to filter variables according to their data mode. This filter modifies the parameter variables and their associated QC variables in the dataset. +- **OWC variables filter**: :meth:`Dataset.argo.filter_scalib_pres`. This method allows you to filter variables according to OWC salinity calibration software requirements. This filter modifies pressure, temperature and salinity related variables of the dataset. -[To be added] Complementary data ------------------ @@ -73,7 +131,7 @@ Complementary data TEOS-10 variables ~~~~~~~~~~~~~~~~~ -You can compute additional ocean variables from `TEOS-10 `_. The default list of variables is: 'SA', 'CT', 'SIG0', 'N2', 'PV', 'PTEMP' ('SOUND_SPEED' is optional). `Simply raise an issue to add a new one `_. +You can compute additional ocean variables from `TEOS-10 `_. The default list of variables is: 'SA', 'CT', 'SIG0', 'N2', 'PV', 'PTEMP' ('SOUND_SPEED', 'CNDC' are optional). `Simply raise an issue to add a new one `_. This can be done using the :meth:`Dataset.argo.teos10` method and indicating the list of variables you want to compute: diff --git a/docs/data_quality_control.rst b/docs/data_quality_control.rst index 99df70cc..302a4614 100644 --- a/docs/data_quality_control.rst +++ b/docs/data_quality_control.rst @@ -3,26 +3,154 @@ Data quality control ==================== -**argopy** comes with handy methods to help you quality control measurements. This section is probably intended for `expert` users. +.. contents:: + :local: + +**argopy** comes with methods to help you quality control measurements. This section is probably intended for `expert` users. Most of these methods are available through the :class:`xarray.Dataset` accessor namespace ``argo``. This means that if your dataset is `ds`, then you can use `ds.argo` to access more **argopy** functionalities. -Let's start with import and set-up: +Let's start with standard import: .. ipython:: python :okwarning: from argopy import DataFetcher as ArgoDataFetcher + +Salinity calibration +-------------------- + +.. currentmodule:: xarray + +The Argo salinity calibration method is called OWC_, after the names of the core developers: Breck Owens, Anny Wong and Cecile Cabanes. +Historically, the OWC method has been implemented in `Matlab `_. More recently, a `python version has been developed `_. + +Preprocessing data +^^^^^^^^^^^^^^^^^^ + +At this point, both OWC software implementations take as input a pre-processed version of the Argo float data to evaluate/calibrate. + +**argopy** is able to perform this preprocessing and to create the *float source* data to be used by the OWC software. This is done with :meth:`Dataset.argo.create_float_source`. + +First, you need to fetch the Argo float data you want to calibrate, in ``expert`` mode: + +.. ipython:: python + :okwarning: + + ds = ArgoDataFetcher(mode='expert').float(6902766).load().data + +Then, to create the float source data, you call the method and provide a folder name to save output files: + +.. ipython:: python + :okwarning: + + ds.argo.create_float_source("float_source") + +This will create the ``float_source/6902766.mat`` Matlab file to be set directly in the configuration file of the OWC software. This routine implements the same pre-processing as in the Matlab version (which is hosted on `this repo `_ and run with `this routine `_). All the detailed steps of this pre-processing are given in the :meth:`Dataset.argo.create_float_source` API page. + +.. note:: + If the dataset contains data from more than one float, several Matlab files are created, one for each float. This will allow you to prepare data from a collection of floats. +
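+If you want to double-check the content of the file created above, you can read it back with scipy (a minimal sketch, not part of the original documentation, assuming the ``float_source/6902766.mat`` file exists):
+
+.. code-block:: python
+
+    from scipy.io import loadmat
+
+    mdata = loadmat("float_source/6902766.mat")
+    # Variables documented in the create_float_source API page:
+    # DATES, LAT, LONG, PROFILE_NO (1xn) and PRES, TEMP, PTMP, SAL (mxn)
+    print(mdata["PRES"].shape)  # (m, n): vertical levels x profiles
+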
+If you don't specify a path name, the method returns a dictionary with the float WMO as keys and pre-processed data as :class:`xarray.Dataset` as values. + +.. ipython:: python + :okwarning: + + ds_source = ds.argo.create_float_source() + ds_source + +See all options available for this method here: :meth:`Dataset.argo.create_float_source`. + +The method partially relies on two others: + +- :meth:`Dataset.argo.filter_scalib_pres`: to filter variables according to OWC salinity calibration software requirements. This filter modifies pressure, temperature and salinity related variables of the dataset. + +- :meth:`Dataset.argo.groupby_pressure_bins`: to sub-sample measurements by pressure bins. This is an excellent alternative to :meth:`Dataset.argo.interp_std_levels` that avoids interpolation and preserves the values of raw measurements, while at the same time aligning measurements along approximately similar pressure levels (depending on the size of the bins). See more details here: :ref:`Pressure levels: Group-by bins`. + +Running the calibration +^^^^^^^^^^^^^^^^^^^^^^^ + +Please refer to the `OWC python software documentation `_. + +A typical workflow would look like this: + +.. code-block:: python + + import os, shutil + from pathlib import Path + + import pyowc as owc + import argopy + from argopy import DataFetcher + + # Define float to calibrate: + FLOAT_NAME = "6903010" + + # Set-up where to save OWC analysis results: + results_folder = './analysis/%s' % FLOAT_NAME + Path(results_folder).mkdir(parents=True, exist_ok=True) + shutil.rmtree(results_folder) # Clean up folder content + Path(os.path.sep.join([results_folder, 'float_source'])).mkdir(parents=True, exist_ok=True) + Path(os.path.sep.join([results_folder, 'float_calib'])).mkdir(parents=True, exist_ok=True) + Path(os.path.sep.join([results_folder, 'float_mapped'])).mkdir(parents=True, exist_ok=True) + Path(os.path.sep.join([results_folder, 'float_plots'])).mkdir(parents=True, exist_ok=True) + + # fetch the default configuration and parameters + USER_CONFIG = owc.configuration.load() + + # Fix paths to run at Ifremer: + for k in USER_CONFIG: + if "FLOAT" in k and "data/" in USER_CONFIG[k][0:5]: + USER_CONFIG[k] = os.path.abspath(USER_CONFIG[k].replace("data", results_folder)) + USER_CONFIG['CONFIG_DIRECTORY'] = os.path.abspath('../data/constants') + USER_CONFIG['HISTORICAL_DIRECTORY'] = os.path.abspath('/Volumes/OWC/CLIMATOLOGY/') # where to find ARGO_for_DMQC_2020V03 and CTD_for_DMQC_2021V01 folders + USER_CONFIG['HISTORICAL_ARGO_PREFIX'] = 'ARGO_for_DMQC_2020V03/argo_' + USER_CONFIG['HISTORICAL_CTD_PREFIX'] = 'CTD_for_DMQC_2021V01/ctd_' + print(owc.configuration.print_cfg(USER_CONFIG)) + + # Create float source data with argopy: + fetcher_for_real = DataFetcher(src='localftp', cache=True, mode='expert').float(FLOAT_NAME) + fetcher_sample = DataFetcher(src='localftp', cache=True, mode='expert').profile(FLOAT_NAME, [1, 2]) # To reduce execution time for demo + ds = fetcher_sample.load().data + ds.argo.create_float_source(path=USER_CONFIG['FLOAT_SOURCE_DIRECTORY'], force='default') + + # 
Prepare data for calibration: map salinity on theta levels + owc.calibration.update_salinity_mapping("", USER_CONFIG, FLOAT_NAME) + + # Set the calseries parameters for analysis and line fitting + owc.configuration.set_calseries("", FLOAT_NAME, USER_CONFIG) + + # Calculate the fit of each break and calibrate salinities + owc.calibration.calc_piecewisefit("", FLOAT_NAME, USER_CONFIG) + + # Results figures + owc.plot.dashboard("", FLOAT_NAME, USER_CONFIG) + +OWC references +^^^^^^^^^^^^^^ + +.. [OWC] See all the details about the OWC methodology in these references: + +"An improved calibration method for the drift of the conductivity sensor on autonomous CTD profiling floats by θ–S climatology". +Deep-Sea Research Part I: Oceanographic Research Papers, 56(3), 450-457, 2009. https://doi.org/10.1016/j.dsr.2008.09.008 + +"Improvement of bias detection in Argo float conductivity sensors and its application in the North Atlantic". +Deep-Sea Research Part I: Oceanographic Research Papers, 114, 128-136, 2016. https://doi.org/10.1016/j.dsr.2016.05.007 + + +Trajectories +------------ + Topography ----------- +^^^^^^^^^^ .. currentmodule:: argopy For some QC of trajectories, it can be useful to easily get access to the topography. This can be done with the **argopy** utility :class:`TopoFetcher`: .. ipython:: python :okwarning: - + from argopy import TopoFetcher box = [-65, -55, 10, 20] ds = TopoFetcher(box, cache=True).to_xarray() @@ -40,8 +168,8 @@ Combined with the fetcher property ``domain``, it now becomes easy to superimpos .. code-block:: python - fig, ax = fetcher.plot('trajectory', figsize=(10, 10)) - ds['elevation'].plot.contourf(levels=np.arange(-6000,0,200), ax=ax, add_colorbar=False) + fig, ax = loader.plot('trajectory', figsize=(10, 10)) + ds['elevation'].plot.contourf(levels=np.arange(-6000,0,100), ax=ax, add_colorbar=False) .. image:: _static/trajectory_topography_sample.png diff --git a/docs/data_sources.rst b/docs/data_sources.rst index f97e1463..c0c1edea 100644 --- a/docs/data_sources.rst +++ b/docs/data_sources.rst @@ -1,6 +1,9 @@ Data sources ============ +.. contents:: + :local: + Let's start with standard import: .. ipython:: python diff --git a/docs/index.rst b/docs/index.rst index b08dd117..51caf99f 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -29,9 +29,9 @@ Documentation * :doc:`data_fetching` * :doc:`data_sources` * :doc:`data_manipulation` +* :doc:`visualisation` * :doc:`user_mode` * :doc:`metadata_fetching` -* :doc:`visualisation` * :doc:`performances` * :doc:`data_quality_control` @@ -43,11 +43,11 @@ Documentation data_fetching data_sources data_manipulation + visualisation + data_quality_control user_mode metadata_fetching - visualisation performances - data_quality_control **Help & reference** diff --git a/docs/paper.md b/docs/paper.md index bd901bf8..bb7bd4de 100644 --- a/docs/paper.md +++ b/docs/paper.md @@ -83,7 +83,7 @@ The result of this tremendous success in data management -- in developing good p procedures ([see all the Argo Data Management Team documentation here](http://www.argodatamgt.org/Documentation)) -- is a very complex Argo dataset: the **argopy** software aims to help users navigate this complex realm. -Since the Argo community focuses on delivering a curated dataset for science, software packages exist for Argo data operators to decode and quality control the data [e.g. @scoop]. However, no open source softwares are available for scientists, who therefore must develop their own machinery to download and manipulate the data. 
+Since the Argo community focuses on delivering a curated dataset for science, software packages exist for Argo data operators to decode and quality control the data [e.g. @scoop]. However, no open source software is available for scientists, who therefore must develop their own machinery to download and manipulate the data. Python is becoming widely used by the scientific community and beyond: worldwide, and is the most popular and fastest growing language in the last 5 years (20%, source: http://pypl.github.io/PYPL.html). It offers a modern, powerful and open source framework to work with. Since, up to this point, no Python based software has been dedicated to the Argo dataset, it made sense to develop **argopy**. diff --git a/docs/performances.rst b/docs/performances.rst index 2ee8ba12..d6f4fb26 100644 --- a/docs/performances.rst +++ b/docs/performances.rst @@ -1,6 +1,9 @@ Performances ============ +.. contents:: + :local: + To improve **argopy** data fetching performances (in terms of time of retrieval), 2 solutions are available: diff --git a/docs/requirements.txt b/docs/requirements.txt index 56a1bd41..7750780d 100644 --- a/docs/requirements.txt +++ b/docs/requirements.txt @@ -13,6 +13,7 @@ sphinx-autosummary-accessors>=0.1.2 readthedocs-sphinx-ext sphinx-rtd-theme sphinx-book-theme +# sphinx-bootstrap-theme xarray>=0.16.1 scipy>=1.1.0 diff --git a/docs/usage.rst b/docs/usage.rst index a17360b5..76b1c630 100644 --- a/docs/usage.rst +++ b/docs/usage.rst @@ -18,6 +18,8 @@ Data are returned as a collection of measurements in a :class:`xarray.Dataset`: ds +.. currentmodule:: xarray + Fetched data are returned as a 1D array collection of measurements. If you prefer to work with a 2D array collection of vertical profiles, simply transform the dataset with the :class:`xarray.Dataset` accessor method :meth:`Dataset.argo.point2profile`: .. ipython:: python @@ -38,4 +40,4 @@ or for a float profile using the cycle number: .. ipython:: python :okwarning: - ds = ArgoDataFetcher().profile(6902755, 12).to_xarray() \ No newline at end of file + ds = ArgoDataFetcher().profile(6902755, 12).to_xarray() diff --git a/docs/visualisation.rst b/docs/visualisation.rst index 72e63bd7..9c84f792 100644 --- a/docs/visualisation.rst +++ b/docs/visualisation.rst @@ -3,7 +3,7 @@ Data visualisation ################## -Although ``argopy`` is not focus on visualisation, it provides a few functions to get you started. Plotting functions are available for both the data and index fetchers. +Although **argopy** is not focused on visualisation, it provides a few functions to get you started. Plotting functions are available for both the data and index fetchers. Trajectories ------------ diff --git a/docs/what_is_argo.rst b/docs/what_is_argo.rst index 89409a96..e80a9c63 100644 --- a/docs/what_is_argo.rst +++ b/docs/what_is_argo.rst @@ -22,7 +22,7 @@ in situ temperature/salinity measurements of the ocean interior, key information (`Riser et al, 2016 `_). The Argo array reached its full global coverage (of 1 profile per month and per 3x3 degree horizontal area) in 2007, and -continuously pursues its evolution to fullfill new scientific requirements (`Roemmich et al, 2019 +continuously pursues its evolution to fulfill new scientific requirements (`Roemmich et al, 2019 `_). It now extents to higher latitudes and some of the floats are able to profile down to 4000m and 6000m. New floats are also equipped with biogeochemical sensors, measuring oxygen and chlorophyll for instance.
Argo is thus providing a deluge of in situ data: more than 400 profiles per day. diff --git a/docs/whats-new.rst b/docs/whats-new.rst index e78e8166..7bd50fac 100644 --- a/docs/whats-new.rst +++ b/docs/whats-new.rst @@ -3,12 +3,46 @@ What's New ========== -v0.1.9 (X XXX. 2021) --------------------- +v0.1.9 (X XXX. 202X) +--------------------- **Features and front-end API** -- New plotter function :meth:`argopy.plotters.open_sat_altim_report` to insert the CLS Satellite Altimeter Report figure in a notebook cell. (:pr:`159`) by `G. Maze `_. +- **New method to preprocess data for OWC software**. This method can preprocess Argo data and possibly create float_source/<WMO>.mat files to be used as inputs for OWC implementations in `Matlab `_ and `Python `_. See the :ref:`Salinity calibration` documentation page for more. (:pr:`142`) by `G. Maze `_. + +.. code-block:: python + + from argopy import DataFetcher as ArgoDataFetcher + ds = ArgoDataFetcher(mode='expert').float(6902766).load().data + ds.argo.create_float_source("float_source") + ds.argo.create_float_source("float_source", force='raw') + ds_source = ds.argo.create_float_source() + + +.. currentmodule:: xarray + +This new method comes with other methods and improvements: + + - A new :meth:`Dataset.argo.filter_scalib_pres` method to filter variables according to OWC salinity calibration software requirements, + - A new :meth:`Dataset.argo.groupby_pressure_bins` method to subsample a dataset down to one value by pressure bins (a perfect alternative to interpolation on standard depth levels, precisely to avoid interpolation), see :ref:`Pressure levels: Group-by bins` for more help, + - An improved :meth:`Dataset.argo.filter_qc` method to select which fields to consider (new option ``QC_fields``), + - Add conductivity (``CNDC``) to the possible output of the ``TEOS10`` method. + +.. currentmodule:: argopy + +- **New dataset properties** accessible from the `argo` xarray accessor: ``N_POINTS``, ``N_LEVELS``, ``N_PROF``. Note that, depending on the format of the dataset (a collection of points or of profiles), these values may or may not take NaN into account. This information is also visible with a simple print of the accessor. (:pr:`142`) by `G. Maze `_. + +.. code-block:: python + + from argopy import DataFetcher as ArgoDataFetcher + ds = ArgoDataFetcher(mode='expert').float(6902766).load().data + ds.argo.N_POINTS + ds.argo.N_LEVELS + ds.argo.N_PROF + ds.argo + + +- **New plotter function** :meth:`argopy.plotters.open_sat_altim_report` to insert the CLS Satellite Altimeter Report figure in a notebook cell. (:pr:`159`) by `G. Maze `_. .. code-block:: python @@ -25,7 +59,8 @@ v0.1.9 (X XXX. 2021) DataFetcher().float([6902745, 6902746]).plot('qc_altimetry') IndexFetcher().float([6902745, 6902746]).plot('qc_altimetry') -- New utility method :class:`argopy.TopoFetcher` to retrieve `GEBCO topography `_ for a given region. (:pr:`150`) by `G. Maze `_. + +- **New utility method to retrieve topography**. The :class:`argopy.TopoFetcher` will load the `GEBCO topography `_ for a given region. (:pr:`150`) by `G. Maze `_. .. code-block:: python @@ -103,7 +138,7 @@ v0.1.8 (2 Nov. 2021) - More general options. Fix :issue:`91`. (:pr:`102`) by `G. Maze `_. - - ``trust_env`` to allow for local environment variables to be used by fsspec to connect to the internet. Usefull for those using a proxy. + - ``trust_env`` to allow for local environment variables to be used by fsspec to connect to the internet. Useful for those using a proxy.
- Documentation on `Read The Docs` now uses a pip environment and get rid of memory eager conda. (:pr:`103`) by `G. Maze `_. @@ -271,11 +306,11 @@ v0.1.3 (15 May 2020) idx.to_dataframe() idx.plot('trajectory') -The ``index`` fetcher can manage caching and works with both Erddap and localftp data sources. It is basically the same as the data fetcher, but do not load measurements, only meta-data. This can be very usefull when looking for regional sampling or trajectories. +The ``index`` fetcher can manage caching and works with both Erddap and localftp data sources. It is basically the same as the data fetcher, but does not load measurements, only meta-data. This can be very useful when looking for regional sampling or trajectories. .. tip:: - **Performance**: we recommand to use the ``localftp`` data source when working this ``index`` fetcher because the ``erddap`` data source currently suffers from poor performances. This is linked to :issue:`16` and is being addressed by Ifremer. + **Performance**: we recommend using the ``localftp`` data source when working with this ``index`` fetcher because the ``erddap`` data source currently suffers from poor performances. This is linked to :issue:`16` and is being addressed by Ifremer. The ``index`` fetcher comes with basic plotting functionalities with the :func:`argopy.IndexFetcher.plot` method to rapidly visualise measurement distributions by DAC, latitude/longitude and floats type. diff --git a/docs/why.rst b/docs/why.rst index 78858a52..d2247bbe 100644 --- a/docs/why.rst +++ b/docs/why.rst @@ -29,11 +29,11 @@ If you don't know in which category you would place yourself, try to answer the * [ ] what is an adjusted parameter ? * [ ] what a QC flag of 3 means ? -If you don't answer to more than 1 question: you probably will feel more confortable with the *standard* user mode. +If you can't answer more than 1 question, you will probably feel more comfortable with the *standard* user mode. By default, all **argopy** data fetchers are set to work with a **standard** user mode, the other possible mode is **expert**. -In *standard* mode, fetched data are automatically filtered to account for their quality (only good are retained) and level of processing by the data centers (wether they looked at the data briefly or not). +In *standard* mode, fetched data are automatically filtered to account for their quality (only good values are retained) and level of processing by the data centers (whether they looked at the data briefly or not). Selecting user mode is further explained in the dedicated documentation section: :ref:`user-mode`.
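+As a quick illustration of the two user modes discussed above, here is how you can switch from the default ``standard`` mode to the ``expert`` mode (a minimal sketch, not part of the original documentation, reusing the ``set_options`` and ``DataFetcher`` calls shown elsewhere in this changeset):
+
+.. code-block:: python
+
+    import argopy
+    from argopy import DataFetcher as ArgoDataFetcher
+
+    # Set the user mode globally:
+    argopy.set_options(mode='expert')
+
+    # or only for a given fetcher:
+    ds = ArgoDataFetcher(mode='expert').float(6902766).load().data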