Version datasets individually (#18)
Instead of bundling all datasets together and versioning them by module
name, we can now version each dataset individually. To do so, we split the
data archives into separate repositories and Zenodo releases under
https://github.com/fatiando-data. Updating one dataset no longer means copying
all of the others along with it (since the collection would be new), and
functions for the other datasets don't need to be repeated in a new module.
Versions are now specified through a required `version` argument in all
`fetch_*` functions. We also only need 2 environment variables in total, one
for the cache location and one for the data source (instead of 2 per version).
A downside is that we can no longer accept a variable that sets a custom data
source URL, since each dataset has a different one. The new environment
variable only controls whether or not data are fetched from GitHub. Making
this work required considerable refactoring of the code.
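
A minimal sketch of how the two environment variables described above could be read. This is an illustration under stated assumptions, not the code from this commit: `ENSAIO_DATA_DIR` appears in the workflow changes below, while the name `ENSAIO_DATA_FROM_GITHUB`, the default cache path, and both helper functions are hypothetical.

```python
import os
from pathlib import Path

# Name used in the CI workflows changed by this commit.
CACHE_ENV = "ENSAIO_DATA_DIR"
# Hypothetical name for the on/off data-source switch; the commit message
# does not spell it out.
SOURCE_ENV = "ENSAIO_DATA_FROM_GITHUB"


def cache_dir(default="~/.cache/ensaio", environ=None):
    """Return the cache folder, preferring the environment variable if set."""
    environ = os.environ if environ is None else environ
    return Path(environ.get(CACHE_ENV, default)).expanduser()


def use_github(environ=None):
    """Return True only if the data-source switch is set to a truthy value."""
    environ = os.environ if environ is None else environ
    return environ.get(SOURCE_ENV, "").strip().lower() in {"1", "true", "yes"}


if __name__ == "__main__":
    print(cache_dir(environ={"ENSAIO_DATA_DIR": "/tmp/ensaio-cache"}))
    print(use_github(environ={}))
```

Because the switch is a plain flag rather than a URL, one variable can cover every dataset even though each dataset lives in its own repository.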
leouieda authored Feb 18, 2022
1 parent 27a5e9f commit 87db21f
Showing 28 changed files with 778 additions and 643 deletions.
14 changes: 7 additions & 7 deletions .github/workflows/docs.yml
@@ -111,19 +111,19 @@ jobs:
- name: Install the package
run: python -m pip install dist/*.whl

- name: Cache the v1 datasets
- name: Cache the datasets
if: github.event_name == 'pull_request'
uses: actions/cache@v2
with:
path: ${{ runner.temp }}/cache/ensaio/v1
key: ensaio-data-v1-${{ hashFiles('ensaio/v1.py') }}
path: ${{ runner.temp }}/cache/ensaio
key: ensaio-data-${{ hashFiles('ensaio/_fetchers.py') }}

- name: Cache the v1 sphinx-gallery runs
- name: Cache the sphinx-gallery runs
if: github.event_name == 'pull_request'
uses: actions/cache@v2
with:
path: doc/gallery/v1
key: gallery-v1-${{ hashFiles('doc/gallery_src/v1/*.py') }}
path: doc/gallery/
key: gallery-${{ hashFiles('doc/gallery_src/*.py') }}

- name: Cache the tutorial sphinx-gallery runs
if: github.event_name == 'pull_request'
@@ -136,7 +136,7 @@ jobs:
run: make -C doc all
env:
# Define directory where sample data will be stored
ENSAIO_V1_DATA_DIR: ${{ runner.temp }}/cache/ensaio/v1
ENSAIO_DATA_DIR: ${{ runner.temp }}/cache/ensaio/

# Store the docs as a build artifact so we can deploy it later
- name: Upload HTML documentation as an artifact
8 changes: 4 additions & 4 deletions .github/workflows/test.yml
@@ -144,17 +144,17 @@ jobs:
- name: List installed packages
run: python -m pip freeze

- name: Cache the v1 datasets
- name: Cache the datasets
if: matrix.cached
uses: actions/cache@v2
with:
path: ${{ runner.temp }}/cache/ensaio/v1
key: ensaio-data-v1-${{ hashFiles('ensaio/v1.py') }}
path: ${{ runner.temp }}/cache/ensaio
key: ensaio-data-${{ hashFiles('ensaio/_fetchers.py') }}

- name: Run the tests
run: make test
env:
ENSAIO_V1_DATA_DIR: ${{ runner.temp }}/cache/ensaio/v1
ENSAIO_DATA_DIR: ${{ runner.temp }}/cache/ensaio

- name: Convert coverage report to XML for codecov
run: coverage xml
10 changes: 5 additions & 5 deletions README.rst
@@ -26,8 +26,8 @@ About

**Ensaio** (Portuguese for "rehearsal") is a Python package for downloading
open-access sample datasets for Geoscience.
It taps into the curated collection from `fatiando/data
<https://github.com/fatiando/data>`__ that is designed for use in tutorials,
It taps into the `Fatiando a Terra FAIR data collection
<https://github.com/fatiando-data>`__ that is designed for use in tutorials,
documentation, and teaching.

It uses `Pooch <https://www.fatiando.org/pooch>`__ to manage downloading and
@@ -43,9 +43,9 @@ Project goals
* Only download and let the user load the data. This helps make tutorials and
examples more easily extended to a user's own data.
* Be fully backwards compatible. We achieve this by separating **data**
versions from **Ensaio** versions. Major releases of the data get separate
modules in Ensaio: `ensaio.v1`, `ensaio.v2`, etc. Major releases of Ensaio
will be few and far between (if any).
versions from **Ensaio** versions. Data fetching functions allow you to
choose any data version that is older than the version of Ensaio that's
installed. Major releases of Ensaio will be few and far between (if any).

Contacting Us
-------------
51 changes: 11 additions & 40 deletions doc/api/index.rst
@@ -3,46 +3,17 @@
List of functions and classes (API)
===================================

Functions and variables used to download the datasets and cache them locally.
Use the respective module to access the datasets in each major version of the
data release.

.. tip::

The best way to use Ensaio is to ``import ensaio.v1 as ensaio`` or likewise
with other versions that are available. This way your code will continue to
work even when Ensaio updates to include newer incompatible dataset
versions. See :ref:`compatibility`.

.. automodule:: ensaio
.. currentmodule:: ensaio

``ensaio.v1``
-------------

.. automodule:: ensaio.v1

Functions:

.. autosummary::
:toctree: generated/

ensaio.v1.locate
ensaio.v1.fetch_alps_gps
ensaio.v1.fetch_britain_magnetic
ensaio.v1.fetch_british_columbia_lidar
ensaio.v1.fetch_caribbean_bathymetry
ensaio.v1.fetch_earth_geoid
ensaio.v1.fetch_earth_gravity
ensaio.v1.fetch_earth_topography
ensaio.v1.fetch_southern_africa_gravity

Module variables:

.. autosummary::
:toctree: generated/

ensaio.v1.DOI
ensaio.v1.URL
ensaio.v1.ENVIRONMENT_VARIABLE_URL
ensaio.v1.ENVIRONMENT_VARIABLE_CACHE
:toctree: generated/

ensaio.locate
ensaio.fetch_alps_gps
ensaio.fetch_britain_magnetic
ensaio.fetch_british_columbia_lidar
ensaio.fetch_caribbean_bathymetry
ensaio.fetch_earth_geoid
ensaio.fetch_earth_gravity
ensaio.fetch_earth_topography
ensaio.fetch_southern_africa_gravity
30 changes: 16 additions & 14 deletions doc/compatibility.rst
@@ -22,20 +22,22 @@ major releases sparingly and with ample warning.**
Source data releases
--------------------

New releases of Ensaio will tend to accompany releases of the source data
collection in the `fatiando/data <https://github.com/fatiando/data>`__
repository.
However, the **version numbers will not necessarily match**.

A major release of the data collection will result in a **new module being
added to Ensaio** (for example, the data release ``2.0.0`` will prompt an
Ensaio release with the ``ensaio.v2`` module added).
The ``1.*.*`` data will still be accessible through the ``ensaio.v1`` module.
The modules for previous releases will not be removed unless absolutely
necessary.

This means that upgrading Ensaio should almost always be safe and documentation
using ``1.*.*`` data should still work after ``2.*.*`` data is released.
New releases of Ensaio will tend to accompany releases of new datasets or new
versions of existing data in the
`Fatiando a Terra Datasets <https://github.com/fatiando-data>`__ collection.

Older versions of each dataset will still remain available (as much as
possible) and can be accessed by setting the ``version`` argument of the
``fetch_*`` functions accordingly.
This means that **upgrading Ensaio should almost always be safe**.
Documentation using version ``1`` of a dataset will still use the same data
(and hopefully produce the same results) after version ``2`` is included in
Ensaio.

.. seealso::

See :ref:`developers` for more tips and tricks.
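
As a rough illustration of why pinning ``version`` keeps older documentation working, the sketch below models per-dataset versions with a plain dictionary. It is not Ensaio's implementation: ``REGISTRY``, ``resolve``, and the file names are hypothetical placeholders.

```python
# Hypothetical registry: dataset name -> {version number: file name}.
# The file names are placeholders, not the real archive names.
REGISTRY = {
    "alps-gps": {1: "alps-gps-v1.csv"},
}


def resolve(dataset, version):
    """Return the file registered for one specific version of a dataset."""
    versions = REGISTRY[dataset]
    if version not in versions:
        available = sorted(versions)
        raise ValueError(
            f"Version {version} of '{dataset}' is not available. "
            f"Choose one of {available}."
        )
    return versions[version]


if __name__ == "__main__":
    before = resolve("alps-gps", version=1)
    # Releasing a new version only adds an entry; nothing is overwritten.
    REGISTRY["alps-gps"][2] = "alps-gps-v2.csv"
    after = resolve("alps-gps", version=1)
    print(before == after)
```

Because a new release only adds an entry for that one dataset, code that asks for version ``1`` resolves to the same file before and after version ``2`` exists.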


.. _python-versions:

4 changes: 2 additions & 2 deletions doc/conf.py
@@ -79,9 +79,9 @@
# -----------------------------------------------------------------------------
sphinx_gallery_conf = {
# path to your examples scripts
"examples_dirs": ["gallery_src/v1", "tutorial_src"],
"examples_dirs": ["gallery_src", "tutorial_src"],
# path where to save gallery generated examples
"gallery_dirs": ["gallery/v1", "tutorial"],
"gallery_dirs": ["gallery", "tutorial"],
"filename_pattern": r"\.py",
# Remove the "Download all examples" button from the top level gallery
"download_all_examples": False,
17 changes: 17 additions & 0 deletions doc/gallery_src/README.txt
@@ -0,0 +1,17 @@
.. _gallery:

Available datasets
==================

Use the functions in the :mod:`ensaio` module to download and cache (store)
each dataset on your computer.
See the :ref:`api` for more information about each dataset, the original data
sources, and their licenses.
The datasets are prepared for use in Ensaio in the repositories of the
`Fatiando a Terra Datasets <https://github.com/fatiando-data>`__ GitHub
organization.

.. tip::

Click on the images for examples of fetching, loading, and plotting each
dataset.
@@ -5,8 +5,8 @@
# This code is part of the Fatiando a Terra project (https://www.fatiando.org)
#
"""
Alpine 3-component GPS velocities
---------------------------------
GPS velocities (3-component) for the Alps
-----------------------------------------
This is a compilation of 3D GPS velocities for the Alps. The horizontal
velocities are reference to the Eurasian frame. All velocity components and
@@ -21,11 +21,11 @@
import pandas as pd
import pygmt

import ensaio.v1 as ensaio
import ensaio

###############################################################################
# Download and cache the data and return the path to it on disk
fname = ensaio.fetch_alps_gps()
fname = ensaio.fetch_alps_gps(version=1)
print(fname)

###############################################################################
@@ -5,7 +5,7 @@
# This code is part of the Fatiando a Terra project (https://www.fatiando.org)
#
"""
Airborne magnetic survey of Britain
Magnetic airborne survey of Britain
-----------------------------------
This is a digitization of an airborne magnetic survey of Britain. Data are
@@ -26,11 +26,11 @@
import pandas as pd
import pygmt

import ensaio.v1 as ensaio
import ensaio

###############################################################################
# Download and cache the data and return the path to it on disk
fname = ensaio.fetch_britain_magnetic()
fname = ensaio.fetch_britain_magnetic(version=1)
print(fname)

###############################################################################
@@ -20,11 +20,11 @@
import pandas as pd
import pygmt

import ensaio.v1 as ensaio
import ensaio

###############################################################################
# Download and cache the data and return the path to it on disk
fname = ensaio.fetch_british_columbia_lidar()
fname = ensaio.fetch_british_columbia_lidar(version=1)
print(fname)

###############################################################################
@@ -5,8 +5,8 @@
# This code is part of the Fatiando a Terra project (https://www.fatiando.org)
#
"""
Single-beam bathymetry of the Caribbean
---------------------------------------
Bathymetry single-beam surveys of the Caribbean
-----------------------------------------------
This dataset is a compilation of several public domain single-beam bathymetry
surveys of the ocean in the Caribbean. The data display a wide range of
@@ -20,11 +20,11 @@
import pandas as pd
import pygmt

import ensaio.v1 as ensaio
import ensaio

###############################################################################
# Download and cache the data and return the path to it on disk
fname = ensaio.fetch_caribbean_bathymetry()
fname = ensaio.fetch_caribbean_bathymetry(version=1)
print(fname)

###############################################################################
@@ -5,8 +5,8 @@
# This code is part of the Fatiando a Terra project (https://www.fatiando.org)
#
"""
Geoid height of the Earth at 10 arc-minute resolution
-----------------------------------------------------
Earth geoid height grid at 10 arc-minute resolution
---------------------------------------------------
The grid is grid-node registered and stored in netCDF with CF-compliant
metadata. The geoid height is derived from the EIGEN-6C4 spherical harmonic
@@ -19,11 +19,11 @@
import pygmt
import xarray as xr

import ensaio.v1 as ensaio
import ensaio

###############################################################################
# Download and cache the data and return the path to it on disk.
fname = ensaio.fetch_earth_geoid()
fname = ensaio.fetch_earth_geoid(version=1)
print(fname)

###############################################################################
@@ -5,8 +5,8 @@
# This code is part of the Fatiando a Terra project (https://www.fatiando.org)
#
"""
Gravity of the Earth at 10 arc-minute resolution
------------------------------------------------
Earth gravity grid at 10 arc-minute resolution
----------------------------------------------
The grid is grid-node registered and stored in netCDF with CF-compliant
metadata. The gravity values are derived from the EIGEN-6C4 spherical harmonic
@@ -22,11 +22,11 @@
import pygmt
import xarray as xr

import ensaio.v1 as ensaio
import ensaio

###############################################################################
# Download and cache the data and return the path to it on disk.
fname = ensaio.fetch_earth_gravity()
fname = ensaio.fetch_earth_gravity(version=1)
print(fname)

###############################################################################
@@ -5,8 +5,8 @@
# This code is part of the Fatiando a Terra project (https://www.fatiando.org)
#
"""
Topography of the Earth at 10 arc-minute resolution
---------------------------------------------------
Earth topography grid at 10 arc-minute resolution
-------------------------------------------------
The grid is grid-node registered and stored in netCDF with CF-compliant
metadata. The values are derived from a spherical harmonic model of the ETOPO1
@@ -19,11 +19,11 @@
import pygmt
import xarray as xr

import ensaio.v1 as ensaio
import ensaio

###############################################################################
# Download and cache the data and return the path to it on disk.
fname = ensaio.fetch_earth_topography()
fname = ensaio.fetch_earth_topography(version=1)
print(fname)

###############################################################################
@@ -22,11 +22,11 @@
import pandas as pd
import pygmt

import ensaio.v1 as ensaio
import ensaio

###############################################################################
# Download and cache the data and return the path to it on disk
fname = ensaio.fetch_southern_africa_gravity()
fname = ensaio.fetch_southern_africa_gravity(version=1)
print(fname)

###############################################################################