CHANGELOG

v0.2.0 (2024-01-17)

Build

  • build: remove dev dependencies to lighten package

Remove developer dependencies from the list of user/production dependencies to lighten the installation process for users. (5496256)

Ci

  • ci: allow GHA to bypass protection rules on release

Permit GitHub Actions to bypass repository branch protection rules to enable Python Semantic Release to commit release-related changes back to the main branch. (d3c012b)

  • ci: fix failing release workflow

Update the failing release workflow (GitHub Action) to enable Python Semantic Release to automatically create a new release when the development branch is merged into main.

  • Add the GITHUB_TOKEN to the semantic release step, where it is necessary for committing changes, using the standard syntax for referencing secrets.
  • Include the missing versioning command in the semantic release step to ensure the new version is calculated.
  • Correct the outdated syntax for referencing the version number in the pyproject.toml file to align with the requirements of the current version of Python Semantic Release. (49c986f)

  • ci: update release token reference

Update the name of the authentication token used by the CD GitHub Actions Workflow to reflect the new permissive permissions set on the default repository token, resolving the CD workflow failure. (dc80841)

  • ci: update failing GitHub Action workflows

Adjust failing GitHub Actions workflows for 'checkout' and 'setup-python' used in the CD workflow. This workflow differs from the CI workflow in that authentication is required for git commit and merging operations. (bf30458)

  • ci: ignore pylint c-extension-no-member messages

Ignore pylint 'c-extension-no-member' (I1101) messages, originating from lxml, for the sake of a readable message log.

Another option, adding lxml to the pylint --extension-pkg-allow-list, may run arbitrary code and is a decision that shouldn't be made for collaborators running pylint in the context of this project. (3b9cefb)

  • ci: update GitHub Actions

Assign GitHub Actions branch merge permissions to ensure that 'development' remains up-to-date with the main branch after 'main' is tagged during the release process.

Declare Pylint checks in 'pyproject.toml' to ensure synchronization between local checks and CI pipeline checks. (19e784f)

Documentation

  • docs: add blank space for release commit

Insert a blank space into the README to ease the creation of a commit message, via Python Semantic Release, intended solely for bumping the major version number.

BREAKING CHANGE: This marks the first fully functioning release of the gbif_registrar package. The APIs of previously released functionality have been considerably modified. (86b3da0)

  • docs: update CONTRIBUTING for current project status

Revise the CONTRIBUTING file to align with the current status of the project. (71e6380)

  • docs: emphasize running main workflow after creation

Emphasize the importance of running the main workflow after creation, as skipping this step can result in incomplete registration and uploading of a package to GBIF. (20d937e)

  • docs: update installation instructions

Revise installation instructions to recommend using pip from GitHub rather than conda. While installation from conda is possible, the pip method is more straightforward. (49d3ffa)

  • docs: add examples of public-facing API usage

Add missing examples of public-facing API usage to provide users with demonstrations of how to use the functions. (85f5364)

  • docs: encourage subscription to API mailing list

Add a note to the developer section of the README, advising maintainers to subscribe to the EDI and GBIF API mailing lists for timely updates on outages and changes. This ensures they can adjust expectations or the codebase accordingly. (d62edb6)

  • docs: correct subsection formatting

Apply a small fix to ensure consistent subsection formatting throughout the document. (71a837e)

  • docs: clarify dataset synchronization concept

Enhance clarity in the documentation regarding the concept of dataset synchronization to preempt any potential confusion. (1ba25e7)

  • docs: update README for release

Add major missing components to the README in preparation for release. (e55cccf)

  • docs: standardize descriptions for a consistent API

Standardize function descriptions for API clarity and consistency. (947093b)

  • docs: standardize parameters for consistent API

Standardize function parameter names and definitions for a consistent public facing API. (92a56ec)

  • docs: comment for clarity and understanding

Update code and test comments for improved clarity and understanding. (88842cb)

  • docs: revise descriptions of a few utilities

Enhance descriptions and provide examples for the utility functions 'get_local_dataset_group_id', 'get_local_dataset_endpoint', and 'get_gbif_dataset_uuid' to facilitate better understanding. (eae5990)

  • docs: fix outdated references to read_registrations

Address outdated mentions of the 'read_registrations' function that were missed in commit 39367cc76f73aaa2159c627be37c0c4a508b4472. (37745d6)

  • docs: address RTD build deprecation

Switch to 'build.os' instead of 'build.image' to address the deprecation of the 'build.image' config key in Read the Docs. This change is necessary for successful documentation building. (85e3fde)

Feature

  • feat: upload new and revised datasets to GBIF

Implement a new function for uploading both new and revised datasets to GBIF. Build the workflow to handle typical conditions and edge cases. Additionally, create two kinds of tests: integration tests that make actual HTTP calls (extended tests meant for occasional manual execution) and tests that mock HTTP calls, which always run and provide faster results. (53219b6)

  • feat: enable registration repair on demand

Modify 'complete_registration_records' to operate on a single record when directed to do so, rather than always processing all incomplete registrations. (e164d13)
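The on-demand behavior might look roughly like the sketch below. It is illustrative only: the list-of-dicts row layout, the 'local_dataset_id' key, the `fill_record` callback, and the `only_id` parameter are assumptions made for this example, not the actual signature of 'complete_registration_records'.

```python
from typing import Callable, Optional

MISSING = (None, "", "NA")  # assumed markers for an empty field


def is_incomplete(record: dict) -> bool:
    """A record counts as incomplete if any of its values is missing."""
    return any(value in MISSING for value in record.values())


def complete_records(
    records: list,
    fill_record: Callable[[dict], dict],
    only_id: Optional[str] = None,
) -> list:
    """Repair incomplete records; restrict to one record if only_id is given."""
    repaired = []
    for record in records:
        # 'local_dataset_id' is a hypothetical key used to select one record.
        targeted = only_id is None or record.get("local_dataset_id") == only_id
        if targeted and is_incomplete(record):
            record = fill_record(record)
        repaired.append(record)
    return repaired
```

Calling `complete_records(records, fill_record)` processes every incomplete record, while passing `only_id` leaves all other records untouched.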

  • feat: check synchronization of local dataset w/GBIF

Report the success or failure of a dataset creation or update operation to alert users of synchronization issues. Define success and failure by comparing the publication date of the local dataset EML metadata and the endpoint of the zip archive download with those of the remote GBIF instance.

Move get_local_dataset_endpoint to utilities.py to prevent a circular reference. (c9ebad3)
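As a rough illustration of the comparison described in this entry, the sketch below treats a dataset as synchronized only when both values agree. The `DatasetState` class and its field names are hypothetical and are not taken from the package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetState:
    """Hypothetical summary of a dataset as seen locally or on GBIF."""

    pub_date: str  # publication date from the EML metadata
    endpoint: str  # URL of the zip archive download


def is_synchronized(local: DatasetState, remote: DatasetState) -> bool:
    """Report success only when both the date and the endpoint agree."""
    return local.pub_date == remote.pub_date and local.endpoint == remote.endpoint
```

Requiring both values to match means a stale endpoint is reported as a failure even when the publication dates happen to agree.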

  • feat: wrap get GBIF dataset details for general use

Wrap calls for GBIF dataset details to simplify response handling and to be DRY when calling from different contexts. (c3ec165)

  • feat: post local datasets to GBIF

Publish a set of functions for posting a local dataset to GBIF and maintaining synchronization as the local dataset evolves over time. (14336ac)

Fix

  • fix: resolve reference to dataset group, not endpoint

To retrieve the corresponding 'gbif_dataset_uuid' without errors, use the 'local_dataset_group_id' instead of the 'local_dataset_endpoint'. The 'local_dataset_endpoint' has one-to-one cardinality, so it does not reference previously used gbif_dataset_uuid values. (5139a62)
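A minimal sketch of why the group identifier is the right lookup key, assuming registrations are held as a list of dicts and that each endpoint belongs to exactly one dataset revision. The `find_group_uuid` helper and the sample values are hypothetical; only the column names come from this changelog.

```python
from typing import Optional

MISSING = (None, "", "NA")  # assumed markers for an empty field


def find_group_uuid(records: list, group_id: str) -> Optional[str]:
    """Return the UUID already assigned to any member of the dataset group."""
    for record in records:
        uuid = record.get("gbif_dataset_uuid")
        if record.get("local_dataset_group_id") == group_id and uuid not in MISSING:
            return uuid
    return None


# Two revisions of one dataset: the new revision has a new endpoint and no UUID yet.
registrations = [
    {
        "local_dataset_group_id": "group.1",
        "local_dataset_endpoint": "https://example.org/group.1.1.zip",
        "gbif_dataset_uuid": "uuid-0001",
    },
    {
        "local_dataset_group_id": "group.1",
        "local_dataset_endpoint": "https://example.org/group.1.2.zip",
        "gbif_dataset_uuid": "NA",
    },
]
```

Here `find_group_uuid(registrations, "group.1")` finds the UUID recorded for the earlier revision, whereas a search on the new revision's endpoint would match only the row whose UUID is still empty.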

  • fix: update dependencies to resolve doc build failures

Update the 'autoapi.extension' to prevent the exception "'Module' object has no attribute 'doc'" and to enable successful local and Read the Docs documentation builds.

Pin project documentation dependencies to address the deprecation of default project dependencies on Read the Docs (see: https://blog.readthedocs.com/newsletter-september-2023/).

Update related project dependencies and resolve associated deprecation errors and warnings to maintain a functional code base. (f23fc23)

  • fix: use PASTA environment consistently

Use the PASTA_ENVIRONMENT variable to ensure consistent alignment of data package references. Using different environments results in data package reference mismatches and various errors throughout the application code. (2370ab0)

  • fix: use synchronized dataset for testing

Add a dataset that has been synchronized between EDI and GBIF to 'registrations.csv' for testing purposes. (cc30d49)

  • fix: get new uuid if it does not exist

Fix the logic in 'get_gbif_dataset_uuid' for determining an empty gbif_dataset_uuid to ensure a new value is requested if it doesn't yet exist.

Additionally, use pytest-mock to simulate both success and failure conditions for this feature. (1095985)

  • fix: address pylint messages

Address lingering pylint messages to adhere to best practices and clean up the message log, which has become quite lengthy. (c9ab920)

  • fix: update outdated dependency files

Update the outdated dependency files to build the project without error. (fa4d11c)

Refactor

  • refactor: rename module for improved descriptiveness

Rename the 'crawl.py' module to 'upload.py' to better reflect its purpose, which involves the user posting content to GBIF rather than performing crawling operations. (50205d6)

  • refactor: enhance credential security in config file

  • Relocate the configuration file to an external location, removing it from version control to ensure the safety of credentials.
  • Introduce a 'write configuration file' helper function, which generates a boilerplate configuration to be completed by the user.
  • Create utility functions for loading and unloading the configuration as environmental variables, making them accessible throughout the package.
  • Note: The current implementation doesn't fully restore the user's environmental variables to their original state, as any variables with the same names will be overwritten by the load_configuration function and removed by the unload_configuration function. Addressing this issue is a potential improvement for future implementation. (dfa5e39)
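One possible way to address the limitation noted above is to remember the original values before overwriting them. The sketch below is an assumption about how that could work, not the package's implementation; `load_config_env`, `unload_config_env`, and `_SAVED` are hypothetical names.

```python
import os

_SAVED = {}  # original values of any variables the loader overwrote


def load_config_env(config: dict) -> None:
    """Export config values, remembering whatever they replace."""
    for name, value in config.items():
        if name not in _SAVED:
            _SAVED[name] = os.environ.get(name)  # None means "was not set"
        os.environ[name] = value


def unload_config_env() -> None:
    """Restore the environment to its state before load_config_env ran."""
    for name, original in _SAVED.items():
        if original is None:
            os.environ.pop(name, None)  # variable did not exist before loading
        else:
            os.environ[name] = original  # put the user's own value back
    _SAVED.clear()
```

With this approach, a variable the user had already set is restored on unload rather than removed, and variables introduced only by the configuration are deleted.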

  • refactor: expand abbreviations for clarity

Expand abbreviated references to the registrations data frame for improved clarity and comprehension. (9ea61b1)

  • refactor: eliminate useless '_has_metadata' function

Remove the '_has_metadata' function as it does not serve a purpose. Initially, it was designed to determine whether a local dataset group had a member on GBIF and was used to guide decision logic concerning resetting dataset endpoints and re-uploading metadata in the event of a dataset update. However, it became apparent that this function returned 'True' even if only boilerplate stand-in metadata was posted to GBIF before the actual metadata was posted during a crawl operation. (bff89ba)

  • refactor: check for 'NA' instead of 'None'

When performing decision logic (boolean operations) based on values retrieved from the registrations file, check for 'NA' rather than 'None'. This change avoids the 'boolean value of NA is ambiguous' error that can arise after commit f23fc2386f3f80a7100c696d8b620392b2ed260f, which switched from 'None' to 'NA' values in the registrations file in preparation for a future deprecation in pandas. (4b213cf)

  • refactor: clarify definition of 'synchronization'

Rename the 'is_synchronized' column to 'synchronized' to clarify its meaning, shifting from "this dataset is currently synchronized with GBIF" to "this dataset has in the past been synchronized with GBIF." Also update the 'check_is_synchronized' function to align with this renaming. (2cb5392)

  • refactor: internalize utilities for backwards compatibility

Internalize utility functions to reduce the risk of introducing backward compatibility issues in the public-facing API when refactoring the codebase. (6a71eec)

  • refactor: default synchronization value to 'False'

Change the default synchronization indicator from 'None' to 'False' to align the code with example and test usage. (aa6353b)

  • refactor: deprecate extended validation checks

Always enforce consistent validation of registration file contents using extended checks. Remove the controlling parameter for this half-implemented external repository customization feature, which we have decided not to support. (e5be3d9)

  • refactor: separate concerns of register_dataset

Refactor 'register_dataset' to exclusively handle the registration of a single dataset, removing the attempt to repair partially registered datasets resulting from past registration failures. Move the repair action to 'complete_registration_records'. This separation of concerns improves code maintainability and usability. (12742f4)

  • refactor: enhance clarity of complete_registrations

Rename the 'complete_registrations' function and update documentation to reflect that it handles the completion of all components within registration records, not solely the 'gbif_dataset_uuid'. (6266ef9)

  • refactor: enhance clarity in read_registrations

Rename the 'read_registrations' function and the 'file_path' parameter to indicate that the registrations file is being read, and to follow a consistent call pattern being implemented throughout the codebase. (39367cc)

  • refactor: enhance clarity in the register function

  • Rename the 'register' function to explicitly indicate that it registers a dataset.
  • Move the 'dataset' parameter to the first position for improved function call readability.
  • Rename the 'file_path' parameter to better convey that it represents registration information stored as a file. (c92b496)

  • refactor: improve clarity in initialize_registrations

  • Rename the 'initialize_registrations' function to enhance understanding, making it clear that it initializes a file.
  • Enhance file content descriptions and their mappings to concepts in the EDI repository for better comprehension.
  • Move the function to the 'register.py' module, where it joins similar code for improved findability. (ee98c9e)

  • refactor: deprecate gbif_endpoint_set_datetime

Deprecate gbif_endpoint_set_datetime in favor of is_synchronized to indicate the synchronization status of an EDI dataset with GBIF.

Related to c9ebad36bcacdc1351e171092f30cae06f13c035. (2c7ea77)

  • refactor: apply read_gbif_dataset_metadata

Apply 'read_gbif_dataset_metadata' to functions requiring this information in their custom implementations to maintain a DRY codebase. (c5896e3)

  • refactor: rename get_gbif_datatset_details

Improve user understanding by renaming 'get_gbif_datatset_details'. Replace the 'get' prefix with 'read' to clarify the operation as an I/O operation with possible parsing. (b1c4786)

  • refactor: fail 'has_metadata' gracefully

Handle HTTP errors gracefully in the 'has_metadata' function to prevent systematic failures.

Employ pytest-mock to simulate both success and failure conditions. (7381637)

  • refactor: fail 'read_local_dataset_metadata' gracefully

Handle HTTP errors gracefully in the 'read_local_dataset_metadata' function to prevent systematic failures.

Employ pytest-mock to simulate both success and failure conditions. (a0e9515)

  • refactor: fail 'request_gbif_dataset_uuid' gracefully

Handle HTTP errors gracefully in the 'request_gbif_dataset_uuid' function to prevent systematic failures. Employ pytest-mock to simulate both success and failure conditions. (f1bca32)
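The pattern shared by these three "fail gracefully" refactors can be sketched as follows. This is an assumption-laden illustration, not the package's code: `request_uuid` is a hypothetical function, the response body is assumed to be the UUID as plain text, the URL is a placeholder, and the standard library's `unittest.mock` stands in for pytest-mock so the example has no third-party dependencies.

```python
import urllib.error
import urllib.request
from typing import Optional
from unittest import mock

URL = "https://example.org/dataset"  # placeholder, not a real GBIF endpoint


def request_uuid(url: str) -> Optional[str]:
    """Return the UUID from the response body, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode().strip()
    except urllib.error.URLError:  # HTTPError is a subclass of URLError
        return None


def demo() -> tuple:
    """Exercise the success and failure paths without touching the network."""
    # Success: urlopen returns a context manager whose read() yields bytes.
    fake = mock.MagicMock()
    fake.__enter__.return_value.read.return_value = b"abc-123"
    with mock.patch("urllib.request.urlopen", return_value=fake):
        succeeded = request_uuid(URL)

    # Failure: urlopen raises an HTTP error, which request_uuid absorbs.
    error = urllib.error.HTTPError(URL, 503, "Service Unavailable", None, None)
    with mock.patch("urllib.request.urlopen", side_effect=error):
        failed = request_uuid(URL)

    return succeeded, failed
```

Returning a sentinel instead of raising lets a calling workflow record the failure and continue with the remaining datasets, and the mocked calls allow both outcomes to be checked offline.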

  • refactor: check for metadata before replacing

Prior to replacement, verify the presence of a metadata document in GBIF. This precaution prevents potential errors when attempting to replace a metadata document that does not currently exist. (01ddd6c)

  • refactor: reorder func params for better semantics

Reorder the parameter positions of functions in the 'crawl' module to align function calls more effectively with the underlying semantics. (45f94c6)

Test

  • test: eliminate empty test module

Remove the empty 'test_validate.py' module as testing for these routines is consolidated in the 'test__utilities.py' module. (4e11f80)

  • test: verify reuse of 'gbif_dataset_uuid' for updates

Confirm that the 'register_dataset' function reuses the 'gbif_dataset_uuid' for members sharing the same 'local_dataset_group_id', enabling updates of the GBIF dataset instance. (f7210e1)

  • test: register the first dataset w/o error

Test that the 'complete_registration_records' function works for the scenario of the first dataset to ensure that this situation does not trigger an error. (3030cf9)

  • test: validate iterative registration repair

Execute 'complete_registration_records' on a registrations file containing two incomplete registrations to verify the functionality of iterative registration repair under this specific use case. (07e2e67)

  • test: include missing test for failed registration

Add a test case to verify that a failed registration does not write a GBIF dataset UUID in the registrations file, returns 'NA', and does not raise an exception. (e974ac9)

  • test: share fixtures with conftest.py

Utilize 'conftest.py' for sharing test fixtures, currently isolated within test modules. (e067bae)

  • test: mock HTTP requests for 'register'

Utilize pytest-mock to simulate both success and failure conditions for the 'register' function. (dabeec1)

  • test: use pytest-mock to mock tests

Utilize pytest-mock to mock tests that involve remote API calls, allowing tests to run even when offline. This approach enhances the ability to thoroughly examine both pass and fail conditions. (4943677)

v0.1.1 (2023-06-22)

Build

  • build: use Python 3.9 to fix readthedocs issue

When specifying Python 3.10 in .readthedocs.yml, an error is raised in the build: 'AttributeError module types has no attribute UnionType'. A temporary fix to this issue is to use Python 3.9. While not optimal, using 3.9 allows the current package docs to build, the conda environment to resolve, and the package tests to pass.

This commit specifies Python 3.9 in .readthedocs.yml. (58d0960)

v0.1.0 (2023-06-22)

Build

  • build: update environment and requirements files (62cda52)

  • build: update poetry lock (a5471ba)

Ci

  • ci: format and lint PRs and merges (cd3b5c4)

Documentation

  • docs: credit Margaret for authorship (05d8e27)

  • docs: update contributing guidelines (12b94f4)

  • docs: mention Pylint usage (f81b0ec)

  • docs: reformat project names (6ad14a8)

  • docs: Update contributing guidelines (41eb175)

Feature

  • feat: create function to register datasets w/GBIF (3cd7d14)

  • feat: extend registration checks

Closes #11 (d1b662e)

  • feat: validate registrations file

Closes #7 (2430c58)

  • feat: read registrations file

Closes #8 (95de046)

  • feat: Initialize an empty registrations file

Closes #2 (73cd544)

Refactor

  • refactor: format recent changes (40661c6)

  • refactor: format and lint recent changes (15d71f4)

  • refactor: rename a few things (92e6475)

Test

  • test: fix code formatting in test (4f70e00)

  • test: add test/example registrations file

Closes #5 (2ae51b2)

Unknown

  • Ignore sensitive data in config.py

The config.py file is used to store sensitive GBIF API client data and should not be publicly exposed.

This commit adds the config.py file to .gitignore. (9fa6ae1)

  • Run CI-CD actions on PR to main/dev branches (82a082a)

  • Note commit format requirements (39e1854)

  • First commit (487cf6a)