- build: remove dev dependencies to lighten package
Remove developer dependencies from the list of user/production
dependencies to lighten the installation process for users. (5496256)
- ci: allow GHA to bypass protection rules on release
Permit GitHub Actions to bypass repository branch protection rules to
enable Python Semantic Release to commit release-related changes back
to the main branch. (d3c012b)
- ci: fix failing release workflow
Update the failing release workflow (GitHub Action) to enable Python Semantic Release to automatically create a new release when the development branch is merged into main.
- Add the GITHUB_TOKEN to the semantic release step, where it is necessary for committing changes, using the standard syntax for referencing secrets.
- Include the missing versioning command in the semantic release step to ensure the new version is calculated.
- Correct the outdated syntax for referencing the version number in the
pyproject.toml file to align with the requirements of the current version
of Python Semantic Release. (49c986f)
- ci: update release token reference
Update the name of the authentication token used by the CD GitHub
Actions Workflow to reflect the new permissive permissions set on the
default repository token, resolving the CD workflow failure. (dc80841)
- ci: update failing GitHub Action workflows
Adjust failing GitHub Actions workflows for 'checkout' and
'setup-python' used in the CD workflow. This workflow differs from the
CI workflow in that authentication is required for git commit and
merging operations. (bf30458)
- ci: ignore pylint c-extension-no-member messages
Ignore pylint 'c-extension-no-member' (I1101) messages, originating from lxml, for the sake of a readable message log.
Another option, adding lxml to the pylint --extension-pkg-allow-list,
may run arbitrary code and is a decision that shouldn't be made for
collaborators running pylint in the context of this project. (3b9cefb)
- ci: update GitHub Actions
Assign GitHub Actions branch merge permissions to ensure that 'development' remains up-to-date with the main branch after 'main' is tagged during the release process.
Declare Pylint checks in 'pyproject.toml' to ensure synchronization
between local checks and CI pipeline checks. (19e784f)
- docs: add blank space for release commit
Insert a blank space into the README so that Python Semantic Release has a commit message intended solely for bumping the major version number.
BREAKING CHANGE: This marks the first fully functioning release of the
gbif_registrar package. The APIs of previously released functionality have
been considerably modified. (86b3da0)
- docs: update CONTRIBUTING for current project status
Revise the CONTRIBUTING file to align with the current status of the project. (71e6380)
- docs: emphasize running main workflow after creation
Emphasize the importance of running the main workflow after creation,
as skipping this step can result in incomplete registration and uploading
of a package to GBIF. (20d937e)
- docs: update installation instructions
Revise installation instructions to recommend using pip from GitHub
rather than conda. While installation from conda is possible, the pip
method is more straightforward. (49d3ffa)
- docs: add examples of public-facing API usage
Add missing examples of public-facing API usage to provide users with
demonstrations of how to use the functions. (85f5364)
- docs: encourage subscription to API mailing list
Add a note to the developer section of the README, advising maintainers
to subscribe to the EDI and GBIF API mailing lists for timely updates on
outages and changes. This ensures they can adjust expectations or the
codebase accordingly. (d62edb6)
- docs: correct subsection formatting
Apply a small fix to ensure consistent subsection formatting throughout the
document. (71a837e)
- docs: clarify dataset synchronization concept
Enhance clarity in the documentation regarding the concept of dataset
synchronization to preempt any potential confusion. (1ba25e7)
- docs: update README for release
Add major missing components to the README in preparation for release. (e55cccf)
- docs: standardize descriptions for a consistent API
Standardize function descriptions for API clarity and consistency. (947093b)
- docs: standardize parameters for consistent API
Standardize function parameter names and definitions for a consistent
public-facing API. (92a56ec)
- docs: comment for clarity and understanding
Update code and test comments for improved clarity and understanding. (88842cb)
- docs: revise descriptions of a few utilities
Enhance descriptions and provide examples for the utility functions
'get_local_dataset_group_id', 'get_local_dataset_endpoint', and
'get_gbif_dataset_uuid' to facilitate better understanding. (eae5990)
- docs: fix outdated references to read_registrations
Address outdated mentions of the 'read_registrations' function that were
missed in commit 39367cc76f73aaa2159c627be37c0c4a508b4472. (37745d6)
- docs: address RTD build deprecation
Switch to 'build.os' instead of 'build.image' to address the deprecation
of the 'build.image' config key in Read the Docs. This change is
necessary for successful documentation building. (85e3fde)
- feat: upload new and revised datasets to GBIF
Implement a new function for uploading both new and revised datasets to
GBIF. Build the workflow to handle typical conditions and edge cases.
Additionally, create integration tests that make actual HTTP calls
(extended tests meant for occasional manual execution) and tests that
mock HTTP calls, which always run and provide faster results. (53219b6)
- feat: enable registration repair on demand
Modify 'complete_registration_records' to operate on a single record
when directed to do so, rather than always processing all incomplete
registrations. (e164d13)
- feat: check synchronization of local dataset w/GBIF
Report the success or failure of a dataset creation or update operation to alert users to synchronization issues. Define success and failure by comparing the publication date of the local dataset EML metadata and the endpoint of the zip archive download with those of the remote GBIF instance.
Move get_local_dataset_endpoint to utilities.py to prevent a circular
reference. (c9ebad3)
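A minimal sketch of the comparison described in this entry, not the package's implementation. The function name `matches_gbif`, its parameters, and the sample values are hypothetical; the entry only states that the publication date and the zip archive endpoint are compared between the local dataset and GBIF.

```python
def matches_gbif(local_pub_date, local_endpoint, gbif_pub_date, gbif_endpoint):
    """Return True only when both the publication date and the endpoint agree.

    All four arguments are plain strings here (an assumption); a mismatch in
    either one is treated as a synchronization failure.
    """
    return local_pub_date == gbif_pub_date and local_endpoint == gbif_endpoint


# Hypothetical values: the dates agree but the archive endpoints differ.
in_sync = matches_gbif(
    "2023-01-01", "https://example.org/v2.zip",
    "2023-01-01", "https://example.org/v1.zip",
)
```

Requiring both fields to agree means a stale endpoint is reported even when the publication dates happen to match.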
- feat: wrap get GBIF dataset details for general use
Wrap calls for GBIF dataset details to simplify response handling and
to be DRY when calling from different contexts. (c3ec165)
- feat: post local datasets to GBIF
Publish a set of functions for posting a local dataset to GBIF and
maintaining synchronization as the local dataset evolves over time. (14336ac)
- fix: resolve reference to dataset group, not endpoint
To retrieve the corresponding 'gbif_dataset_uuid' without errors,
use the 'local_dataset_group_id' instead of the
'local_dataset_endpoint'. The 'local_dataset_endpoint' does not
reference previously used 'gbif_dataset_uuid' values due to its
one-to-one cardinality. (5139a62)
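The reasoning can be sketched with a small in-memory example. The helper name `find_group_uuid`, the list-of-dicts layout, and all identifier and UUID values below are made up for illustration; only the column names come from this changelog.

```python
def find_group_uuid(registrations, local_dataset_group_id):
    """Return the gbif_dataset_uuid already assigned to a group, or None."""
    for record in registrations:
        same_group = record["local_dataset_group_id"] == local_dataset_group_id
        if same_group and record["gbif_dataset_uuid"]:
            return record["gbif_dataset_uuid"]
    return None


# Hypothetical registrations: two versions of one dataset share a group.
registrations = [
    {
        "local_dataset_group_id": "group-1",
        "local_dataset_endpoint": "https://example.org/group-1/v1.zip",
        "gbif_dataset_uuid": "00000000-0000-0000-0000-000000000001",
    },
    {
        "local_dataset_group_id": "group-1",
        "local_dataset_endpoint": "https://example.org/group-1/v2.zip",
        "gbif_dataset_uuid": None,  # new version, not yet registered
    },
]

# Looking up by group finds the UUID issued for the earlier version.
uuid_for_update = find_group_uuid(registrations, "group-1")
```

A lookup keyed on the endpoint of the new version would match only its own record, which has no UUID yet.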
- fix: update dependencies to resolve doc build failures
Update the 'autoapi.extension' to prevent the exception "'Module' object has no attribute 'doc'" and to enable successful local and Read the Docs documentation builds.
Pin project documentation dependencies to address the deprecation of default project dependencies on Read the Docs (see: https://blog.readthedocs.com/newsletter-september-2023/).
Update related project dependencies and resolve associated deprecation
errors and warnings to maintain a functional code base. (f23fc23)
- fix: use PASTA environment consistently
Use the PASTA_ENVIRONMENT variable to ensure consistent alignment of
data package references. Using different environments results in data package
reference mismatches and various errors throughout the application code. (2370ab0)
- fix: use synchronized dataset for testing
Add a dataset that has been synchronized between EDI and GBIF to
'registrations.csv' for testing purposes. (cc30d49)
- fix: get new uuid if it does not exist
Fix the logic in 'get_gbif_dataset_uuid' for determining an empty gbif_dataset_uuid to ensure a new value is requested if it doesn't yet exist.
Additionally, use pytest-mock to simulate both success and failure conditions for this feature. (1095985)
- fix: address pylint messages
Address lingering pylint messages to adhere to best practices and clean
up the message log, which has become quite lengthy. (c9ab920)
- fix: update outdated dependency files
Update the outdated dependency files to build the project without error. (fa4d11c)
- refactor: rename module for improved descriptiveness
Rename the 'crawl.py' module to 'upload.py' to better reflect its
purpose, which involves the user posting content to GBIF rather
than performing crawling operations. (50205d6)
- refactor: enhance credential security in config file
- Relocate the configuration file to an external location, removing it from version control to ensure the safety of credentials.
- Introduce a 'write configuration file' helper function, which generates a boilerplate configuration to be completed by the user.
- Create utility functions for loading and unloading the configuration as environment variables, making them accessible throughout the package.
- Note: The current implementation doesn't fully restore the user's environment variables to their original state, as any variables with the same names will be overwritten by the load_configuration function and removed by the unload_configuration function. Addressing this issue is a potential future improvement. (dfa5e39)
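The caveat in the note can be illustrated with a minimal sketch. It is not the package's implementation: the `load_configuration` and `unload_configuration` names come from this entry, but the dict-based signature, the `_saved` bookkeeping, and the variable names in the usage lines are assumptions made for illustration. The sketch remembers any pre-existing values so that unloading can restore them.

```python
import os

# Hypothetical store for the values that existed before loading.
_saved = {}


def load_configuration(config):
    """Export config items as environment variables, remembering prior values."""
    for name, value in config.items():
        # None marks "was not set before", so unloading can delete it again.
        # setdefault keeps the first recorded value if loading is repeated.
        _saved.setdefault(name, os.environ.get(name))
        os.environ[name] = value


def unload_configuration():
    """Restore the environment to its state before load_configuration."""
    for name, previous in _saved.items():
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous
    _saved.clear()
```

With this approach, a variable the user had already set (for example a hypothetical `DEMO_USER`) returns to its earlier value after unloading instead of being removed.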
- refactor: expand abbreviations for clarity
Expand abbreviated references to the registrations data frame for
improved clarity and comprehension. (9ea61b1)
- refactor: eliminate useless '_has_metadata' function
Remove the '_has_metadata' function as it does not serve a purpose.
Initially, it was designed to determine whether a local dataset group
had a member on GBIF and was used to guide decision logic concerning
resetting dataset endpoints and re-uploading metadata in the event of a
dataset update. However, it became apparent that this function returned
'True' even if only boilerplate stand-in metadata was posted to GBIF
before the actual metadata was posted during a crawl operation. (bff89ba)
- refactor: check for 'NA' instead of 'None'
When performing decision logic (boolean operations) based on values
retrieved from the registrations file, check for 'NA' rather than
'None'. This change is necessary to avoid the 'boolean value of NA is
ambiguous' error potentially arising from the recent implementation at
commit f23fc2386f3f80a7100c696d8b620392b2ed260f, which transitions from
using 'None' to 'NA' values in the registrations file in preparation for
addressing a future deprecation in pandas. (4b213cf)
- refactor: clarify definition of 'synchronization'
Rename the 'is_synchronized' column to 'synchronized' to clarify its
meaning, shifting from "this dataset is currently synchronized with
GBIF" to "this dataset has in the past been synchronized with GBIF."
Also update the 'check_is_synchronized' function to align with this
renaming. (2cb5392)
- refactor: internalize utilities for backwards compatibility
Internalize utility functions to reduce the risk of introducing
backward compatibility issues in the public-facing API when
refactoring the codebase. (6a71eec)
- refactor: default synchronization value to 'False'
Change the default synchronization indicator from 'None' to 'False' to
align the code with example and test usage. (aa6353b)
- refactor: deprecate extended validation checks
Always validate registration file contents using the extended checks.
Remove the controlling parameter for this half-implemented external
repository customization feature, which we have decided not to
support. (e5be3d9)
- refactor: separate concerns of register_dataset
Refactor 'register_dataset' to exclusively handle the registration of a
single dataset, removing the attempt to repair partially registered
datasets resulting from past registration failures. Move the repair
action to 'complete_registration_records'. This separation of
concerns improves code maintainability and usability. (12742f4)
- refactor: enhance clarity of complete_registrations
Rename the 'complete_registrations' function and update documentation
to reflect that it handles the completion of all components within
registration records, not solely the 'gbif_dataset_uuid'. (6266ef9)
- refactor: enhance clarity in read_registrations
Rename the 'read_registrations' function and the 'file_path' parameter
to indicate that the registrations file is being read, and to follow a
consistent call pattern being implemented throughout the codebase. (39367cc)
- refactor: enhance clarity in the register function
- Rename the 'register' function to explicitly indicate that it registers a dataset.
- Move the 'dataset' parameter to the first position for improved function call readability.
- Rename the 'file_path' parameter to better convey that it represents registration information stored as a file. (c92b496)
- refactor: improve clarity in initialize_registrations
- Rename the 'initialize_registrations' function to enhance understanding, making it clear that it initializes a file.
- Enhance file content descriptions and their mappings to concepts in the EDI repository for better comprehension.
- Move the function to the 'register.py' module, where it joins similar code for improved findability. (ee98c9e)
- refactor: deprecate gbif_endpoint_set_datetime
Deprecate gbif_endpoint_set_datetime in favor of is_synchronized to indicate the synchronization status of an EDI dataset with GBIF.
Related to c9ebad36bcacdc1351e171092f30cae06f13c035. (2c7ea77)
- refactor: apply read_gbif_dataset_metadata
Apply 'read_gbif_dataset_metadata' to functions requiring this
information in their custom implementations to maintain a DRY
codebase. (c5896e3)
- refactor: rename get_gbif_datatset_details
Improve user understanding by renaming 'get_gbif_datatset_details'.
Replace the 'get' prefix with 'read' to clarify the operation as an I/O
operation with possible parsing. (b1c4786)
- refactor: fail 'has_metadata' gracefully
Handle HTTP errors gracefully in the 'has_metadata' function to prevent systematic failures.
Employ pytest-mock to simulate both success and failure conditions. (7381637)
- refactor: fail 'read_local_dataset_metadata' gracefully
Handle HTTP errors gracefully in the 'read_local_dataset_metadata' function to prevent systematic failures.
Employ pytest-mock to simulate both success and failure conditions. (a0e9515)
- refactor: fail 'request_gbif_dataset_uuid' gracefully
Handle HTTP errors gracefully in the 'request_gbif_dataset_uuid'
function to prevent systematic failures. Employ pytest-mock to simulate
both success and failure conditions. (f1bca32)
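The pattern shared by the three "fail gracefully" entries above can be sketched as follows. This is an illustration under stated assumptions, not the package's code: `fetch_text` and the two `simulate_*` helpers are hypothetical names, the standard library's `urllib` stands in for whatever HTTP client the package uses, and `unittest.mock` stands in for pytest-mock.

```python
import io
import urllib.error
import urllib.request
from unittest import mock


def fetch_text(url):
    """Return the response body as text, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8")
    except urllib.error.URLError as error:  # HTTPError is a subclass of URLError
        print(f"Request failed: {error}")
        return None


def simulate_failure():
    """Make urlopen raise an HTTP 503 without touching the network."""
    error = urllib.error.HTTPError(
        "https://example.org", 503, "Service Unavailable", None, io.BytesIO()
    )
    with mock.patch("urllib.request.urlopen", side_effect=error):
        return fetch_text("https://example.org")


def simulate_success():
    """Make urlopen return a fake response whose body is b'ok'."""
    fake = mock.MagicMock()
    fake.__enter__.return_value.read.return_value = b"ok"
    with mock.patch("urllib.request.urlopen", return_value=fake):
        return fetch_text("https://example.org")
```

Returning `None` on failure lets the caller decide how to proceed, and patching `urlopen` exercises both branches while offline.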
- refactor: check for metadata before replacing
Prior to replacement, verify the presence of a metadata document in
GBIF. This precaution prevents potential errors when attempting to
replace a metadata document that does not currently exist. (01ddd6c)
- refactor: reorder func params for better semantics
Reorder the parameter positions of functions in the 'crawl' module to
align function calls more effectively with the underlying semantics. (45f94c6)
- test: eliminate empty test module
Remove the empty 'test_validate.py' module as testing for these routines
is consolidated in the 'test__utilities.py' module. (4e11f80)
- test: verify reuse of 'gbif_dataset_uuid' for updates
Confirm that the 'register_dataset' function reuses the
'gbif_dataset_uuid' for members sharing the same
'local_dataset_group_id', enabling updates of the GBIF
dataset instance. (f7210e1)
- test: register the first dataset w/o error
Test that the 'complete_registration_records' function works for the
scenario of the first dataset to ensure that this situation does not
trigger an error. (3030cf9)
- test: validate iterative registration repair
Execute 'complete_registration_records' on a registrations file
containing two incomplete registrations to verify that iterative
registration repair works for this specific use case. (07e2e67)
- test: include missing test for failed registration
Add a test case to verify that a failed registration does not write a GBIF dataset UUID
to the registrations file, returns 'NA', and does not raise an exception. (e974ac9)
- test: share fixtures with conftest.py
Utilize 'conftest.py' for sharing test fixtures, currently isolated
within test modules. (e067bae)
- test: mock HTTP requests for 'register'
Utilize pytest-mock to simulate both success and failure conditions for
the 'register' function. (dabeec1)
- test: use pytest-mock to mock tests
Utilize pytest-mock to mock tests that involve remote API calls,
allowing tests to run even when offline. This approach enhances the
ability to thoroughly examine both pass and fail conditions. (4943677)
- build: use Python 3.9 to fix readthedocs issue
When specifying Python 3.10 in .readthedocs.yml, the build raises the error "AttributeError: module 'types' has no attribute 'UnionType'". A temporary fix is to use Python 3.9. While not optimal, using 3.9 allows the current package docs to build, the conda environment to resolve, and the package tests to pass.
This commit specifies Python 3.9 in .readthedocs.yml. (58d0960)
- ci: format and lint PRs and merges (cd3b5c4)
- docs: credit Margaret for authorship (05d8e27)
- docs: update contributing guidelines (12b94f4)
- docs: mention Pylint usage (f81b0ec)
- docs: reformat project names (6ad14a8)
- docs: Update contributing guidelines (41eb175)
- feat: create function to register datasets w/GBIF (3cd7d14)
- feat: extend registration checks
Closes #11 (d1b662e)
- feat: validate registrations file
Closes #7 (2430c58)
- feat: read registrations file
Closes #8 (95de046)
- feat: Initialize an empty registrations file
Closes #2 (73cd544)
- refactor: format recent changes (40661c6)
- refactor: format and lint recent changes (15d71f4)
- refactor: rename a few things (92e6475)
- test: fix code formatting in test (4f70e00)
- test: add test/example registrations file
Closes #5 (2ae51b2)
- Ignore sensitive data in config.py
The config.py file is used to store sensitive GBIF API client data and should not be publicly exposed.
This commit adds the config.py file to .gitignore. (9fa6ae1)