
Development #71

Merged
merged 60 commits into main from development
Jan 9, 2024
Conversation

@clnsmth (Collaborator) commented Jan 9, 2024

No description provided.

Update the outdated dependency files to build the project without error.
Publish a set of functions for posting a local dataset to GBIF and 
maintaining synchronization as the local dataset evolves over time.
Assign GitHub Actions branch merge permissions to ensure that
'development' remains up-to-date with the main branch after 'main'
is tagged during the release process.

Declare Pylint checks in 'pyproject.toml' to ensure synchronization
between local checks and CI pipeline checks.
Reorder the parameter positions of functions in the 'crawl' module to
align function calls more effectively with the underlying semantics.
Address lingering pylint messages to adhere to best practices and clean
up the message log, which has become quite lengthy.
Prior to replacement, verify the presence of a metadata document in
GBIF. This precaution prevents potential errors when attempting to
replace a metadata document that does not currently exist.
Utilize pytest-mock to mock tests that involve remote API calls,
allowing tests to run even when offline. This approach enhances the
ability to thoroughly examine both pass and fail conditions.
Handle HTTP errors gracefully in the 'request_gbif_dataset_uuid'
function to prevent systematic failures. Employ pytest-mock to simulate
both success and failure conditions.
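For illustration, a minimal sketch of the pattern this commit describes, with pytest-mock standing in for the HTTP layer. The function body, the GBIF endpoint URL, and the parameter name are assumptions, not the project's actual implementation.

```python
# Hypothetical sketch: the real 'request_gbif_dataset_uuid' lives in the
# gbif_registrar package; this stand-in only illustrates graceful HTTP error
# handling and the pytest-mock pattern described above.
import requests


def request_gbif_dataset_uuid(organization_key):
    """Request a new GBIF dataset UUID, returning None on HTTP errors."""
    try:
        response = requests.post(
            "https://api.gbif.org/v1/dataset",  # assumed endpoint for illustration
            json={"publishingOrganizationKey": organization_key},
            timeout=10,
        )
        response.raise_for_status()
    except requests.RequestException:
        return None  # fail gracefully rather than propagating the exception
    return response.json().get("key")


def test_request_gbif_dataset_uuid_handles_http_error(mocker):
    """Simulate a server error so the function returns None instead of raising."""
    mocked_response = mocker.Mock()
    mocked_response.raise_for_status.side_effect = requests.HTTPError("500 Server Error")
    mocker.patch("requests.post", return_value=mocked_response)
    assert request_gbif_dataset_uuid("org-uuid") is None
```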
Utilize pytest-mock to simulate both success and failure conditions for
the 'register' function.
Fix the logic in 'get_gbif_dataset_uuid' for determining an empty gbif_dataset_uuid to ensure a new value is requested if it doesn't yet exist.

Additionally, use pytest-mock to simulate both success and failure conditions for this feature.
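A hedged sketch of the emptiness check described above; the function signature and the use of pandas NA values are assumptions for illustration.

```python
# Hypothetical sketch; the real function's signature in gbif_registrar may differ.
import pandas as pd


def get_gbif_dataset_uuid(existing_uuid, request_new_uuid):
    """Return the existing UUID, or request a new one when none exists yet."""
    if existing_uuid is None or pd.isna(existing_uuid) or existing_uuid == "":
        return request_new_uuid()
    return existing_uuid


assert get_gbif_dataset_uuid(pd.NA, lambda: "new-uuid") == "new-uuid"
assert get_gbif_dataset_uuid("existing-uuid", lambda: "new-uuid") == "existing-uuid"
```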
Handle HTTP errors gracefully in the 'read_local_dataset_metadata'
function to prevent systematic failures.

Employ pytest-mock to simulate both success and failure conditions.
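As an illustration of the same graceful-failure pattern applied to metadata reads, a sketch of what 'read_local_dataset_metadata' might look like; the PASTA metadata endpoint and identifier format shown here are assumptions.

```python
# Hypothetical sketch; the endpoint path and identifier parsing are assumptions
# used only to illustrate handling HTTP errors without crashing.
import requests


def read_local_dataset_metadata(local_dataset_id):
    """Return the dataset's EML metadata as text, or None on HTTP errors."""
    scope, identifier, revision = local_dataset_id.split(".")
    url = f"https://pasta.lternet.edu/package/metadata/eml/{scope}/{identifier}/{revision}"
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.RequestException:
        return None  # fail gracefully so callers can decide how to proceed
    return response.text
```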
Handle HTTP errors gracefully in the 'has_metadata' function to prevent
systematic failures.

Employ pytest-mock to simulate both success and failure conditions.
Add a dataset that has been synchronized between EDI and GBIF to
'registrations.csv' for testing purposes.
Use the PASTA_ENVIRONMENT variable to ensure consistent alignment of
data package references. Using different environments results in data package
reference mismatches and various errors throughout the application code.
Wrap calls for GBIF dataset details to simplify response handling and
to be DRY when calling from different contexts.
Improve user understanding by renaming 'get_gbif_dataset_details.'
Replace the 'get' prefix with 'read' to clarify the operation as an I/O
operation with possible parsing.
Apply 'read_gbif_dataset_metadata' to functions requiring this
information in their custom implementations to maintain a DRY
codebase.
Report the success or failure of a dataset creation or update operation
to alert users of synchronization issues. Define success and failure by
comparing the publication date of the local dataset EML metadata, and the
endpoint of the zip archive download, with those of the remote GBIF
instance.

Move get_local_dataset_endpoint to utilities.py to prevent a circular
reference.
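A hedged sketch of the success check described above; the GBIF response keys ('pubDate', 'endpoints') and the function signature are assumptions for illustration.

```python
# Hypothetical sketch of the synchronization check; key names are assumptions.
def is_synchronized(local_pub_date, local_dataset_endpoint, gbif_dataset_metadata):
    """Compare the local EML pubDate and archive endpoint with the GBIF record."""
    remote_pub_date = gbif_dataset_metadata.get("pubDate")
    remote_endpoints = [
        endpoint.get("url") for endpoint in gbif_dataset_metadata.get("endpoints", [])
    ]
    return (
        local_pub_date == remote_pub_date
        and local_dataset_endpoint in remote_endpoints
    )


gbif_record = {
    "pubDate": "2024-01-09",
    "endpoints": [{"url": "https://example.org/archive/edi.929.2.zip"}],
}
assert is_synchronized("2024-01-09", "https://example.org/archive/edi.929.2.zip", gbif_record)
```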
Switch to 'build.os' instead of 'build.image' to address the deprecation
of the 'build.image' config key in Read the Docs. This change is
necessary for successful documentation building.
Utilize 'conftest.py' for sharing test fixtures, currently isolated
within test modules.
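For context, a minimal sketch of the shared-fixture pattern in 'conftest.py'; the fixture name and CSV path are assumptions.

```python
# conftest.py -- hypothetical sketch; the fixture name and file path are
# assumptions used only to illustrate sharing fixtures across test modules.
import pandas as pd
import pytest


@pytest.fixture
def registrations():
    """Provide the example registrations data frame to any test module."""
    return pd.read_csv("tests/registrations.csv")
```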
Deprecate gbif_endpoint_set_datetime in favor of is_synchronized to
indicate the synchronization status of an EDI dataset with GBIF.

Is related to c9ebad3.
Ignore pylint 'c-extension-no-member' (I1101) messages, originating
from lxml, for the sake of a readable message log.

Another option, adding lxml to the pylint --extension-pkg-allow-list,
may run arbitrary code and is a decision that shouldn't be made for
collaborators running pylint in the context of this project.
- Rename the 'initialize_registrations' function to enhance understanding,
making it clear that it initializes a file.
- Enhance file content descriptions and their mappings to concepts in
the EDI repository for better comprehension.
- Move the function to the 'register.py' module, where it joins similar
code for improved findability.
- Rename the 'register' function to explicitly indicate that it
registers a dataset.
- Move the 'dataset' parameter to the first position for improved
function call readability.
- Rename the 'file_path' parameter to better convey that it represents
registration information as a file for better understanding.
Rename the 'read_registrations' function and the 'file_path' parameter
to indicate that the registrations file is being read, and to follow a
consistent call pattern being implemented throughout the codebase.
Rename the 'complete_registrations' function and update documentation
to reflect that it handles the completion of all components within
registration records, not solely the 'gbif_dataset_uuid'.
Address outdated mentions of the 'read_registrations' function that were
missed in commit 39367cc.
Enhance descriptions and provide examples for the utility functions
'get_local_dataset_group_id,' 'get_local_dataset_endpoint,' and
'get_gbif_dataset_uuid' to facilitate better understanding.
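As an illustration of the kind of examples added, a sketch of one utility; the example identifier and the exact derivation rule are assumptions.

```python
# Hypothetical sketch; the real 'get_local_dataset_group_id' may derive the
# group identifier differently.
def get_local_dataset_group_id(local_dataset_id):
    """Drop the revision number from an EDI data package identifier.

    >>> get_local_dataset_group_id("edi.929.2")
    'edi.929'
    """
    return local_dataset_id.rsplit(".", 1)[0]
```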
Refactor 'register_dataset' to exclusively handle the registration of a
single dataset, removing the attempt to repair partially registered
datasets resulting from past registration failures. Move the repair
action to 'complete_registration_records'. This separation of
concerns improves code maintainability and usability.
Enforce consistent validation of registration file contents using
extended checks, always. Remove the controlling parameter for this
half-implemented external repository customization feature, which we
have decided not to support.
Update the 'autoapi.extension' to prevent the exception "'Module' object 
has no attribute 'doc'" and to enable successful local and Read the Docs 
documentation builds.

Pin project documentation dependencies to address the deprecation of 
default project dependencies on Read the Docs (see: https://blog.readthedocs.com/newsletter-september-2023/).

Update related project dependencies and resolve associated deprecation 
errors and warnings to maintain a functional code base.
Rename the 'is_synchronized' column to 'synchronized' to clarify its
meaning, shifting from "this dataset is currently synchronized with
GBIF" to "this dataset has in the past been synchronized with GBIF."
Also update the 'check_is_synchronized' function to align with this
renaming.
Add a test case to verify that a failed registration does not write a GBIF dataset UUID
in the registrations file, returns 'NA,' and does not raise an exception.
When performing decision logic (boolean operations) based on values
retrieved from the registrations file, ensure that the values are 'NA'
rather than 'None.' This change is necessary to avoid the 'boolean value
of NA is ambiguous' error potentially arising from the recent
implementation at commit
f23fc23, which transitions from
using 'None' to 'NA' values in the registrations file in preparation for
addressing a future deprecation in pandas.
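A minimal sketch of the guard described above; the column name mirrors the registrations file, but the exact check is an assumption.

```python
# Hypothetical sketch of guarding boolean logic against pandas NA values.
import pandas as pd

record = pd.Series({"gbif_dataset_uuid": pd.NA})

# 'if record["gbif_dataset_uuid"]:' would raise "boolean value of NA is
# ambiguous"; test for missingness explicitly before branching.
if pd.isna(record["gbif_dataset_uuid"]):
    print("UUID missing; request one from GBIF")
```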
To retrieve the corresponding 'gbif_dataset_uuid' without errors,
utilize the 'local_dataset_group_id' instead of the
'local_dataset_endpoint.' The 'local_dataset_endpoint' does not
reference previously used gbif_dataset_uuid values due to its
one-to-one cardinality.
Modify 'complete_registration_records' to operate on a single record
when directed to do so, rather than always processing all incomplete
registrations.
Execute 'complete_registration_records' on a registrations file
containing two incomplete registrations to verify the functionality of
iterative registration repair under this specific use case.
Test that the 'complete_registration_records' function works for the
scenario of the first dataset to ensure that this situation does not
trigger an error.
Confirm that the 'register_dataset' function reuses the
 'gbif_dataset_uuid' for members sharing the same
'local_dataset_group_id,' enabling updates of the GBIF
dataset instance.
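A hedged sketch of the reuse behavior being tested; the column names mirror the registrations file, but the lookup logic is an assumption.

```python
# Hypothetical sketch; the real lookup in gbif_registrar may differ.
import pandas as pd


def get_gbif_dataset_uuid(local_dataset_group_id, registrations):
    """Reuse the UUID already assigned to the dataset's group, if one exists."""
    matches = registrations.loc[
        registrations["local_dataset_group_id"] == local_dataset_group_id,
        "gbif_dataset_uuid",
    ].dropna()
    return matches.iloc[0] if not matches.empty else None


registrations = pd.DataFrame(
    {
        "local_dataset_group_id": ["edi.929", "edi.929"],
        "gbif_dataset_uuid": ["existing-uuid", None],
    }
)
assert get_gbif_dataset_uuid("edi.929", registrations) == "existing-uuid"
```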
Implement a new function for uploading both new and revised datasets to
GBIF. Build the workflow to handle typical conditions and edge cases.
Additionally, create integration tests for making actual HTTP calls,
extended tests meant for occasional manual execution, and mock HTTP
calls, which are always run and provide faster results.
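A sketch of the test split described above; the marker name and the patched call are assumptions for illustration.

```python
# Hypothetical sketch; 'integration' is an assumed marker name for tests that
# make real HTTP calls and are run manually on occasion.
import pytest
import requests


@pytest.mark.integration
def test_upload_dataset_against_live_apis():
    """Run occasionally and manually; exercises the real EDI and GBIF APIs."""


def test_upload_dataset_with_mocked_http(mocker):
    """Always run; the HTTP layer is mocked for fast, offline execution."""
    mocker.patch("requests.post", return_value=mocker.Mock(status_code=201))
    response = requests.post("https://api.gbif.org/v1/dataset", timeout=10)
    assert response.status_code == 201
```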
Remove the '_has_metadata' function as it does not serve a purpose.
Initially, it was designed to determine whether a local dataset group
had a member on GBIF and was used to guide decision logic concerning
resetting dataset endpoints and re-uploading metadata in the event of a
dataset update. However, it became apparent that this function returned
 'True' even if only boilerplate stand-in metadata was posted to GBIF
before the actual metadata was posted during a crawl operation.
Expand abbreviated references to the registrations data frame for
improved clarity and comprehension.
- Relocate the configuration file to an external location, removing it from version control to ensure the safety of credentials.
- Introduce a 'write configuration file' helper function, which generates a boilerplate configuration to be completed by the user.
- Create utility functions for loading and unloading the configuration as environment variables, making them accessible throughout the package (a sketch follows this list).
- Note: The current implementation doesn't fully restore the user's environmental variables to their original state, as any variables with the same names will be overwritten by the load_configuration function and removed by the unload_configuration function. Addressing this issue is a potential improvement for future implementation.
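A hedged sketch of the helpers described in this list; the file format, keys, and exact function names follow the description, but the details are assumptions.

```python
# Hypothetical sketch of the configuration helpers; keys and file format are
# assumptions for illustration only.
import json
import os


def write_configuration(file_path):
    """Write a boilerplate configuration file for the user to complete."""
    boilerplate = {"USER_NAME": "", "PASSWORD": "", "PASTA_ENVIRONMENT": ""}
    with open(file_path, "w", encoding="utf-8") as file:
        json.dump(boilerplate, file, indent=2)


def load_configuration(file_path):
    """Load configuration keys as environment variables for package-wide use."""
    with open(file_path, "r", encoding="utf-8") as file:
        config = json.load(file)
    os.environ.update({key: str(value) for key, value in config.items()})


def unload_configuration(file_path):
    """Remove the configuration keys from the environment.

    Note: as described above, variables that shared a name with a
    configuration key are not restored to their original values.
    """
    with open(file_path, "r", encoding="utf-8") as file:
        config = json.load(file)
    for key in config:
        os.environ.pop(key, None)
```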
Rename the 'crawl.py' module to 'upload.py' to better reflect its
purpose, which involves the user posting content to GBIF rather
than performing crawling operations.
Update code and test comments for improved clarity and understanding.
Standardize function parameter names and definitions for a consistent public
facing API.
Remove the empty 'test_validate.py' module as testing for these routines
is consolidated in the 'test__utilities.py' module.
Standardize function descriptions for API clarity and consistency.
Add major missing components to the README in preparation for release.
Enhance clarity in the documentation regarding the concept of dataset
synchronization to preempt any potential confusion.
Apply a small fix to ensure consistent subsection formatting throughout the
document.
Add a note to the developer section of the README, advising maintainers
to subscribe to the EDI and GBIF API mailing lists for timely updates on
outages and changes. This ensures they can adjust expectations or the
codebase accordingly.
Add missing examples of public-facing API usage to provide users with
demonstrations of how to use the functions.
Remove developer dependencies from the list of user/production
dependencies to lighten the installation process for users.
Revise installation instructions to recommend using pip from GitHub
rather than conda. While installation from conda is possible, the pip
method is more straightforward.
Emphasize the importance of running the main workflow after creation,
as skipping this step can result in incomplete registration and uploading
of a package to GBIF.
Revise the CONTRIBUTING file to align with the current status of the project.
Insert a blank space into the README to ease the creation of a
commit message, via Python Semantic Release, intended solely for bumping
the major version number.

BREAKING CHANGE: This marks the first fully functioning release of the
gbif_registrar package. The APIs of previously released functionality have
been considerably modified.
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (main@550491a).

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #71   +/-   ##
=======================================
  Coverage        ?   85.00%           
=======================================
  Files           ?        6           
  Lines           ?      260           
  Branches        ?        0           
=======================================
  Hits            ?      221           
  Misses          ?       39           
  Partials        ?        0           


@clnsmth clnsmth merged commit 86b3da0 into main Jan 9, 2024
5 checks passed