Development #71
Merged
Conversation
Update the outdated dependency files to build the project without error.
Publish a set of functions for posting a local dataset to GBIF and maintaining synchronization as the local dataset evolves over time.
Assign GitHub Actions branch merge permissions to ensure that 'development' remains up-to-date with the main branch after 'main' is tagged during the release process. Declare Pylint checks in 'pyproject.toml' to ensure synchronization between local checks and CI pipeline checks.
Reorder the parameter positions of functions in the 'crawl' module to align function calls more effectively with the underlying semantics.
Address lingering pylint messages to adhere to best practices and clean up the message log, which has become quite lengthy.
Prior to replacement, verify the presence of a metadata document in GBIF. This precaution prevents potential errors when attempting to replace a metadata document that does not currently exist.
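A check of this kind might look roughly like the sketch below. The function name is hypothetical, and the `/dataset/{key}/document` path reflects the GBIF registry endpoint as I understand it; this is not the package's actual implementation.

```python
import requests

def metadata_document_exists(gbif_dataset_uuid):
    """Return True if GBIF currently holds a metadata document.

    Illustrative only; assumes the registry serves the current
    document at /dataset/{key}/document and answers 404 when none
    has been posted yet.
    """
    url = f"https://api.gbif.org/v1/dataset/{gbif_dataset_uuid}/document"
    response = requests.get(url, timeout=30)
    return response.status_code == 200
```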
Utilize pytest-mock to mock the remote API calls within tests, allowing tests to run even when offline. This approach also makes it easier to thoroughly examine both pass and fail conditions.
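For illustration, a minimal pytest-mock pattern for exercising pass and fail conditions offline (the URL is a placeholder, not necessarily a call the package makes):

```python
import pytest
import requests

def test_remote_call_success(mocker):
    # The mocker fixture comes from pytest-mock. Patching requests.get
    # lets the test run offline against a canned response.
    mock_response = mocker.Mock(status_code=200, text="<eml/>")
    mocker.patch("requests.get", return_value=mock_response)
    response = requests.get("https://api.gbif.org/v1/dataset")
    assert response.status_code == 200

def test_remote_call_failure(mocker):
    # Simulate a network outage to exercise the fail condition.
    mocker.patch("requests.get", side_effect=requests.exceptions.ConnectionError)
    with pytest.raises(requests.exceptions.ConnectionError):
        requests.get("https://api.gbif.org/v1/dataset")
```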
Handle HTTP errors gracefully in the 'request_gbif_dataset_uuid' function to prevent systematic failures. Employ pytest-mock to simulate both success and failure conditions.
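Graceful handling in such a function might follow this general shape (a sketch with a hypothetical signature and an assumed response format, not the actual implementation):

```python
import requests

def request_gbif_dataset_uuid(gbif_api_url, payload):
    # Hypothetical signature. Returning None on failure lets callers
    # continue processing other registrations instead of crashing.
    try:
        response = requests.post(gbif_api_url, json=payload, timeout=30)
        response.raise_for_status()
        return response.json().get("key")  # assumed location of the UUID
    except requests.exceptions.RequestException as error:
        print(error)
        return None
```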
Utilize pytest-mock to simulate both success and failure conditions for the 'register' function.
Fix the logic in 'get_gbif_dataset_uuid' for determining an empty gbif_dataset_uuid to ensure a new value is requested if it doesn't yet exist. Additionally, use pytest-mock to simulate both success and failure conditions for this feature.
Handle HTTP errors gracefully in the 'read_local_dataset_metadata' function to prevent systematic failures. Employ pytest-mock to simulate both success and failure conditions.
Handle HTTP errors gracefully in the 'has_metadata' function to prevent systematic failures. Employ pytest-mock to simulate both success and failure conditions.
Add a dataset that has been synchronized between EDI and GBIF to 'registrations.csv' for testing purposes.
Use the PASTA_ENVIRONMENT variable to ensure consistent alignment of data package references. Using different environments results in data package reference mismatches and various errors throughout the application code.
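As a rough illustration of why the shared variable matters (the helper name and URL path are hypothetical):

```python
import os

def build_data_package_reference(scope, identifier, revision):
    # All references derive from the one PASTA_ENVIRONMENT value
    # (e.g. "https://pasta.lternet.edu" for production or
    # "https://pasta-s.lternet.edu" for staging), so the application
    # never mixes environments. The path below is illustrative.
    base = os.environ["PASTA_ENVIRONMENT"]
    return f"{base}/package/eml/{scope}/{identifier}/{revision}"
```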
Wrap calls for GBIF dataset details to simplify response handling and to be DRY when calling from different contexts.
Improve user understanding by renaming 'get_gbif_dataset_details'. Replace the 'get' prefix with 'read' to clarify that the operation is an I/O operation with possible parsing.
Apply 'read_gbif_dataset_metadata' to functions requiring this information in their custom implementations to maintain a DRY codebase.
Report the success or failure of a dataset creation or update operation to alert users of synchronization issues. Define success and failure by comparing the publication date of the local dataset EML metadata, and the endpoint of the zip archive download, with those of the remote GBIF instance. Move 'get_local_dataset_endpoint' to 'utilities.py' to prevent a circular reference.
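In outline, the comparison might look like this (a sketch; the field names follow the GBIF dataset API's JSON as I understand it):

```python
def is_synchronized(local_pub_date, local_endpoint, gbif_dataset):
    # gbif_dataset: dict parsed from the GBIF dataset API response.
    # Success means the remote copy reflects both the local publication
    # date and the local zip-archive download endpoint.
    remote_endpoints = [e.get("url") for e in gbif_dataset.get("endpoints", [])]
    return (
        local_pub_date == gbif_dataset.get("pubDate")
        and local_endpoint in remote_endpoints
    )
```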
Switch to 'build.os' instead of 'build.image' to address the deprecation of the 'build.image' config key in Read the Docs. This change is necessary for successful documentation building.
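The relevant configuration change looks like this (an illustrative .readthedocs.yaml fragment; the OS and Python versions shown are examples):

```yaml
version: 2
build:
  os: ubuntu-22.04   # replaces the deprecated build.image key
  tools:
    python: "3.9"
```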
Utilize 'conftest.py' for sharing test fixtures, currently isolated within test modules.
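For reference, pytest discovers a conftest.py automatically, so fixtures defined there are shared across test modules without imports (the fixture name and path below are hypothetical):

```python
# conftest.py
import pandas as pd
import pytest

@pytest.fixture
def registrations():
    # Any test function that names this fixture as a parameter
    # receives the parsed registrations file.
    return pd.read_csv("tests/registrations.csv")
```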
Deprecate gbif_endpoint_set_datetime in favor of is_synchronized to indicate the synchronization status of an EDI dataset with GBIF. Related to commit c9ebad3.
Ignore pylint 'c-extension-no-member' (I1101) messages, originating from lxml, for the sake of a readable message log. Another option, adding lxml to the pylint --extension-pkg-allow-list, may run arbitrary code and is a decision that shouldn't be made for collaborators running pylint in the context of this project.
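Declared in pyproject.toml, the suppression looks roughly like this (an illustrative fragment; pylint reads the [tool.pylint.*] tables, which keeps local runs and CI in sync):

```toml
[tool.pylint.messages_control]
disable = ["c-extension-no-member"]  # I1101, noise from lxml
```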
- Rename the 'initialize_registrations' function to make clear that it initializes a file.
- Enhance descriptions of the file contents and their mappings to concepts in the EDI repository for better comprehension.
- Move the function to the 'register.py' module, where it joins similar code for improved findability.
- Rename the 'register' function to explicitly indicate that it registers a dataset.
- Move the 'dataset' parameter to the first position for improved function call readability.
- Rename the 'file_path' parameter to better convey that it represents registration information stored as a file.
Rename the 'read_registrations' function and the 'file_path' parameter to indicate that the registrations file is being read, and to follow a consistent call pattern being implemented throughout the codebase.
Rename the 'complete_registrations' function and update documentation to reflect that it handles the completion of all components within registration records, not solely the 'gbif_dataset_uuid'.
Address outdated mentions of the 'read_registrations' function that were missed in commit 39367cc.
Enhance descriptions and provide examples for the utility functions 'get_local_dataset_group_id,' 'get_local_dataset_endpoint,' and 'get_gbif_dataset_uuid' to facilitate better understanding.
Refactor 'register_dataset' to exclusively handle the registration of a single dataset, removing the attempt to repair partially registered datasets resulting from past registration failures. Move the repair action to 'complete_registration_records'. This separation of concerns improves code maintainability and usability.
Always enforce consistent validation of registration file contents using extended checks. Remove the controlling parameter for this half-implemented external-repository customization feature, which we have decided not to support.
Update the 'autoapi.extension' to prevent the exception "'Module' object has no attribute 'doc'" and to enable successful local and Read the Docs documentation builds. Pin project documentation dependencies to address the deprecation of default project dependencies on Read the Docs (see: https://blog.readthedocs.com/newsletter-september-2023/). Update related project dependencies and resolve associated deprecation errors and warnings to maintain a functional code base.
Rename the 'is_synchronized' column to 'synchronized' to clarify its meaning, shifting from "this dataset is currently synchronized with GBIF" to "this dataset has in the past been synchronized with GBIF." Also update the 'check_is_synchronized' function to align with this renaming.
Add a test case to verify that a failed registration does not write a GBIF dataset UUID in the registrations file, returns 'NA,' and does not raise an exception.
When performing decision logic (boolean operations) on values retrieved from the registrations file, ensure that the values are 'NA' rather than 'None.' This avoids the 'boolean value of NA is ambiguous' error that can arise from the implementation in commit f23fc23, which transitions the registrations file from 'None' to 'NA' values in preparation for a future pandas deprecation.
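The hazard is easy to demonstrate:

```python
import pandas as pd

value = pd.NA  # as now stored in the registrations file

# `if value:` would raise "TypeError: boolean value of NA is ambiguous".
# Test for missingness explicitly instead:
if pd.isna(value):
    print("missing; e.g. request a new gbif_dataset_uuid")
```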
To retrieve the corresponding 'gbif_dataset_uuid' without errors, utilize the 'local_dataset_group_id' instead of the 'local_dataset_endpoint.' The 'local_dataset_endpoint' does not reference previously used gbif_dataset_uuid values due to its one-to-one cardinality.
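A lookup along these lines illustrates the point (sketch only; the column names follow the registrations file, the helper name is hypothetical):

```python
import pandas as pd

def lookup_gbif_dataset_uuid(registrations, local_dataset_group_id):
    # Every member of a dataset group shares one gbif_dataset_uuid, so
    # matching on the group id (one-to-many) finds values that an
    # endpoint match (one-to-one) would miss.
    matches = registrations.loc[
        registrations["local_dataset_group_id"] == local_dataset_group_id,
        "gbif_dataset_uuid",
    ].dropna()
    return matches.iloc[0] if not matches.empty else pd.NA
```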
Modify 'complete_registration_records' to operate on a single record when directed to do so, rather than always processing all incomplete registrations.
Run 'complete_registration_records' on a registrations file containing two incomplete registrations to verify iterative registration repair under this specific use case.
Test that the 'complete_registration_records' function works for the first dataset in a registrations file to ensure that this situation does not trigger an error.
Confirm that the 'register_dataset' function reuses the 'gbif_dataset_uuid' for members sharing the same 'local_dataset_group_id,' enabling updates of the GBIF dataset instance.
Implement a new function for uploading both new and revised datasets to GBIF. Build the workflow to handle typical conditions and edge cases. Additionally, create integration tests that make actual HTTP calls (extended tests meant for occasional manual execution) and tests with mocked HTTP calls, which always run and return results faster.
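One conventional way to keep the mocked tests always-on while gating the real HTTP calls (the environment variable name is hypothetical):

```python
import os
import pytest

# Integration tests run only when explicitly requested; the mocked
# equivalents run on every invocation.
extended = pytest.mark.skipif(
    os.environ.get("EXTENDED_TESTS") is None,
    reason="extended integration test; set EXTENDED_TESTS to run",
)

@extended
def test_upload_dataset_against_live_apis():
    pass  # real EDI/GBIF HTTP calls would be exercised here
```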
Remove the '_has_metadata' function as it does not serve a purpose. Initially, it was designed to determine whether a local dataset group had a member on GBIF and was used to guide decision logic concerning resetting dataset endpoints and re-uploading metadata in the event of a dataset update. However, it became apparent that this function returned 'True' even if only boilerplate stand-in metadata was posted to GBIF before the actual metadata was posted during a crawl operation.
Expand abbreviated references to the registrations data frame for improved clarity and comprehension.
- Relocate the configuration file to an external location, removing it from version control to keep credentials safe.
- Introduce a 'write configuration file' helper function, which generates a boilerplate configuration to be completed by the user.
- Create utility functions for loading and unloading the configuration as environment variables, making it accessible throughout the package (see the sketch below).
- Note: the current implementation doesn't fully restore the user's environment variables to their original state; any variables with the same names are overwritten by 'load_configuration' and removed by 'unload_configuration'. Addressing this issue is a potential future improvement.
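A minimal sketch of the load/unload pair, assuming a flat JSON configuration of string values (the real file format may differ):

```python
import json
import os

def load_configuration(config_file):
    # Export each key/value pair as an environment variable so the
    # configuration is visible throughout the package. Caveat from the
    # note above: pre-existing variables of the same name are clobbered.
    with open(config_file, encoding="utf-8") as file:
        config = json.load(file)
    for key, value in config.items():
        os.environ[key] = value

def unload_configuration(config_file):
    # Remove the variables set by load_configuration; this also drops
    # same-named variables that existed beforehand (the known issue).
    with open(config_file, encoding="utf-8") as file:
        config = json.load(file)
    for key in config:
        os.environ.pop(key, None)
```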
Rename the 'crawl.py' module to 'upload.py' to better reflect its purpose, which involves the user posting content to GBIF rather than performing crawling operations.
Update code and test comments for improved clarity and understanding.
Standardize function parameter names and definitions for a consistent public facing API.
Remove the empty 'test_validate.py' module as testing for these routines is consolidated in the 'test__utilities.py' module.
Standardize function descriptions for API clarity and consistency.
Add major missing components to the README in preparation for release.
Enhance clarity in the documentation regarding the concept of dataset synchronization to preempt any potential confusion.
Apply a small fix to ensure consistent subsection formatting throughout the document.
Add a note to the developer section of the README, advising maintainers to subscribe to the EDI and GBIF API mailing lists for timely updates on outages and changes. This ensures they can adjust expectations or the codebase accordingly.
Add missing examples of public-facing API usage to provide users with demonstrations of how to use the functions.
Remove developer dependencies from the list of user/production dependencies to lighten the installation process for users.
Revise installation instructions to recommend using pip from GitHub rather than conda. While installation from conda is possible, the pip method is more straightforward.
Emphasize the importance of running the main workflow after creation, as skipping this step can result in incomplete registration and uploading of a package to GBIF.
Revise the CONTRIBUTING file to align with the current status of the project.
Insert a blank space into the README to ease the creation of a commit message, via Python Semantic Release, intended solely for bumping the major version number. BREAKING CHANGE: This marks the first fully functioning release of the gbif_registrar package. The APIs of previously released functionality have been considerably modified.
Codecov Report
All modified and coverable lines are covered by tests ✅

@@ Coverage Diff @@
##        main     #71   +/- ##
===============================
  Coverage    ?  85.00%
===============================
  Files       ?       6
  Lines       ?     260
  Branches    ?       0
===============================
  Hits        ?     221
  Misses      ?      39
  Partials    ?       0