
Development #71

Merged
merged 60 commits into main from development
Jan 9, 2024
Conversation

@clnsmth (Collaborator) commented Jan 9, 2024

No description provided.

Update the outdated dependency files to build the project without error.
Publish a set of functions for posting a local dataset to GBIF and 
maintaining synchronization as the local dataset evolves over time.
Assign GitHub Actions branch merge permissions to ensure that
'development' remains up-to-date with the main branch after 'main'
is tagged during the release process.

Declare Pylint checks in 'pyproject.toml' to ensure synchronization
between local checks and CI pipeline checks.
Reorder the parameter positions of functions in the 'crawl' module to
align function calls more effectively with the underlying semantics.
Address lingering pylint messages to adhere to best practices and clean
up the message log, which has become quite lengthy.
Prior to replacement, verify the presence of a metadata document in
GBIF. This precaution prevents potential errors when attempting to
replace a metadata document that does not currently exist.
Utilize pytest-mock to mock tests that involve remote API calls,
allowing tests to run even when offline. This approach enhances the
ability to thoroughly examine both pass and fail conditions.
Handle HTTP errors gracefully in the 'request_gbif_dataset_uuid'
function to prevent systematic failures. Employ pytest-mock to simulate
both success and failure conditions.
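For illustration, a minimal sketch of the pattern this commit describes, with pytest-mock standing in for the HTTP layer. The function body, the GBIF endpoint URL, and the parameter name are assumptions, not the project's actual implementation.

```python
# Hypothetical sketch: the real 'request_gbif_dataset_uuid' lives in the
# gbif_registrar package; this stand-in only illustrates graceful HTTP error
# handling and the pytest-mock pattern described above.
import requests


def request_gbif_dataset_uuid(organization_key):
    """Request a new GBIF dataset UUID, returning None on HTTP errors."""
    try:
        response = requests.post(
            "https://api.gbif.org/v1/dataset",  # assumed endpoint for illustration
            json={"publishingOrganizationKey": organization_key},
            timeout=10,
        )
        response.raise_for_status()
    except requests.RequestException:
        return None  # fail gracefully rather than propagating the exception
    return response.json().get("key")


def test_request_gbif_dataset_uuid_handles_http_error(mocker):
    """Simulate a server error so the function returns None instead of raising."""
    mocked_response = mocker.Mock()
    mocked_response.raise_for_status.side_effect = requests.HTTPError("500 Server Error")
    mocker.patch("requests.post", return_value=mocked_response)
    assert request_gbif_dataset_uuid("org-uuid") is None
```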
Utilize pytest-mock to simulate both success and failure conditions for
the 'register' function.
Fix the logic in 'get_gbif_dataset_uuid' for determining an empty gbif_dataset_uuid to ensure a new value is requested if it doesn't yet exist.

Additionally, use pytest-mock to simulate both success and failure conditions for this feature.
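A hedged sketch of the emptiness check described above; the function signature and the use of pandas NA values are assumptions for illustration.

```python
# Hypothetical sketch; the real function's signature in gbif_registrar may differ.
import pandas as pd


def get_gbif_dataset_uuid(existing_uuid, request_new_uuid):
    """Return the existing UUID, or request a new one when none exists yet."""
    if existing_uuid is None or pd.isna(existing_uuid) or existing_uuid == "":
        return request_new_uuid()
    return existing_uuid


assert get_gbif_dataset_uuid(pd.NA, lambda: "new-uuid") == "new-uuid"
assert get_gbif_dataset_uuid("existing-uuid", lambda: "new-uuid") == "existing-uuid"
```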
Handle HTTP errors gracefully in the 'read_local_dataset_metadata'
function to prevent systematic failures.

Employ pytest-mock to simulate both success and failure conditions.
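As an illustration of the same graceful-failure pattern applied to metadata reads, a sketch of what 'read_local_dataset_metadata' might look like; the PASTA metadata endpoint and identifier format shown here are assumptions.

```python
# Hypothetical sketch; the endpoint path and identifier parsing are assumptions
# used only to illustrate handling HTTP errors without crashing.
import requests


def read_local_dataset_metadata(local_dataset_id):
    """Return the dataset's EML metadata as text, or None on HTTP errors."""
    scope, identifier, revision = local_dataset_id.split(".")
    url = f"https://pasta.lternet.edu/package/metadata/eml/{scope}/{identifier}/{revision}"
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.RequestException:
        return None  # fail gracefully so callers can decide how to proceed
    return response.text
```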
Handle HTTP errors gracefully in the 'has_metadata' function to prevent
systematic failures.

Employ pytest-mock to simulate both success and failure conditions.
Add a dataset that has been synchronized between EDI and GBIF to
'registrations.csv' for testing purposes.
Use the PASTA_ENVIRONMENT variable to ensure consistent alignment of
data package references. Using different environments results in data package
reference mismatches and various errors throughout the application code.
Wrap calls for GBIF dataset details to simplify response handling and
to be DRY when calling from different contexts.
Improve user understanding by renaming 'get_gbif_dataset_details.'
Replace the 'get' prefix with 'read' to clarify the operation as an I/O
operation with possible parsing.
Apply 'read_gbif_dataset_metadata' to functions requiring this
information in their custom implementations to maintain a DRY
codebase.
Report the success or failure of a dataset creation or update operation
to alert users of synchronization issues. Define success and failure by
comparing the publication date of the local dataset EML metadata, and the
endpoint of the zip archive download, with those of the remote GBIF
instance.

Move get_local_dataset_endpoint to utilities.py to prevent a circular
reference.
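A hedged sketch of the success check described above; the GBIF response keys ('pubDate', 'endpoints') and the function signature are assumptions for illustration.

```python
# Hypothetical sketch of the synchronization check; key names are assumptions.
def is_synchronized(local_pub_date, local_dataset_endpoint, gbif_dataset_metadata):
    """Compare the local EML pubDate and archive endpoint with the GBIF record."""
    remote_pub_date = gbif_dataset_metadata.get("pubDate")
    remote_endpoints = [
        endpoint.get("url") for endpoint in gbif_dataset_metadata.get("endpoints", [])
    ]
    return (
        local_pub_date == remote_pub_date
        and local_dataset_endpoint in remote_endpoints
    )


gbif_record = {
    "pubDate": "2024-01-09",
    "endpoints": [{"url": "https://example.org/archive/edi.929.2.zip"}],
}
assert is_synchronized("2024-01-09", "https://example.org/archive/edi.929.2.zip", gbif_record)
```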
Switch to 'build.os' instead of 'build.image' to address the deprecation
of the 'build.image' config key in Read the Docs. This change is
necessary for successful documentation building.
Utilize 'conftest.py' for sharing test fixtures, currently isolated
within test modules.
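For context, a minimal sketch of the shared-fixture pattern in 'conftest.py'; the fixture name and CSV path are assumptions.

```python
# conftest.py -- hypothetical sketch; the fixture name and file path are
# assumptions used only to illustrate sharing fixtures across test modules.
import pandas as pd
import pytest


@pytest.fixture
def registrations():
    """Provide the example registrations data frame to any test module."""
    return pd.read_csv("tests/registrations.csv")
```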
Deprecate gbif_endpoint_set_datetime in favor of is_synchronized to
indicate the synchronization status of an EDI dataset with GBIF.

Is related to c9ebad3.
Ignore pylint 'c-extension-no-member' (I1101) messages, originating
from lxml, for the sake of a readable message log.

Another option, adding lxml to the pylint --extension-pkg-allow-list,
may run arbitrary code and is a decision that shouldn't be made for
collaborators running pylint in the context of this project.
- Rename the 'initialize_registrations' function to enhance understanding,
making it clear that it initializes a file.
- Enhance file content descriptions and their mappings to concepts in
the EDI repository for better comprehension.
- Move the function to the 'register.py' module, where it joins similar
code for improved findability.
- Rename the 'register' function to explicitly indicate that it
registers a dataset.
- Move the 'dataset' parameter to the first position for improved
function call readability.
- Rename the 'file_path' parameter to better convey that it represents
registration information as a file for better understanding.
Rename the 'read_registrations' function and the 'file_path' parameter
to indicate that the registrations file is being read, and to follow a
consistent call pattern being implemented throughout the codebase.
Rename the 'complete_registrations' function and update documentation
to reflect that it handles the completion of all components within
registration records, not solely the 'gbif_dataset_uuid'.
Address outdated mentions of the 'read_registrations' function that were
missed in commit 39367cc.
Enhance descriptions and provide examples for the utility functions
'get_local_dataset_group_id,' 'get_local_dataset_endpoint,' and
'get_gbif_dataset_uuid' to facilitate better understanding.
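As an illustration of the kind of examples added, a sketch of one utility; the example identifier and the exact derivation rule are assumptions.

```python
# Hypothetical sketch; the real 'get_local_dataset_group_id' may derive the
# group identifier differently.
def get_local_dataset_group_id(local_dataset_id):
    """Drop the revision number from an EDI data package identifier.

    >>> get_local_dataset_group_id("edi.929.2")
    'edi.929'
    """
    return local_dataset_id.rsplit(".", 1)[0]
```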
Refactor 'register_dataset' to exclusively handle the registration of a
single dataset, removing the attempt to repair partially registered
datasets resulting from past registration failures. Move the repair
action to 'complete_registration_records'. This separation of
concerns improves code maintainability and usability.
Enforce consistent validation of registration file contents using
extended checks, always. Remove the controlling parameter for this
half-implemented external repository customization feature, which we
have decided not to support.
Update the 'autoapi.extension' to prevent the exception "'Module' object 
has no attribute 'doc'" and to enable successful local and Read the Docs 
documentation builds.

Pin project documentation dependencies to address the deprecation of 
default project dependencies on Read the Docs (see: https://blog.readthedocs.com/newsletter-september-2023/).

Update related project dependencies and resolve associated deprecation 
errors and warnings to maintain a functional code base.
Rename the 'is_synchronized' column to 'synchronized' to clarify its
meaning, shifting from "this dataset is currently synchronized with
GBIF" to "this dataset has in the past been synchronized with GBIF."
Also update the 'check_is_synchronized' function to align with this
renaming.
Add a test case to verify that a failed registration does not write a GBIF dataset UUID
in the registrations file, returns 'NA,' and does not raise an exception.
When performing decision logic (boolean operations) based on values
retrieved from the registrations file, ensure that the values are 'NA'
rather than 'None.' This change is necessary to avoid the 'boolean value
of NA is ambiguous' error potentially arising from the recent
implementation at commit
f23fc23, which transitions from
using 'None' to 'NA' values in the registrations file in preparation for
addressing a future deprecation in pandas.
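A minimal sketch of the guard described above; the column name mirrors the registrations file, but the exact check is an assumption.

```python
# Hypothetical sketch of guarding boolean logic against pandas NA values.
import pandas as pd

record = pd.Series({"gbif_dataset_uuid": pd.NA})

# 'if record["gbif_dataset_uuid"]:' would raise "boolean value of NA is
# ambiguous"; test for missingness explicitly before branching.
if pd.isna(record["gbif_dataset_uuid"]):
    print("UUID missing; request one from GBIF")
```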
To retrieve the corresponding 'gbif_dataset_uuid' without errors,
utilize the 'local_dataset_group_id' instead of the
'local_dataset_endpoint.' The 'local_dataset_endpoint' does not
reference previously used gbif_dataset_uuid values due to its
one-to-one cardinality.
Modify 'complete_registration_records' to operate on a single record
when directed to do so, rather than always processing all incomplete
registrations.
Execute 'complete_registration_records' on a registrations file
containing two incomplete registrations to verify the functionality of
iterative registration repair under this specific use case.
Test that the 'complete_registration_records' function works for the
scenario of the first dataset to ensure that this situation does not
trigger an error.
Confirm that the 'register_dataset' function reuses the
 'gbif_dataset_uuid' for members sharing the same
'local_dataset_group_id,' enabling updates of the GBIF
dataset instance.
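A hedged sketch of the reuse behavior being tested; the column names mirror the registrations file, but the lookup logic is an assumption.

```python
# Hypothetical sketch; the real lookup in gbif_registrar may differ.
import pandas as pd


def get_gbif_dataset_uuid(local_dataset_group_id, registrations):
    """Reuse the UUID already assigned to the dataset's group, if one exists."""
    matches = registrations.loc[
        registrations["local_dataset_group_id"] == local_dataset_group_id,
        "gbif_dataset_uuid",
    ].dropna()
    return matches.iloc[0] if not matches.empty else None


registrations = pd.DataFrame(
    {
        "local_dataset_group_id": ["edi.929", "edi.929"],
        "gbif_dataset_uuid": ["existing-uuid", None],
    }
)
assert get_gbif_dataset_uuid("edi.929", registrations) == "existing-uuid"
```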
Implement a new function for uploading both new and revised datasets to
GBIF. Build the workflow to handle typical conditions and edge cases.
Additionally, create integration tests for making actual HTTP calls,
extended tests meant for occasional manual execution, and mock HTTP
calls, which are always run and provide faster results.
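A sketch of the test split described above; the marker name and the patched call are assumptions for illustration.

```python
# Hypothetical sketch; 'integration' is an assumed marker name for tests that
# make real HTTP calls and are run manually on occasion.
import pytest
import requests


@pytest.mark.integration
def test_upload_dataset_against_live_apis():
    """Run occasionally and manually; exercises the real EDI and GBIF APIs."""


def test_upload_dataset_with_mocked_http(mocker):
    """Always run; the HTTP layer is mocked for fast, offline execution."""
    mocker.patch("requests.post", return_value=mocker.Mock(status_code=201))
    response = requests.post("https://api.gbif.org/v1/dataset", timeout=10)
    assert response.status_code == 201
```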
Remove the '_has_metadata' function as it does not serve a purpose.
Initially, it was designed to determine whether a local dataset group
had a member on GBIF and was used to guide decision logic concerning
resetting dataset endpoints and re-uploading metadata in the event of a
dataset update. However, it became apparent that this function returned
 'True' even if only boilerplate stand-in metadata was posted to GBIF
before the actual metadata was posted during a crawl operation.
Expand abbreviated references to the registrations data frame for
improved clarity and comprehension.
- Relocate the configuration file to an external location, removing it from version control to ensure the safety of credentials.
- Introduce a 'write configuration file' helper function, which generates a boilerplate configuration to be completed by the user.
- Create utility functions for loading and unloading the configuration as environment variables, making them accessible throughout the package (a sketch follows this list).
- Note: The current implementation doesn't fully restore the user's environmental variables to their original state, as any variables with the same names will be overwritten by the load_configuration function and removed by the unload_configuration function. Addressing this issue is a potential improvement for future implementation.
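A hedged sketch of the helpers described in this list; the file format, keys, and exact function names follow the description, but the details are assumptions.

```python
# Hypothetical sketch of the configuration helpers; keys and file format are
# assumptions for illustration only.
import json
import os


def write_configuration(file_path):
    """Write a boilerplate configuration file for the user to complete."""
    boilerplate = {"USER_NAME": "", "PASSWORD": "", "PASTA_ENVIRONMENT": ""}
    with open(file_path, "w", encoding="utf-8") as file:
        json.dump(boilerplate, file, indent=2)


def load_configuration(file_path):
    """Load configuration keys as environment variables for package-wide use."""
    with open(file_path, "r", encoding="utf-8") as file:
        config = json.load(file)
    os.environ.update({key: str(value) for key, value in config.items()})


def unload_configuration(file_path):
    """Remove the configuration keys from the environment.

    Note: as described above, variables that shared a name with a
    configuration key are not restored to their original values.
    """
    with open(file_path, "r", encoding="utf-8") as file:
        config = json.load(file)
    for key in config:
        os.environ.pop(key, None)
```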
Rename the 'crawl.py' module to 'upload.py' to better reflect its
purpose, which involves the user posting content to GBIF rather
than performing crawling operations.
Update code and test comments for improved clarity and understanding.
Standardize function parameter names and definitions for a consistent public
facing API.
Remove the empty 'test_validate.py' module as testing for these routines
is consolidated in the 'test__utilities.py' module.
Standardize function descriptions for API clarity and consistency.
Add major missing components to the README in preparation for release.
Enhance clarity in the documentation regarding the concept of dataset
synchronization to preempt any potential confusion.
Apply a small fix to ensure consistent subsection formatting throughout the
document.
Add a note to the developer section of the README, advising maintainers
to subscribe to the EDI and GBIF API mailing lists for timely updates on
outages and changes. This ensures they can adjust expectations or the
codebase accordingly.
Add missing examples of public-facing API usage to provide users with
demonstrations of how to use the functions.
Remove developer dependencies from the list of user/production
dependencies to lighten the installation process for users.
Revise installation instructions to recommend using pip from GitHub
rather than conda. While installation from conda is possible, the pip
method is more straightforward.
Emphasize the importance of running the main workflow after creation,
as skipping this step can result in incomplete registration and uploading
of a package to GBIF.
Revise the CONTRIBUTING file to align with the current status of the project.
Insert a blank space into the README to ease the creation of a
commit message, via Python Semantic Release, intended solely for bumping
the major version number.

BREAKING CHANGE: This marks the first fully functioning release of the
gbif_registrar package. The APIs of previously released functionality have
been considerably modified.
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (main@550491a).

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #71   +/-   ##
=======================================
  Coverage        ?   85.00%           
=======================================
  Files           ?        6           
  Lines           ?      260           
  Branches        ?        0           
=======================================
  Hits            ?      221           
  Misses          ?       39           
  Partials        ?        0           


@clnsmth clnsmth merged commit 86b3da0 into main Jan 9, 2024
5 checks passed