diff --git a/src/README.md b/src/README.md
deleted file mode 100644
index 4368f67..0000000
--- a/src/README.md
+++ /dev/null
@@ -1,167 +0,0 @@
-# provenaclient
-
-A client library for interfacing with a Provena instance.
-
-## Usage
-
-### How to get started with Poetry
-
-1) Run the command: `curl -sSL https://install.python-poetry.org | python3 -`.
-2) Check that Poetry was successfully installed by running `poetry --version`.
-3) You should now be able to see your Poetry version.
-4) Now run the command `poetry shell`. This will activate the Poetry virtual environment for you.
-5) Finally, run the command `poetry install`. This will install all dependencies defined within pyproject.toml.
-
-### Using local .venv
-
-1) Set Poetry to use a local `.venv` with `poetry config virtualenvs.in-project true`.
-2) List and remove any existing venvs with `poetry env list` and `poetry env remove <env name>`.
-3) Install with `poetry install`.
-
-You can then make VS Code use this environment easily with `ctrl + shift + p`, selecting "Python: Select Interpreter", and choosing `.venv/bin/python`.
-
-### My Poetry installation is not being detected?
-1) This means that your PATH variable does not include the Poetry directory.
-2) Open your .bashrc file using the command `nano ~/.bashrc`.
-3) Add the following line at the end of the file: `export PATH="$HOME/.local/bin:$PATH"`.
-4) Reload your .bashrc file using the command `source ~/.bashrc`.
-5) Verify that Poetry is now found by running `poetry --version`.
-
-
-# Provena Client CI/CD and Release Process
-
-## Overview
-
-The Provena client uses GitHub Actions for CI/CD, producing automated deployments to our PyPI account.
-
-## Continuous Integration (CI)
-
-**Executed:** On creation of pull requests and on merge of pull requests to the main branch.
-
-**Triggers:** Push and pull requests to the main branch.
-
-**Steps:**
-1. Set up Python environment.
-2. Check out the repository.
-3. Install dependencies with Poetry.
-4. Run type checks with Mypy.
-5. Run tests with Pytest (unit tests, integration tests).
-6. Track coverage with Codecov.
-7. Build documentation.
-
-## Continuous Deployment (CD)
-
-**Executed:** On merge of pull requests to the main branch.
-
-**Triggers:** Push to the main branch and merged pull requests to the main branch.
-
-**Steps:**
-1. Set up Python environment.
-2. Check out the repository.
-3. Use `python-semantic-release` to prepare the release.
-4. Publish to TestPyPI and PyPI.
-5. Test install from TestPyPI.
-6. Upload distributions to GitHub Releases.
-
-## Semantic Versioning and Release Automation
-
-The Provena Client uses `python-semantic-release` for automated versioning and releases.
-
-### Configuration in `pyproject.toml` `[tool.semantic_release]`
-
-- **Version Management:** Package versions are managed through `pyproject.toml` and `src/provenaclient/__init__.py`.
-- **Release Branch:** Releases of the Provena Client are made from the main branch only.
-- **Changelog:** The release changelog and commit documentation/history are maintained in `CHANGELOG.md`.
-- **Upload to PyPI and GitHub Releases:** Set to true.
-- **Automatic Version Commit:** `commit_version_number = true` ensures that the version number is automatically committed back to the repository after a release, keeping the `pyproject.toml` and `src/provenaclient/__init__.py` files up to date.
-
-### Commit Message Conventions
-
-Follow the Conventional Commits specification. Examples:
-
-- **feat:** A new feature.
-- **fix:** A bug fix.
-- **chore:** Changes to the build process or auxiliary tools and libraries.
-- **docs:** Documentation-only changes.
-- **style:** Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc.).
-- **refactor:** A code change that neither fixes a bug nor adds a feature.
-- **perf:** A code change that improves performance.
-- **test:** Adding missing tests or correcting existing tests.
-
-**Note:** Commits with types of `docs`, `chore`, `style`, `refactor`, and `test` will not trigger a version change.
-
-More information can be found here: [Conventional Commits Specification](https://www.conventionalcommits.org/en/v1.0.0/).
-
-## Release Process
-
-On merging to the main branch, `python-semantic-release` automates the following steps:
-
-### Bumps the Version
-
-Based on commit messages, the version is incremented following semantic versioning rules:
-
-- **feat:** Increments the MINOR version.
-- **fix:** Increments the PATCH version.
-- **BREAKING CHANGE:** Increments the MAJOR version.
-
-### Creates a New Tag
-
-A new Git tag is created for the version.
-
-### Publishes to PyPI
-
-The new version is published to PyPI.
-
-### Uploads to GitHub Releases
-
-The distribution files are uploaded to GitHub Releases.
-
-### Commits the New Version Number
-
-The new version number is committed back to the repository, ensuring the `pyproject.toml` and `src/provenaclient/__init__.py` files are up to date.
-
-## Best Practices for Adding New Features or Making Changes
-
-### Create a New Branch
-
-Name your branch descriptively, e.g., `<feature>-<short-description>`.
-
-### Develop and Commit
-
-- Make changes in your branch.
-- Use meaningful commit messages following the Conventional Commits specification.
-
-### Open a Pull Request (PR)
-
-- When your feature is complete, open a PR to the main branch.
-- Ensure that your PR title adheres to the Conventional Commits specification.
-- Ensure your PR description is clear and outlines the changes made.
-- Ensure that CI (Continuous Integration) has passed for your latest commit.
-
-### Review and Squash Merge
-
-- Request a review from at least one team member who is within the Provena organization and involved in client library development.
-- Once approved, squash merge the PR into main with a commit message that summarizes the changes, e.g., `feat: added new endpoint in job-api`, or re-use the PR title.
-
-### CI/CD Flow
-
-After merging, the CI/CD pipeline will run automatically, deploying the changes and updating the version as needed.
-
-## Overall Summary
-
-This setup ensures a streamlined and automated release process for the Provena Client, with CI/CD pipelines handling testing and deployment, and `python-semantic-release` managing semantic versioning and PyPI releases.
-
-
-
-## Contributing
-TODO
-
-## License
-
-`provenaclient` was created by the Provena Development Team (CSIRO). The Provena Development Team (CSIRO) retains all rights to the source and it may not be reproduced, distributed, or used to create derivative works.
-
-## Credits
-
-`provenaclient` was created with [`cookiecutter`](https://cookiecutter.readthedocs.io/en/latest/) and the `py-pkgs-cookiecutter` [template](https://github.com/py-pkgs/py-pkgs-cookiecutter).
-
-
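The deleted README above describes the `[tool.semantic_release]` settings only in prose. As a hedged illustration, a configuration matching that description might look like the following sketch; key names follow python-semantic-release v7 conventions (the family `commit_version_number` belongs to), and the exact keys and values in the repository's real `pyproject.toml` may differ.

```toml
[tool.semantic_release]
# Version is tracked in both pyproject.toml and the package __init__.py
version_toml = "pyproject.toml:tool.poetry.version"
version_variable = "src/provenaclient/__init__.py:__version__"
# Releases are cut from the main branch only
branch = "main"
# Changelog and commit history are maintained here
changelog_file = "CHANGELOG.md"
# Upload distributions to PyPI and GitHub Releases
upload_to_pypi = true
upload_to_release = true
# Commit the bumped version number back to the repository after a release
commit_version_number = true
```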
diff --git a/src/provenaclient/modules/datastore.py b/src/provenaclient/modules/datastore.py
index e9aee70..c069698 100644
--- a/src/provenaclient/modules/datastore.py
+++ b/src/provenaclient/modules/datastore.py
@@ -10,11 +10,11 @@ HISTORY:
 Date       By  Comments
 ---------- --- ---------------------------------------------------------
+22-08-2024 | Parth Kulkarni | Completed Interactive Dataset class + Doc Strings.
 15-08-2024 | Parth Kulkarni | Added a prototype/draft of the Interactive Dataset Class.
 
 '''
-from distutils.version import Version
 from provenaclient.auth.manager import AuthManager
 from provenaclient.utils.config import Config
 from provenaclient.clients import DatastoreClient, SearchClient
@@ -23,7 +23,7 @@ from provenaclient.models import HealthCheckResponse, LoadedSearchResponse, LoadedSearchItem, UnauthorisedSearchItem, FailedSearchItem, RevertMetadata
 from provenaclient.utils.exceptions import *
 from provenaclient.modules.module_helpers import *
-from ProvenaInterfaces.RegistryAPI import NoFilterSubtypeListRequest, VersionRequest, VersionResponse, SortOptions, SortType, DatasetListResponse
+from ProvenaInterfaces.RegistryAPI import NoFilterSubtypeListRequest, VersionRequest, VersionResponse, SortOptions, DatasetListResponse
 from provenaclient.modules.submodules import IOSubModule
 from typing import AsyncGenerator, List
@@ -124,37 +124,135 @@ async def action_approval_request(self, action_approval_request: ActionApprovalR
 class InteractiveDataset(ModuleService):
+    dataset_id: str
+    _auth: AuthManager
+    _datastore_client: DatastoreClient
+    io: IOSubModule
+
     def __init__(self, dataset_id: str, auth: AuthManager, datastore_client: DatastoreClient, io: IOSubModule) -> None:
+        """Initialise an interactive dataset session.
+
+        Parameters
+        ----------
+        dataset_id : str
+            The unique identifier of the dataset to interact with.
+        auth : AuthManager
+            An abstract interface containing the user's requested auth flow method.
+        datastore_client : DatastoreClient
+            The client responsible for interacting with the datastore API.
+        io : IOSubModule
+            The input/output submodule for handling dataset IO operations.
+        """
+
         self.dataset_id = dataset_id
         self._auth = auth
         self._datastore_client = datastore_client
         self.io = io
 
     async def fetch_dataset(self) -> RegistryFetchResponse :
+        """Fetches the current dataset from the datastore.
+
+        Returns
+        -------
+        RegistryFetchResponse
+            An interactive Python datatype of type RegistryFetchResponse
+            containing the dataset details.
+        """
+
         return await self._datastore_client.fetch_dataset(id=self.dataset_id)
 
     async def download_all_files(self, destination_directory: str) -> None:
+        """
+        Downloads all files of the current dataset to the destination path.
+
+        - Fetches dataset info
+        - Fetches read credentials
+        - Uses the s3 cloud path library to download all files to the specified location
+
+        Parameters
+        ----------
+        destination_directory : str
+            The destination path to save files to - use a directory.
+        """
+
         return await self.io.download_all_files(destination_directory=destination_directory, dataset_id=self.dataset_id)
 
     async def upload_all_files(self, source_directory: str) -> None:
+        """
+        Uploads all files in the source path to the current dataset's storage location.
+
+        - Fetches dataset info
+        - Fetches write credentials
+        - Uses the s3 cloud path library to upload all files from the specified location
+
+        Parameters
+        ----------
+        source_directory : str
+            The source path to upload files from - use a directory.
+        """
+
         return await self.io.upload_all_files(source_directory=source_directory, dataset_id=self.dataset_id)
 
     async def version(self, reason: str) -> VersionResponse:
+        """Versioning operation which creates a new version from the current dataset.
+
+        Parameters
+        ----------
+        reason : str
+            The reason for versioning this dataset.
+
+        Returns
+        -------
+        VersionResponse
+            Response to the versioning of the dataset, containing the new version ID
+            and the job session ID.
+        """
+
         version_request: VersionRequest = VersionRequest(
             id = self.dataset_id,
             reason = reason
         )
+
         return await self._datastore_client.version_dataset(version_dataset_payload=version_request)
 
     async def revert_dataset_metadata(self, history_id: int, reason: str) -> StatusResponse:
+        """Reverts the metadata of the current dataset to a previously identified historical version.
+
+        Parameters
+        ----------
+        history_id : int
+            The identifier of the historical version to revert to.
+        reason : str
+            The reason for reverting the dataset's metadata.
+
+        Returns
+        -------
+        StatusResponse
+            Response indicating whether your dataset metadata revert request was successful.
+        """
+
         revert_request: RevertMetadata = RevertMetadata(
             id=self.dataset_id,
             history_id=history_id,
             reason=reason
         )
+
         return await self._datastore_client.revert_metadata(metadata_payload=revert_request)
 
     async def generate_read_access_credentials(self, console_session_required: bool) -> CredentialResponse:
+        """Attempts to generate programmatic read-level access keys for the storage
+        bucket subdirectory holding this dataset's files.
+
+        Parameters
+        ----------
+        console_session_required : bool
+            Specifies whether a console session URL is required.
+
+        Returns
+        -------
+        CredentialResponse
+            The AWS credentials granting read-level access to the subset of the
+            bucket requested in the S3 location object.
+        """
 
         credentials_request = CredentialsRequest(
             dataset_id=self.dataset_id,
@@ -164,6 +262,19 @@ async def generate_read_access_credentials(self, console_session_required: bool)
         return await self._datastore_client.generate_read_access_credentials(read_access_credentials=credentials_request)
 
     async def generate_write_access_credentials(self, console_session_required: bool) -> CredentialResponse:
+        """Attempts to generate programmatic write-level access keys for the storage
+        bucket subdirectory holding this dataset's files.
+
+        Parameters
+        ----------
+        console_session_required : bool
+            Specifies whether a console session URL is required.
+
+        Returns
+        -------
+        CredentialResponse
+            The AWS credentials granting write-level access to the subset of the
+            bucket requested in the S3 location object.
+        """
 
         credentials_request = CredentialsRequest(
             dataset_id=self.dataset_id,
@@ -550,6 +661,22 @@ async def search_datasets(self, query: str, limit: int = DEFAULT_SEARCH_LIMIT) -
         )
 
     async def interactive_dataset(self, dataset_id: str) -> InteractiveDataset:
+        """Creates an interactive "session" with a dataset that allows you
+        to perform further operations without re-supplying the dataset ID and
+        creating the objects required by other methods.
+
+        Parameters
+        ----------
+        dataset_id : str
+            The unique identifier of the dataset to be retrieved.
+            For example: "10378.1/1451860"
+
+        Returns
+        -------
+        InteractiveDataset
+            An instance that allows you to perform various operations on the provided dataset.
+        """
+
         return InteractiveDataset(
             dataset_id=dataset_id,
             datastore_client=self._datastore_client,
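To make the new `InteractiveDataset` surface concrete, here is a minimal usage sketch based only on the methods added in this diff. The client construction (`ProvenaClient`, `Config`, `DeviceFlow`) and the `client.datastore` attribute are assumptions about the wider library that this diff does not show; adjust to the real entry points of your provenaclient version.

```python
import asyncio

# Assumed entry points - this diff does not show the client's public API surface.
from provenaclient import ProvenaClient, Config
from provenaclient.auth import DeviceFlow


async def main() -> None:
    # Placeholder endpoint/auth details - substitute your Provena instance values.
    config = Config(domain="your.provena.domain", realm_name="your-realm")
    auth = DeviceFlow(config=config, client_id="client-tools")
    client = ProvenaClient(config=config, auth=auth)

    # Open an interactive session once, then reuse it without re-supplying the ID.
    dataset = await client.datastore.interactive_dataset(dataset_id="10378.1/1451860")

    # Methods added in this diff:
    details = await dataset.fetch_dataset()
    await dataset.download_all_files(destination_directory="./local-copy")
    new_version = await dataset.version(reason="Reprocessed input data")

    print(details)
    print(new_version)


asyncio.run(main())
```

The design intent shown in the diff is that the session object captures `dataset_id`, the auth manager, the datastore client, and the IO submodule once, so repeated operations against the same dataset stay terse.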