move Zenodo jupyterLab doc to vre-hub main docs (#8)
* move Zenodo jupyterLab doc to vre-hub main docs

* deco and style

* test tests on branches

* trigger on PRs

* test yarn build on tests

* update links

* Update test_build.yml
garciagenrique authored Sep 26, 2024
1 parent 4905da4 commit 9def299
Showing 9 changed files with 720 additions and 6 deletions.
7 changes: 1 addition & 6 deletions .github/workflows/test_build.yml
@@ -1,8 +1,6 @@
name: Test build
on:
pull_request:
branches:
- 'main'
paths-ignore:
- 'README.md'

@@ -24,9 +22,6 @@ env:
jobs:
# Single deploy job since we're just deploying
build:
environment:
name: github-pages
url: ${{ steps.deployment.outputs.page_url }}
runs-on: ubuntu-latest
steps:
- name: Checkout
@@ -39,4 +34,4 @@ jobs:
- name: Install dependencies
run: yarn install --frozen-lockfile --non-interactive
- name: Build
run: yarn build
run: yarn build
48 changes: 48 additions & 0 deletions docs/extensions/zenodo-jupyterlab/extension-usage.md
@@ -0,0 +1,48 @@
# Frontend Usage Guide

## Searching

The user can search both records and communities, which are collections of records. The search query syntax is derived from [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html), just as that of the [Zenodo REST API](https://developers.zenodo.org/#rest-api) is.

The results are returned in pages of 25, and the user can navigate between pages. This leverages the built-in `page` and `size` parameters of the Zenodo REST API, which appear as `kwargs` in `eOSSR`, the Python library on which all backend communication with Zenodo is based.
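
As a rough illustration of how `page` and `size` map onto a request, the sketch below builds a search URL for the public Zenodo records endpoint. The helper name `build_search_url` is hypothetical and this is not the extension's code: the extension goes through `eOSSR` rather than assembling URLs by hand.

```python
from urllib.parse import urlencode

ZENODO_RECORDS_URL = "https://zenodo.org/api/records"  # public Zenodo REST endpoint
PAGE_SIZE = 25  # page size used by the extension, per the text above


def build_search_url(query: str, page: int = 1, size: int = PAGE_SIZE) -> str:
    """Return a records-search URL for one page of results (hypothetical helper)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    params = {"q": query, "page": page, "size": size}
    return f"{ZENODO_RECORDS_URL}?{urlencode(params)}"


print(build_search_url("gamma ray", page=2))
```

Requesting the next page only changes the `page` value; the query and `size` stay the same.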

For searched records, the title, resource type, and date published are displayed in the table. For communities, only the title and date published are displayed. The results of both searches are sorted by most recently updated.

*Note*: As each page is a new API call, if records matching the search parameters are added between page calls, some records may be repeated because they are pushed to later pages. Given the probable specificity of users' searches, this should rarely be an issue.

### Record Information
When a record is selected, more record information is displayed:
* Title of the record as a link to the Zenodo record entry on `zenodo.org`
* Author List, with institutions available upon hovering
* List of files as download links (WIP: files currently download to the local computer, but in future updates they will be saved to the `$HOME` directory in the Jupyter instance)

This interface is meant to mimic the record entries on `zenodo.org`.

### Community Searching
After searching for communities, the user can click on one to search for specific records within it; a banner above the results shows which community is being searched. Note: community searching can be canceled either by pressing the `X` next to the community banner or by searching for a new community.

### Potential Future Features
Depending on the usefulness of the feature in terms of the scope of this extension, advanced search settings might be added in future updates. These would mimic those on `zenodo.org`: restrict file type, resource type, access level, ...


## ZenodoAPI interactions {#zenodoAPI}
This section covers all actions that interact with the `eossr.api.zenodo.ZenodoAPI` module, which creates a connection to Zenodo with an API access token for continued use.

### Logging in
The user can log in to either the main [Zenodo](https://zenodo.org/) software or the [Sandbox Zenodo](https://sandbox.zenodo.org/) software (for testing) using their Personal Access Token. This token is created under Zenodo User Settings > Applications > Personal Access Tokens. When the user logs in, the validity of the token in the chosen software is determined via a query of that user's deposits. If the token is invalid, this is displayed clearly below the access token field.

The entered access token and the choice of main or sandbox software are saved as environmental variables within the Jupyter Session: `ZENODO_API_KEY` and `SANDBOX_ZENODO`, respectively. These variables are accessible across the Jupyter Session; however, they are not reflected in terminals that are already open, so new terminals/notebooks must be opened to pick up the change.
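
The terminal behaviour follows from how environment variables are inherited: a process receives a copy of its parent's environment when it starts. The sketch below demonstrates this in plain Python, assuming the variables are set inside the Jupyter server process; the token value is a placeholder, not a real credential.

```python
import os
import subprocess
import sys

# Illustrative value only; the real token comes from the login form.
os.environ["ZENODO_API_KEY"] = "example-token"

# A child process started *after* the assignment inherits the new value...
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('ZENODO_API_KEY'))"],
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = child.stdout.strip()
print(seen_by_child)
# ...whereas a process that was already running keeps the copy it started with.
```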

### Uploading A Record
The sandbox checkbox at the top of the Upload page is read-only and indicates whether the user logged in to the main software or the sandbox version. This informs the user in which software their deposit will be made.

In order to continue, the user must fill in the necessary information:
* Select at least one file to upload (the file browser initially shows the contents of the `$HOME` directory in the Jupyter Session)
* Title
* Select a Resource Type
* Include at least one Creator's name
* Optional Information:
  * DOI (otherwise generated automatically)
  * Multiple Creators, including affiliations

Once the `Next` button is selected, a confirmation page is shown with all of the entered information. When `Confirm` is pressed, a deposit is made and the appropriate metadata is assigned. ***Note***: As it currently stands, only the title can be set via this mechanism, and the file upload has yet to be implemented.
178 changes: 178 additions & 0 deletions docs/extensions/zenodo-jupyterlab/implementation/backend.md
@@ -0,0 +1,178 @@
# Backend Documentation

## General Framework
The backend for this extension is built as a Jupyter Server Extension. The project entry points are specified in the `pyproject.toml` file in the root directory. These point to the `zenodo_jupyterlab.server` module, whose `extension.py` and `__init__.py` files run the function that sets up the API handlers defined in the other files in that directory. This guide goes through each file in turn and explains its functionality.

# Files in `zenodo_jupyterlab/server/`

## `extension.py`
`_load_jupyter_server_extension` is a basic function that calls `setup_handlers`, which is defined in `handlers.py`, and passes it the `server_app.web_app` object. The `server_app` argument is supplied automatically via the server extension points.

## `__init__.py`
Defines the `_jupyter_server_extension_points` function, which signals to the primary extension configured in `pyproject.toml` how to access the server extension, so that the server extension is built along with it.

## `handlers.py`
This file generates the API endpoints for use by the frontend components. All handlers inherit from `jupyter_server.base.handlers.APIHandler`, except `XSRFTokenHandler`, which inherits from `JupyterHandler`. For simplicity, parameters are shown within the function definitions here, though they are actually read via the `APIHandler.get_argument()` function because they are URI encoded in the API calls. Likewise, return values are written plainly here, though they are actually sent via the `APIHandler.finish()` function.

### `class EnvHandler`
Interacts with environmental variables in the Jupyter instance. Exploits the `os` module.
#### `get(env_var: string)`
Simple function to return the value of a stored environmental variable. Returns a dict containing the variable name and value.
> **Returns:** `{env_var: *value*}`
#### `post(key: string, value: string)`
Simple function to set the value of an environmental variable. Returns a dict with the parameters.
> **Returns:** `{key, value}`
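
The logic of the two `EnvHandler` methods can be sketched as plain functions, leaving out the Jupyter handler plumbing. The function names are hypothetical, and reading `{key, value}` as a dictionary with `"key"` and `"value"` entries is an assumption about the response shape.

```python
import os


def get_env(env_var: str) -> dict:
    """Mirror of the documented `get`: return {name: value} (value is None if unset)."""
    return {env_var: os.environ.get(env_var)}


def set_env(key: str, value: str) -> dict:
    """Mirror of the documented `post`: store the value and echo the parameters."""
    os.environ[key] = value
    # Assumed response shape; the text above only says "{key, value}".
    return {"key": key, "value": value}
```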
### `class XSRFTokenHandler`
Note: this inherits from JupyterHandler, not APIHandler.

#### `get(self: JupyterHandler)`
Accesses JupyterHandler.xsrf_token and returns it.
> **Returns:** `{'xsrfToken': xsrf_token}`
### `class SearchRecordHandler`
Handler designed to interact with `searchRecords` function defined in [`search.py`](#search).

#### `get(search_field: string, page: int, communities: string)`
Awaits response from `searchRecords` after passing all of the arguments and returns the list of corresponding records (max size: 25).
> **Returns:** `{'records': *response*}`
### `class SearchCommunityHandler`
Handler designed to interact with `searchCommunities` function defined in [`search.py`](#search).

#### `get(search_field: string, page: int)`
Awaits response from `searchCommunities` after passing all of the arguments and returns the list of corresponding communities (max size: 25).
> **Returns:** `{'communities': *response*}`
### `class RecordInfoHandler`
Handler designed to interact with `recordInformation` function defined in [`search.py`](#search).

#### `get(record-id: string)`
Awaits response from `recordInformation` after passing the `record-id` and returns the desired data.
> **Returns:** `{'data': *response*}`
### `class FileBrowserHandler`
Interacts with environmental information about the folder and files in the `$HOME` directory and child directories.

#### `get(path: string)`
Pulls the Jupyter instance home directory from the `$HOME` environmental variable. Then appends the passed path to it and verifies that the result exists (if not, returns an "error"). Uses the `os.listdir` function to iterate through the entries within the folder. Note: it explicitly ignores all entries that start with ".", because `os.listdir` includes hidden files and folders (it omits only the special entries `.` and `..`). It then accumulates a list of dictionaries of entry information:
```json
entry = {
"name": file name,
"type": directory or file,
"path": relative path from home directory,
"modified": isoformatted timestamp of last modification,
"size": size
}
```
This information is all drawn from the returned data from `os.listdir`.
> **Returns:** `{'entries': *list of entry dictionaries*}`
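
The listing steps above can be sketched as follows. This is a minimal illustration, not the extension's code: the function name `list_entries` and the error message are assumptions, and `os.stat` is used here to obtain the size and modification time.

```python
import os
from datetime import datetime


def list_entries(home: str, rel_path: str = "") -> dict:
    """List non-hidden entries under home/rel_path as entry dictionaries."""
    target = os.path.join(home, rel_path)
    if not os.path.isdir(target):
        return {"error": "Path does not exist"}  # error wording is an assumption

    entries = []
    for name in sorted(os.listdir(target)):
        if name.startswith("."):
            continue  # skip hidden files and folders
        full_path = os.path.join(target, name)
        info = os.stat(full_path)
        entries.append(
            {
                "name": name,
                "type": "directory" if os.path.isdir(full_path) else "file",
                "path": os.path.relpath(full_path, home),
                "modified": datetime.fromtimestamp(info.st_mtime).isoformat(),
                "size": info.st_size,
            }
        )
    return {"entries": entries}
```

Calling `list_entries(os.environ["HOME"])` would list the top level of the home directory, and passing a `rel_path` descends into a child folder.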
### `class ZenodoAPIHandler`
Interacts with all functions that make calls to the `eossr.api.zenodo.ZenodoAPI` object. They are contained within this class to limit the need of generating new `ZenodoAPI` objects with each interaction.

#### `property zAPI`
Once the user has logged in, this holds a `ZenodoAPI` object initialized with the input access token and sandbox boolean. Unless the user logs in again, this object remains for the lifetime of the Jupyter instance.

#### `post(form_data: json dictionary)`
Takes in a `form_data` JSON dictionary, that contains at least an `action` entry, which specifies which code to run.

##### `if action == 'check-connection'`
Verifies the validity of a Zenodo access token via the `checkZenodoConnection` function defined in [`testConnection.py`](#testConnection). If valid, sets `zAPI` to an initialized `ZenodoAPI` object. Returns the response from the called function.
> **Returns:** `{'status': *response*}`
##### `if action == 'upload'`
Interacts with the `upload` function defined in `upload.py`. If `zAPI` is not yet initialized, the function returns a status of "Please log in before attempting to upload." Otherwise, returns the response from the function.
> **Returns:** `{'status': *response*}`
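
A minimal sketch of this action dispatch is shown below, with the Jupyter handler machinery left out. The class name `ZenodoActions`, the injected callables, and the fallback for unknown actions are all assumptions made for illustration; the callables stand in for `checkZenodoConnection` and `upload`.

```python
from typing import Any, Callable, Optional, Tuple


class ZenodoActions:
    """Dispatch on form_data['action'] while reusing one client object."""

    def __init__(
        self,
        check_connection: Callable[[], Tuple[int, Optional[Any]]],
        upload: Callable[[Any, dict], Optional[str]],
    ) -> None:
        self._check_connection = check_connection
        self._upload = upload
        self.z_api: Optional[Any] = None  # stands in for the `zAPI` property

    def post(self, form_data: dict) -> dict:
        action = form_data.get("action")
        if action == "check-connection":
            status, client = self._check_connection()
            if client is not None:
                self.z_api = client  # keep the client for later uploads
            return {"status": status}
        if action == "upload":
            if self.z_api is None:
                return {"status": "Please log in before attempting to upload."}
            return {"status": self._upload(self.z_api, form_data)}
        return {"status": f"Unknown action: {action}"}  # fallback is an assumption
```

Keeping the client on the object is what avoids building a new one for every request.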
### `class ServerInfoHandler`

#### `get`
Retrieves the home directory.
> **Returns:** `{'root_dir': *home directory*}`
### `setup_handlers`
`setup_handlers(web_app)`

Defines the API endpoints for access from the frontend. Builds the urls off of the "web_app" base path and "zenodo-jupyterlab". Thus all handlers are of the form: base_path + "zenodo-jupyterlab-*action*". This function then adds these endpoints to the web_app.

## `search.py`
This file handles all search requests to Zenodo via the [`eOSSR`](https://gitlab.com/escape-ossr/eossr) library.

### `searchRecords`
`searchRecords(search_field: string, page: int, **kwargs)`

Calls the `eossr.api.zenodo.search_records` with the given arguments, as well as restricts the size of the response to 25 (i.e. passes `size = 25` as well). Parses the returned list of `eossr.api.zenodo.Record` objects and returns a list of the following kinds of dictionaries:
```json
record = {
"id": Record ID,
"title": Record title,
"date": Date Published,
"resource_type": Record's resource type (from within the metadata)
}
```
If this call to `eOSSR` fails, the function simply returns `["failed"]`.
> **Returns:** [list of records]
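
The parse-and-fallback pattern described here can be sketched as below. To keep the example runnable without `eOSSR`, the `search` argument is a stand-in for the library's search function, and the hits are plain dictionaries rather than `Record` objects; the field names under `metadata` are assumptions modelled on Zenodo's record JSON.

```python
from typing import Any, Callable, Iterable, List

PAGE_SIZE = 25


def summarize_records(
    search: Callable[..., Iterable[dict]], search_field: str, page: int, **kwargs: Any
) -> List[Any]:
    """Run a search and reduce each hit to the four fields shown in the table."""
    try:
        hits = search(search_field, page=page, size=PAGE_SIZE, **kwargs)
        return [
            {
                "id": hit["id"],
                "title": hit["metadata"]["title"],
                "date": hit["metadata"]["publication_date"],
                "resource_type": hit["metadata"]["resource_type"]["title"],
            }
            for hit in hits
        ]
    except Exception:
        return ["failed"]  # same sentinel the documentation describes
```

Returning a sentinel list instead of raising keeps the response shape stable, so the frontend only needs to check for `"failed"`.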
### `searchCommunities`
`searchCommunities(search_field: string, page: int)`

Calls the `eossr.api.zenodo.search_communities` with the given arguments, as well as restricts the size of the response to 25 (i.e. passes `size = 25` as well). Parses the returned list of dictionaries and returns a list of the following kinds of dictionaries:
```json
community = {
"id": Community ID,
"title": Community title,
"date": Date Published,
}
```
If this call to `eOSSR` fails, the function simply returns `["failed"]`.
> **Returns:** [list of communities]
### `recordInformation`
`recordInformation(recordID: string)`

Returns more specific information about a specified record via `eossr.api.zenodo.get_record`. Parses the returned data from this function and creates the following dictionary:
```json
record= {
'authors': authors with affiliations as listed in the record,
'filelist': the list of files (full download links) attached to record
}
```
Note: The existing information such as title and id are still held on the frontend in the tabular display, so no need to repass that information. Secondary note: if this call fails, this function returns `{"status": "failed"}`.
> **Returns:** *record*
## `testConnection.py`
File devoted to validating Zenodo access tokens.

### `checkZenodoConnection`
Extracts the access token and sandbox boolean from the environmental variables `ZENODO_API_KEY` and `ZENODO_SANDBOX`, respectively. Parses the string value of the sandbox boolean into a Python boolean (it is stored as a string due to the mismatch in syntax between TypeScript and Python booleans). Creates an `eossr.api.zenodo.ZenodoAPI` object with those extracted variables as arguments and stores it in `zAPI`. Then uses the `eossr.api.zenodo.ZenodoAPI.query_user_deposits` function and extracts the status from its response, which indicates whether the access token is valid and whether a connection can be made to the Zenodo REST API. If either stage fails, `zAPI` is set to `None` and the query status is set to 0.
> **Returns:** status code of the query, `zAPI`
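
The string-to-boolean step deserves care, because any non-empty string, including `"false"`, is truthy in Python. A minimal sketch follows, assuming the value is stored as `"true"` or `"false"` and using the variable name given in this section; the helper name `read_sandbox_flag` is hypothetical.

```python
import os


def read_sandbox_flag(var_name: str = "ZENODO_SANDBOX") -> bool:
    """Interpret a string such as "true"/"false" as a Python boolean."""
    raw = os.environ.get(var_name, "false")  # treat an unset variable as False
    # bool("false") would be True, so compare against the literal text instead.
    return raw.strip().lower() == "true"
```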
## `upload.py`
Devoted to generating and populating new Zenodo record deposits.

### `createDeposit`
`createDeposit(zAPI: eossr.api.zenodo.ZenodoAPI)`

Uses `eossr.api.zenodo.ZenodoAPI.create_new_deposit` to create an empty deposit. Note: whether or not this deposit exists on Zenodo or Zenodo Sandbox is entirely dependent on what option the user selected when logging in.
> **Returns:** ID of the newly created record
### `createMetadata`
`createMetadata(zAPI: eossr.api.zenodo.ZenodoAPI, recordID: int, form_data: FormData object)`

*Work in Progress*\
Extracts title from form_data and creates a JSON dict as follows:
```json
json_metadata = {
"title": given title
}
```
Uses `eossr.api.zenodo.ZenodoAPI.set_deposit_metadata` to take in the recordID and the JSON data and add this metadata to the existing Zenodo object.
> **Returns:** *response*
### `upload`
`upload(zAPI: eossr.api.zenodo.ZenodoAPI, form_data: FormData object)`

Verifies that `zAPI` has been initialized; if not, returns `None`. Calls `createDeposit` with `zAPI` and captures the returned record ID. Then uses this record ID, `form_data`, and `zAPI` to call `createMetadata`.
> **Returns:** "Success" if *response* doesn't equal `None`
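
The overall flow can be sketched with a fake client so that it runs without network access or `eOSSR`. The method names follow the ones mentioned above, but `FakeZenodoClient`, its return values, and the deposit ID are purely illustrative assumptions, and the two helper steps are inlined into one function for brevity.

```python
from typing import Any, Optional


class FakeZenodoClient:
    """Hypothetical stand-in for a Zenodo client; records calls instead of doing HTTP."""

    def __init__(self) -> None:
        self.metadata: dict = {}

    def create_new_deposit(self) -> int:
        return 12345  # pretend Zenodo assigned this deposit ID

    def set_deposit_metadata(self, deposit_id: int, json_metadata: dict) -> dict:
        self.metadata[deposit_id] = json_metadata
        return {"id": deposit_id, "metadata": json_metadata}


def upload(client: Optional[Any], form_data: dict) -> Optional[str]:
    """Create a deposit, attach the title, and report success."""
    if client is None:
        return None  # not logged in
    record_id = client.create_new_deposit()
    response = client.set_deposit_metadata(record_id, {"title": form_data["title"]})
    return "Success" if response is not None else None
```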