Submit data improvements
michaelfitzo committed Oct 10, 2024
1 parent 819a097 commit 3f31f59
Showing 5 changed files with 582 additions and 105 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@

# Semi-structured Data

Semi-structured data is organized as unique identifiers, each paired with flexible key/value pairs that may be nested. The key/value pairs may be consistent between records but are not required to be. This format is typically used to store public metadata about available datasets or additional public metadata about samples. The MDS and AggMDS both hold semi-structured data and power the Data Portal Discovery Page.
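
As a rough illustration of that shape, the sketch below keys two records by a unique identifier and gives each a different set of key/value pairs. The identifiers, field names, and values are invented for this example and are not taken from any real MDS.

```python
# Hypothetical semi-structured records keyed by unique identifier.
# All identifiers, keys, and values here are illustrative only.
records = {
    "dataset-0001": {
        "title": "Example study A",
        # Nested key/value pairs are allowed.
        "contact": {"name": "Jane Doe", "email": "jane@example.org"},
        "tags": ["imaging", "public"],
    },
    "dataset-0002": {
        "title": "Example study B",
        # A key that the first record does not have.
        "subjects_count": 42,
    },
}


def shared_keys(recs):
    """Return the set of top-level keys present in every record."""
    key_sets = [set(record) for record in recs.values()]
    return set.intersection(*key_sets) if key_sets else set()


print(sorted(shared_keys(records)))
```

Only `title` appears in both records here, which is the kind of partial overlap the format permits.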


Because the structure of a commons' MDS and the Discovery Page configuration are very closely coupled, all content related to creating MDS records is included in the [Customize Gen3 Search Interface Section][Customize Gen3 Search Interface Section].

Instructions for creating and modifying an MDS record can be found in the [Gen3 SDK][Gen3 SDK Discovery Page].

## Discovery Page
![BRH Discovery Page](/gen3-resources/operator-guide/img/BRH_Discovery_Page.png)

<!-- Links -->

[Customize Gen3 Search Interface Section]: /gen3-resources/operator-guide/customize-search/
[Gen3 SDK Discovery Page]: https://github.com/uc-cdis/gen3sdk-python/blob/master/gen3/cli/discovery.py
Original file line number Diff line number Diff line change
@@ -28,7 +28,7 @@ Template TSVs are provided in each node's page in the data dictionary.

The prepared TSV files must be submitted in a specific order due to node links. Referring back to the graphical data model, a record cannot be submitted without first submitting the record(s) to which it is linked upstream (its "parent"). If metadata are submitted out of order, such as submitting a TSV with links to parent records that don't yet exist, the validator will reject the submission on the basis that the dependency is not present with the error message, "INVALID_LINK".

In a Gen3 Data Commons, **programs** and **projects** are two administrative nodes in the graph database that serve as the most upstream nodes. A program must be created first, followed by a project. Any subsequent data submission and data access, along with control of access to data, is done through the project scope. In some projects only a subset of submitters may have access to create a program or project.
In a Gen3 Data Commons, `programs` and `projects` are two administrative nodes in the graph database that serve as the most upstream nodes. A `program` must be created first, followed by a `project`. Any subsequent data submission and data access, along with control of access to data, is done through the project scope. In some projects only a subset of submitters may have access to create a program or project.

Before you create a program and a project or submit any data, you need to grant yourself permissions. First, you will need to grant yourself access to **create** a program and second, you need to grant yourself access to *see* the program. You can **create** the program before or after having access to *see* it.

@@ -39,13 +39,14 @@ Make sure to update user privileges:
docker exec -it fence-service fence-create sync --arborist http://arborist-service --yaml user.yaml
```

To create a program, visit the URL where your Gen3 Commons is hosted and append `/_root`. If you are running the Docker Compose setup locally, this will be `localhost/_root`. Otherwise, it will be whatever you set the `hostname` field to in the creds files for the services, with `/_root` added to the end. Here, you can choose either to use form submission or to upload a file. We will go through form submission here, as it shows what your file would need to look like if you were using file upload. Choose form submission, search for "program" in the drop-down list, and then fill in the "dbgap_accession_number" and "name" fields. As an example, you can use "123" as "dbgap_accession_number" and "Program1" as "name". Click 'Upload submission json from form' and then 'Submit'. A green message ("succeeded:200") indicates success, while a grey message indicates failure. More details can be viewed by clicking on the "DETAILS" button. If you don't see the green message, you can check the Sheepdog logs for possible errors and check the Sheepdog database (`/datadictionary`), where programs and projects are stored. If you see your program in the data dictionary, you can disregard the missing green message and continue to create a project.

To create a project, visit the URL where your Gen3 Commons is hosted and append the name of the program you want to create the project under. For example, if you are running the Docker Compose setup locally and would like to create a project under the program "Program1", the URL you will visit will be `localhost/Program1`. You will see the same options to use form submission or upload a file. This time, search for "project" in the drop-down list and then fill in the fields. As an example, you can use "P1" as "code", "phs1" as "dbgap_accession_number", and "project1" as "name". If you use different entries, make a note of the dbgap_accession_number for later. Click 'Upload submission json from form' and then 'Submit'. Again, a green message indicates success while a grey message indicates failure, and more details can be viewed by clicking on the "DETAILS" button. You can check in `/datadictionary` whether the program and project have been stored correctly.

After that, you're ready to start submitting data for that project, keeping in mind that you must submit from "top to bottom" in the data model to make sure each new node points to an existing node in the database. If metadata are submitted out of order, such as submitting a TSV with links to parent records that don't yet exist, the validator will reject the submission on the basis that the dependency is not present with the error message, "INVALID_LINK".
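
One way to work out a valid "top to bottom" order is a topological sort of the node links. The sketch below uses Python's standard `graphlib` module on a made-up data model: only `program` and `project` come from the text above, while `subject` and `sample` are hypothetical nodes added for illustration. It does not talk to a commons or read a real data dictionary.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical data model: each node maps to the parent node(s) it links to.
# "subject" and "sample" are illustrative names, not from a real dictionary.
parents = {
    "program": set(),
    "project": {"program"},
    "subject": {"project"},
    "sample": {"subject"},
}


def submission_order(links):
    """Return node names ordered so that every parent precedes its children."""
    return list(TopologicalSorter(links).static_order())


order = submission_order(parents)
print(order)
```

Submitting TSVs in the printed order means every record's parent already exists when the record arrives, which is the condition the validator checks before raising "INVALID_LINK".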

Alternatively,the [Gen3 Submission sdk](https://uc-cdis.github.io/gen3sdk-python/_build/html/_modules/gen3/submission.html) has a comprehensive set of tools to enable users to script submission of programs and projects.

As an alternative to creating the program and project nodes in the Data Portal, you can instead use the [Gen3 Submission SDK](https://uc-cdis.github.io/gen3sdk-python/_build/html/_modules/gen3/submission.html), which has a comprehensive set of tools to enable users to script submission of programs and projects.
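
A minimal sketch of that scripted route follows, assuming the `gen3` Python package is installed and an API credentials file has been downloaded from the commons. The field names mirror the form fields described earlier; the `"type"` key, the helper function names, and the example values are assumptions made for this illustration, so check them against your own data dictionary before relying on them.

```python
def program_record(name, dbgap_accession_number):
    """Build the JSON body for a program node (the "type" key is an assumption)."""
    return {
        "type": "program",
        "name": name,
        "dbgap_accession_number": dbgap_accession_number,
    }


def project_record(code, name, dbgap_accession_number):
    """Build the JSON body for a project node (the "type" key is an assumption)."""
    return {
        "type": "project",
        "code": code,
        "name": name,
        "dbgap_accession_number": dbgap_accession_number,
    }


def create_program_and_project(endpoint, credentials_file, program, project):
    """Create the program first, then the project beneath it."""
    # Imported here so the record helpers above work without the SDK installed.
    from gen3.auth import Gen3Auth
    from gen3.submission import Gen3Submission

    auth = Gen3Auth(endpoint, refresh_file=credentials_file)
    submission = Gen3Submission(endpoint, auth)
    submission.create_program(program)
    submission.create_project(program["name"], project)


# Example values taken from the form walkthrough above.
program = program_record("Program1", "123")
project = project_record("P1", "project1", "phs1")
```

Calling `create_program_and_project` with your commons URL and credentials file path would then submit both nodes in the required order. The call is left out of the sketch because it needs a live commons and valid credentials.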

Sample Code for submission of a Program and Project to a data commons:
```