Model refactor and expansion (#147)
**Added**

* Add Study Key to all schemas

* Add "Component" Key attributes to the shared module and to relevant schemas (e.g., Biospecimen Key, ImagingLevel1 Key, etc.). Keys use cross-manifest validation.

* Add Data Use Codes attribute to File View and Dataset View models; component-specific Data Use Codes to Study and Sharing Plans

* Add attributes and schemas for Collections, Model, Individual, Imaging (Level 1 - 4 and Channel), Sequencing (Level 1 - 3, derived from CDS, GeoMx, Visium, and Shared), and RNA Sequencing (Level 1)

* Add CDS attributes to Biospecimen and Study schemas (e.g., Primary Diagnosis)

* Add longitudinal attributes to File View schema

* Add controlled vocabulary terms associated with CDS attributes

* Add controlled vocab terms from Data Sharing Pilots
- "Whole Animal" as specimen type
- "Light Sheet Microscopy" as an assay type
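
The cross-manifest validation noted for the Component Keys above can be pictured with a short sketch. The commit does not show how the rule is configured, so the function name `missing_keys` and the sample rows below are hypothetical; the sketch only illustrates the idea that every key referenced in one manifest should already exist in another.

```python
def missing_keys(child_rows, parent_rows, key_column):
    """Return key values used in child_rows that never appear in parent_rows.

    Hypothetical helper for illustration; not the validation rule used by the model.
    """
    known = {row[key_column] for row in parent_rows if row.get(key_column)}
    used = {row[key_column] for row in child_rows if row.get(key_column)}
    return sorted(used - known)


# Made-up manifests: two biospecimens, and three imaging files that reference them.
biospecimen_manifest = [
    {"Biospecimen Key": "BSP-001"},
    {"Biospecimen Key": "BSP-002"},
]
imaging_manifest = [
    {"Filename": "a.tiff", "Biospecimen Key": "BSP-001"},
    {"Filename": "b.tiff", "Biospecimen Key": "BSP-003"},
    {"Filename": "c.tiff", "Biospecimen Key": ""},  # optional key left blank
]

unmatched = missing_keys(imaging_manifest, biospecimen_manifest, "Biospecimen Key")
print(unmatched)
```

Blank keys are ignored here on the assumption that Key attributes are optional, which matches the absence of `Biospecimen Key` from the `required` list in the generated JSON schemas.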


**Updates**

* Update DSP attributes and schema
- Add keys, governance, and CDS-related attributes (e.g., DSP IRB Form attribute)
- Additional attributes are intended to provide governance and CDS-related submission information
- DataDSP schema should now approximate the features in the DSP Google Doc, aside from written instructions

* Update ToolView attributes

* Update species controlled vocabulary to include more human-related terms


**Removed**

* Remove unused schemas: Consortium Grant, Dataset, Dataset Grant, Grant, Institution Grant, Person Consortium, Publication, Publication View DCA, Theme Grant, Tool, Tool Grant

* Change Tumor Type to Disease Type for experimental metadata that will be sent to CDS/Data Hub: Disease Type is requested by CDS and will be mapped to existing tumor types for annotation purposes

* Update release_workflow.sh: Add template types, change output type to CSV (-o flag), and generate Google Sheets (-s flag)

* Replace old XLSX templates with CSVs: Generated with release_workflow.sh

* Update GeoMx schema definitions
- Remove Parent Biospecimen ID attribute
- Remove Filename as GeoMx-specific attribute, since it is now shared
- Label GeoMx-specific File Format

* Remove Tumor Subtype valid values, to be replaced by an expanded set of subtypes at a later date


**Organization**

* Separate assay-specific models into level folders

* Move attributes and controlled vocab to relevant module subfolders
- Terms associated with a model component/introduced in a component will be defined in the annotationProperty.csv of the component
- Controlled vocab associated with the attribute is stored in the module/component subfolder
- The annotationProperty.csv in module/shared is reserved for generalized attributes that are intended to be used across many schemas (e.g., Data Use Codes, Component Keys, Workflow Link, etc.)

* Relocate RNA-specific attributes: Sequencing Level 1 - 3 will serve as a broad sequencing assay template. Sub-models that contain more specific information types will be added. SequencingRNALevel1 will surface RNA-seq focused metadata elements for raw/unprocessed data

* Replace commas in CDS terms with semicolons
- Addresses an issue with terms being incorrectly read when valid values are parsed from CSV format
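
A minimal sketch of the parsing issue, assuming valid values are stored as one comma-delimited list per cell. The helper `parse_valid_values` and the term "Neoplasm, malignant" are hypothetical and used only for illustration:

```python
def parse_valid_values(cell: str) -> list[str]:
    """Split a comma-delimited valid-values cell into individual terms."""
    return [term.strip() for term in cell.split(",") if term.strip()]


# A term that contains a comma is split into two separate (wrong) terms.
before = parse_valid_values("Neoplasm, malignant, Adenoma")

# With the internal comma replaced by a semicolon, the term stays intact.
after = parse_valid_values("Neoplasm; malignant, Adenoma")

print(before)
print(after)
```

With the comma, the list has three entries instead of two; with the semicolon, the multi-word term is read as a single valid value.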

* Update build-jsonld.yml
- bump schematicpy to 24.10.2

* Update release_workflow.sh: Add additional templates to datatypes and add an 'if' statement to control make. Model conversion is taking ~16 minutes, so we only want to run that when necessary.

* Retain json data model schemas in model: Schemas can be bound to entities in Synapse via the client
Bankso authored Oct 31, 2024
1 parent fb40f73 commit b3ff791
Showing 162 changed files with 390,828 additions and 55,001 deletions.
10,407 changes: 10,029 additions & 378 deletions all_valid_values.csv

147 changes: 147 additions & 0 deletions json_schemas/mc2.10xVisiumAuxiliaryFiles.schema.json
@@ -0,0 +1,147 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "http://example.com/MC2",
"title": "MC2",
"type": "object",
"properties": {
"Capture Area": {},
"10xVisiumAuxiliaryFiles_id": {
"not": {
"type": "null"
},
"minLength": 1
},
"Run ID": {},
"Workflow Version": {},
"10xVisiumRNALevel3 Key": {},
"10xVisiumRNALevel1 Key": {},
"Slide ID": {},
"Filename": {
"not": {
"type": "null"
},
"minLength": 1
},
"Workflow Link": {},
"10xVisiumRNALevel2 Key": {},
"File Format": {
"enum": [
"GCT",
"FIG",
"FASTQ",
"HDF5",
"SCN",
"R File Format",
"AVI",
"DB",
"TDF",
"cel",
"PKL",
"LIF",
"FASTA",
"RPROJ",
"XML",
"JPG",
"RAW",
"TAR Format",
"RTF",
"unspecified",
"SF",
"CLS",
"pptx",
"CSV",
"xls",
"FREQ",
"DAE",
"MATLAB script",
"Pending Annotation",
"MTX",
"TSV",
"MGF",
"TXT",
"H5AD",
"H5",
"GFF3",
"bed12",
"JSON",
"FCS",
"cloupe",
"MAT",
"BAI",
"ROUT",
"STAT",
"VCF",
"GTF",
"PZFX",
"PNG",
"SGI",
"TIFF",
"RDS",
"Python Script",
"BIGWIG",
"BAM",
"IDAT",
"BED",
"GCG",
"WIG",
"mzIdentML",
"mzXML",
"docx",
"DS_Store",
"SVS",
"CHP",
"bedgraph",
"PDF",
"GCTx",
"MAP",
"HDF",
"maf",
"COOL",
"BPM",
"SRA",
"HTML",
"rcc",
"ZIP",
"GZIP Format",
"xlsx",
"MSF",
"CDS"
]
},
"10xVisiumRNALevel4 Key": {},
"Component": {
"not": {
"type": "null"
},
"minLength": 1
},
"Biospecimen Key": {},
"Visium File Type": {
"enum": [
"reference png",
"json scale factors",
"reference jpg",
"filtered mex",
"fiducial image png",
"tissue_positions",
"features",
"qc result html",
"fiducial image jpg",
"detected jpg",
"detected image png",
"barcodes",
"low res image",
"unfiltered mex",
"probe dataset csv",
"high res image"
]
}
},
"required": [
"10xVisiumAuxiliaryFiles_id",
"Filename",
"File Format",
"Component",
"Visium File Type"
]
}
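
As a rough illustration of what this schema enforces, the sketch below checks one manifest row against the `required` and `enum` constraints using only the standard library. It is not a full JSON Schema validator and not the validator used by the project; `check_row`, the trimmed schema, and the sample row are assumptions made for the example.

```python
def check_row(row: dict, schema: dict) -> list[str]:
    """Report missing required fields and values outside an enum.

    Simplified, hypothetical check; a real JSON Schema validator covers far more.
    """
    problems = []
    for name in schema.get("required", []):
        value = row.get(name)
        if value is None or value == "":
            problems.append(f"missing required field: {name}")
    for name, rules in schema.get("properties", {}).items():
        allowed = rules.get("enum")
        if allowed is not None and name in row and row[name] not in allowed:
            problems.append(f"invalid value for {name}: {row[name]!r}")
    return problems


# Trimmed copy of the schema above, with a few enum members kept for brevity.
schema = {
    "properties": {
        "Filename": {},
        "File Format": {"enum": ["PNG", "JPG", "JSON", "CSV", "HTML"]},
        "Visium File Type": {"enum": ["reference png", "json scale factors", "barcodes"]},
    },
    "required": [
        "10xVisiumAuxiliaryFiles_id",
        "Filename",
        "File Format",
        "Component",
        "Visium File Type",
    ],
}

# Made-up row: the file format is not in the enum and the file type is absent.
row = {
    "10xVisiumAuxiliaryFiles_id": "aux-001",
    "Filename": "scalefactors.json",
    "File Format": "BMP",
    "Component": "10xVisiumAuxiliaryFiles",
}

problems = check_row(row, schema)
print(problems)
```

Key attributes such as `Biospecimen Key` are declared with empty rule objects in the schema, so a check like this places no constraint on them.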