MorPhiC Metadata Schema

This repository contains a modified version of the Human Cell Atlas Metadata Schema, obtained from their repository.

This is a tentative first approach to metadata standardisation and validation for the MorPhiC consortium.

As such, it will keep evolving and adapting to the community's needs.

Metadata model

For the metadata model, we suggest an approach similar to that of the Human Cell Atlas, i.e. defining each step of the experimental process as a separate entity and linking all the entities together to form an experimental graph. With this model, any entity in the experiment (e.g. the sequence file GT22_04578_R1.fastq.gz) can serve as a starting point for walking through the experiment, collecting the important metadata while reconstructing the assay.
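The experimental graph described above can be sketched as a set of entities plus links between them. The following is a minimal illustration, not the schema's actual representation: every identifier, entity type and field name in `ENTITIES` and `LINKS` is hypothetical, and only the file name is taken from the example above. Starting from the sequence file, a breadth-first walk collects every linked entity.

```python
from collections import deque

# Hypothetical entity records: every id, type and field name is illustrative.
ENTITIES = {
    "project-1": {"type": "project", "title": "Example study"},
    "cell-line-1": {"type": "cell_line", "label": "example line"},
    "protocol-1": {"type": "protocol", "label": "example library preparation"},
    "file-1": {"type": "sequence_file", "file_name": "GT22_04578_R1.fastq.gz"},
}

# Links between entities, stored as pairs of ids and treated as undirected.
LINKS = [
    ("project-1", "cell-line-1"),
    ("cell-line-1", "protocol-1"),
    ("protocol-1", "file-1"),
]


def build_adjacency(links):
    """Turn the list of links into a neighbour lookup."""
    adjacency = {}
    for left, right in links:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency


def reconstruct_assay(start_id, entities, links):
    """Breadth-first walk from one entity, collecting every linked entity."""
    adjacency = build_adjacency(links)
    seen = {start_id}
    queue = deque([start_id])
    visited = []
    while queue:
        current = queue.popleft()
        visited.append({"id": current, **entities[current]})
        # Sorting keeps the traversal order deterministic.
        for neighbour in sorted(adjacency.get(current, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return visited


if __name__ == "__main__":
    for entity in reconstruct_assay("file-1", ENTITIES, LINKS):
        print(entity["type"], entity["id"])
```

Because the links are treated as undirected, the same walk works from any entity, whether it is a file, a protocol or the project itself.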

The suggested metadata model comprises three main types of information:

General information about the project

Attributes: a unique identifier, a description, institutions involved, etc.
Use: To be displayed on a catalogue or the data portal when available

Information of scientific value for each of the samples (sample metadata)

Attributes: age of the donor, type of gene expression alteration, etc.
Use: To be displayed as filters on the data portal when available, and collected in a standardised way so that analysis and data collection tools can be built on top of the MorPhiC collection of data.

Information about the data files (file metadata)

Attributes: file name, content of the file, description, read index (in the case of RNA-Seq experiments), etc.
Use: Allows analysis and quality control filters, and helps develop standardised analysis workflows with little to no manual input.
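As an illustration of how file metadata can drive a workflow without manual input, the sketch below groups sequence files by sample and read index. The field names (`sample_id`, `file_name`, `read_index`), the `read1`/`read2` values and the file names are assumptions made for this example and are not taken from the schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMetadata:
    # Hypothetical field names, chosen for illustration only.
    sample_id: str
    file_name: str
    read_index: str  # assumed values: "read1", "read2"


def pair_reads(files):
    """Group files per sample as {sample_id: {read_index: file_name}}."""
    grouped = defaultdict(dict)
    for record in files:
        if record.read_index in grouped[record.sample_id]:
            raise ValueError(
                f"duplicate {record.read_index} for sample {record.sample_id}"
            )
        grouped[record.sample_id][record.read_index] = record.file_name
    return dict(grouped)


if __name__ == "__main__":
    files = [
        FileMetadata("sample_a", "sample_a_R1.fastq.gz", "read1"),
        FileMetadata("sample_a", "sample_a_R2.fastq.gz", "read2"),
    ]
    print(pair_reads(files))
```

A downstream tool could then pick up both reads of a sample from the metadata alone instead of guessing from file name patterns.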

Alongside the metadata model, we are also trying to understand and define the data model, by organising each of the entities described into a graph that can be parsed, understood and visualised.

Transcriptomics

Adaptations HCA --> Morphic

Technical changes

"describedBy" fields now point to raw GitHub URLs instead of the bucket deployment of the HCA schemas
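A rough sketch of what this change means for a metadata document is shown below. It rests on several assumptions: that HCA "describedBy" URLs follow the form `https://schema.humancellatlas.org/<path>/<version>/<name>`, that the schemas in this repository sit at the same path without the version segment and with a `.json` extension, and that `RAW_BASE` is only a placeholder for the real raw GitHub location. The helper names are hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumed host of HCA-deployed schemas.
HCA_HOST = "schema.humancellatlas.org"
# Placeholder, not the repository's real raw URL.
RAW_BASE = "https://raw.githubusercontent.com/<org>/<repo>/<branch>/json_schema"
# Matches a semantic version path segment such as 14.2.0.
VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def rewrite_described_by(url, raw_base=RAW_BASE):
    """Map an HCA schema URL to a raw GitHub URL; leave other URLs untouched."""
    parsed = urlparse(url)
    if parsed.netloc != HCA_HOST:
        return url
    # Drop empty segments and the version segment (assumed layout).
    parts = [p for p in parsed.path.split("/") if p and not VERSION.match(p)]
    return f"{raw_base}/{'/'.join(parts)}.json"


def rewrite_document(node, raw_base=RAW_BASE):
    """Recursively rewrite every string-valued "describedBy" in a document."""
    if isinstance(node, dict):
        return {
            key: rewrite_described_by(value, raw_base)
            if key == "describedBy" and isinstance(value, str)
            else rewrite_document(value, raw_base)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rewrite_document(item, raw_base) for item in node]
    return node
```

The function returns a new structure rather than editing in place, so the original HCA document stays available for comparison.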

Schema changelog

E = Entity

F = Field

Project

Added

project.target_genes (F)

Deleted

project.estimated_cell_count (F)
project.publications.official_hca_publication (F)
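To make the Project changes concrete, the sketch below adapts an HCA-style project document to the changelog above. It assumes that `estimated_cell_count` is a top-level property of the project, that `publications` is a list of objects, and that `target_genes` accepts whatever value the caller supplies; the schema files remain the authority on the actual types. `adapt_project` is a hypothetical helper, not part of this repository.

```python
import copy


def adapt_project(hca_project, target_genes):
    """Return a copy of an HCA project document adapted to the MorPhiC changes."""
    project = copy.deepcopy(hca_project)
    # Deleted field: project.estimated_cell_count
    project.pop("estimated_cell_count", None)
    # Deleted field: project.publications.official_hca_publication
    for publication in project.get("publications", []):
        publication.pop("official_hca_publication", None)
    # Added field: project.target_genes (value format is an assumption)
    project["target_genes"] = target_genes
    return project


if __name__ == "__main__":
    source = {
        "estimated_cell_count": 1000,
        "publications": [{"title": "Example", "official_hca_publication": False}],
    }
    print(adapt_project(source, ["PAX6"]))
```

The input is deep-copied so the original document is left unchanged.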

Cell line

Added

cell_line.genbank_assembly_accession (F)

Protocols

Added

type/protocol/biomaterial_collection/gene_silencing_protocol.json (E)
module/protocol/crisproff.json (E)

Ontology

Added

module/ontology/target_gene_ontology.json (E)
module/ontology/gene_silencing_method_ontology.json (E)

MVP data model for buckets

graph LR;
Z[Bucket root]
A[study 1]
B>Metadata spreadsheet.xlsx]
C["data type(s)"]
D(data files)

Z --> A
A --> B
A --> C
C --> D

This is a deliberately simplified data model, intended to deliver an easily understandable MVP.

The metadata spreadsheet contains the necessary metadata to understand the data within the bucket.

Data files within each of the data type folders don't have a specific ordering or hierarchy; however, the full path must be specified for each file.

A full path is the whole S3 path (e.g. s3://morphic-bio-jax/<study_name>/RNA-Seq/PAX6/7777.fastq.gz).
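A full path can be checked and split into the levels of the bucket model with a few lines of code. The sketch below is one possible approach, assuming the layout `s3://<bucket>/<study>/<data type>/<file path>` shown in the diagram; the `BucketPath` type and `parse_full_path` function are hypothetical names, and `example_study` stands in for a real study name.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class BucketPath(NamedTuple):
    bucket: str
    study: str
    data_type: str
    relative_path: str  # everything below the data type folder


def parse_full_path(path):
    """Split a full S3 path into bucket, study, data type and file path."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not a full S3 path: {path!r}")
    parts = parsed.path.lstrip("/").split("/")
    # At least a study, a data type and a file name are required.
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected s3://<bucket>/<study>/<data type>/<file>: {path!r}")
    study, data_type, *rest = parts
    return BucketPath(parsed.netloc, study, data_type, "/".join(rest))


if __name__ == "__main__":
    print(parse_full_path("s3://morphic-bio-jax/example_study/RNA-Seq/PAX6/7777.fastq.gz"))
```

Relative paths such as `RNA-Seq/PAX6/7777.fastq.gz` are rejected, which matches the requirement that every file is listed with its full path.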

Potential improvements list

Adapt to JSON Schema draft 09
Add Cellosaurus ontology
