Context
There are emerging requirements for reusing the cellxgene-schema CLI (schema and validator) for scenarios that are more relaxed than CELLxGENE Discover's current requirements.
Relaxation
The following sections blue-sky possible approaches to documenting relaxed requirements; however, the solution should be driven by concrete scenarios and not theory.
Fine Granularity: Per Schema variant
A limited number of schema variants could be documented, such as the "cross modality schema". The curator could then reuse schema_reference to specify the preferred schema for validation.
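A minimal sketch, in Python, of how a validator might dispatch on schema_reference to pick a schema variant. The registry, the variant keys, and the columns each variant requires are hypothetical placeholders. The real cellxgene-schema CLI may structure this differently.

```python
# Hypothetical registry: maps a schema_reference value to the obs columns
# that variant requires. Keys and column sets are illustrative only.
SCHEMA_VARIANTS: dict[str, frozenset[str]] = {
    "default": frozenset({
        "assay_ontology_term_id",
        "cell_type_ontology_term_id",
        "development_stage_ontology_term_id",
    }),
    "cross_modality": frozenset({
        "assay_ontology_term_id",
        "development_stage_ontology_term_id",
    }),
}


def select_variant(uns: dict) -> frozenset[str]:
    """Return the required obs columns for the variant named in uns['schema_reference']."""
    # Assumption: a missing schema_reference falls back to the default schema.
    reference = uns.get("schema_reference", "default")
    if reference not in SCHEMA_VARIANTS:
        raise ValueError(f"Unknown schema_reference: {reference!r}")
    return SCHEMA_VARIANTS[reference]


def missing_columns(obs_columns: set[str], uns: dict) -> set[str]:
    """Return the selected variant's required columns that are absent from obs."""
    return set(select_variant(uns) - obs_columns)
```

For example, missing_columns({"assay_ontology_term_id"}, {"schema_reference": "cross_modality"}) returns {"development_stage_ontology_term_id"}. Keeping the variants in one registry means adding a variant is a data change rather than new validation code.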
Fine Granularity: Per Metadata field
For each metadata field, the schema defines separate requirements for strict and relaxed validation. Generally, relaxed will indicate that the field MUST NOT be present, but other requirements could also be relaxed.
uns (Dataset Metadata)
relaxed
Key: relaxed
Annotator: Curator MAY annotate.
Value: list[str]. Each str value MUST match one of the values in the set:
  "obs['cell_type_ontology_term_id']"
  "obs['development_stage_ontology_term_id']"
  ...
If present, relaxed validation MUST be performed on each specified metadata field.
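A minimal sketch of checking the shape of uns['relaxed'], with uns modeled as a plain dict. Only the two values spelled out above are included; the rest of the set is elided in the proposal, so ALLOWED_RELAXED_VALUES here is deliberately incomplete, and the function name is hypothetical.

```python
# Incomplete on purpose: the proposal elides the remaining values ("...").
ALLOWED_RELAXED_VALUES = frozenset({
    "obs['cell_type_ontology_term_id']",
    "obs['development_stage_ontology_term_id']",
})


def validate_relaxed_annotation(uns: dict) -> list[str]:
    """Return error messages for uns['relaxed']; an empty list means valid or absent."""
    if "relaxed" not in uns:
        return []  # The curator MAY annotate, so absence is not an error.
    value = uns["relaxed"]
    # The value MUST be a list whose items are all str.
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        return ["uns['relaxed'] MUST be a list[str]."]
    # Each str MUST match one of the allowed values.
    return [
        f"uns['relaxed'] contains unsupported value {v!r}."
        for v in value
        if v not in ALLOWED_RELAXED_VALUES
    ]
```

Returning a list of messages rather than raising on the first problem lets a curator see every invalid entry in one pass.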
Concrete example: if the assay is silver tier Visium Spatial Gene Expression, and cell_type_ontology_term_id defines its relaxed validation as:
cell_type_ontology_term_id MUST NOT be present in obs
"obs['cell_type_ontology_term_id']" MUST be annotated in uns['relaxed']
Then the silver tier dataset would simply meet those requirements.
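The two rules in this example can be sketched as follows, assuming obs is represented by its set of column names and uns by a dict. The function name and the required_columns parameter are hypothetical; the check assumes "relaxed" means only "MUST NOT be present".

```python
import re

# Parses values of the form obs['<column>'] into the column name.
_OBS_KEY = re.compile(r"^obs\['([A-Za-z0-9_]+)'\]$")


def check_relaxed_fields(obs_columns: set[str], required_columns: set[str], uns: dict) -> list[str]:
    """Apply the relaxed rules from the example and return error messages."""
    errors = []
    relaxed = set()
    for entry in uns.get("relaxed", []):
        match = _OBS_KEY.match(entry)
        if match is None:
            errors.append(f"Cannot parse relaxed entry {entry!r}.")
            continue
        relaxed.add(match.group(1))
    # Rule 1: a relaxed field MUST NOT be present in obs.
    for column in sorted(relaxed & obs_columns):
        errors.append(f"{column} is relaxed, so it MUST NOT be present in obs.")
    # Rule 2: a required field may only be absent if uns['relaxed'] lists it.
    for column in sorted(required_columns - obs_columns - relaxed):
        errors.append(f"{column} is missing from obs and not listed in uns['relaxed'].")
    return errors
```

A spatial dataset with obs columns {"assay_ontology_term_id"}, required columns {"assay_ontology_term_id", "cell_type_ontology_term_id"}, and uns = {"relaxed": ["obs['cell_type_ontology_term_id']"]} produces no errors.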
Coarse Granularity: Per Dataset
The schema documents a relaxed subset of the current required fields. For example, this subset might exclude cell_type_ontology_term_id and perhaps development_stage_ontology_term_id. If a current required field is not included in the relaxed subset, then it MUST NOT be present in the dataset.
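A minimal sketch of per-dataset validation against a relaxed subset. The contents of REQUIRED_FIELDS and RELAXED_SUBSET are hypothetical; the proposal does not fix which fields the relaxed subset contains.

```python
# Hypothetical field lists for illustration only.
REQUIRED_FIELDS = frozenset({
    "assay_ontology_term_id",
    "cell_type_ontology_term_id",
    "development_stage_ontology_term_id",
    "tissue_ontology_term_id",
})
RELAXED_SUBSET = frozenset({"assay_ontology_term_id", "tissue_ontology_term_id"})


def validate_coarse(present_fields: set[str], strict: bool) -> list[str]:
    """Return errors for a dataset validated in strict or relaxed mode."""
    if strict:
        return [f"Missing required field {f}." for f in sorted(REQUIRED_FIELDS - present_fields)]
    # Relaxed mode: only the subset is required...
    errors = [f"Missing required field {f}." for f in sorted(RELAXED_SUBSET - present_fields)]
    # ...and required fields outside the subset MUST NOT be present.
    forbidden = REQUIRED_FIELDS - RELAXED_SUBSET
    errors += [
        f"{f} MUST NOT be present under relaxed validation."
        for f in sorted(forbidden & present_fields)
    ]
    return errors
```

Compared with the per-field approach, this needs only one curator decision per dataset, at the cost of a single fixed subset.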
Curators annotate whether strict or relaxed validation is desired.
uns (Dataset Metadata)
strict
Key: strict
Annotator: Curator MUST annotate.
Value: bool. This MUST be True for strict validation and MUST be False for relaxed validation.
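A minimal sketch of reading this flag, assuming uns is a plain dict. The function name is hypothetical. It rejects truthy stand-ins such as 1 or "True" because the value MUST be a bool.

```python
def read_strict_flag(uns: dict) -> bool:
    """Return the validation mode from uns['strict']: True = strict, False = relaxed."""
    if "strict" not in uns:
        # The curator MUST annotate this key.
        raise ValueError("uns['strict'] is missing; the curator MUST annotate it.")
    value = uns["strict"]
    # isinstance(1, bool) is False, so integers and strings are rejected here.
    if not isinstance(value, bool):
        raise TypeError(f"uns['strict'] MUST be a bool, got {type(value).__name__}.")
    return value
```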
References
Compliance to the MiAIRR Data Standard is currently a binary state, i.e., a dataset either is or is not compliant; there are no “grades” of compliance. However, additional requirements for specific use cases might be defined in the future.
I'd prefer not to overload "relaxed" to mean anything besides "MUST NOT contain". If we want to "relax" in some other way, it should probably be a new schema variant or additional flag.
I like the idea of using a combination of schema_reference to point to variant schemas, and uns.relaxed to indicate which requirements of the referenced schema to ignore.
We may have dependent columns that need to be relaxed, like tissue_type and tissue_ontology_term_id. Just wanted to note that we'll have to account for that dependency, either by logging an error if tissue_ontology_term_id is relaxed and tissue_type is not, or by automatically relaxing the columns that depend on relaxed columns.
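Both options can be sketched over a dependency map. DEPENDENTS and the function names are hypothetical; only the tissue_ontology_term_id to tissue_type pair comes from the comment above.

```python
# Hypothetical dependency map: relaxing a key column implies relaxing
# every column listed for it.
DEPENDENTS: dict[str, set[str]] = {
    "tissue_ontology_term_id": {"tissue_type"},
}


def unrelaxed_dependents(relaxed: set[str]) -> set[str]:
    """Option 1: report dependents that were not relaxed with their parent (caller logs an error)."""
    missing = set()
    for column in relaxed:
        missing |= DEPENDENTS.get(column, set()) - relaxed
    return missing


def expand_relaxed(relaxed: set[str]) -> set[str]:
    """Option 2: automatically add dependents, following chains transitively."""
    expanded = set(relaxed)
    stack = list(relaxed)
    while stack:
        column = stack.pop()
        for dependent in DEPENDENTS.get(column, set()):
            if dependent not in expanded:
                expanded.add(dependent)
                stack.append(dependent)
    return expanded
```

Option 1 keeps the curator's annotation explicit; option 2 is more forgiving but can silently drop columns the curator did not name.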