Version 1.2

@quirinmanz quirinmanz released this 16 Feb 17:57

At this time, this repository is only for sample metadata, not experiment metadata.
There is metadata available for 2279 EpiRR entries.
The CSV for the sample metadata can be found
at openrefine/v1.2/IHEC_metadata_harmonization.v1.2.csv and the extended version at openrefine/v1.2/IHEC_metadata_harmonization.v1.2.extended.csv
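
As a minimal sketch of how the table could be read, assuming a comma-separated file with a header row that contains the EpiRR column described under "Column descriptions": the helper load_metadata is hypothetical and not part of this repository, and the inline sample stands in for the real file (its values are illustrative only).

```python
import csv
import io


def load_metadata(handle):
    """Read the sample metadata CSV into a dict keyed by EpiRR identifier.

    Hypothetical helper; assumes a comma-separated file with an 'EpiRR' column.
    """
    reader = csv.DictReader(handle)
    return {row["EpiRR"]: row for row in reader}


# Tiny inline stand-in for the real CSV (illustrative values only).
sample = io.StringIO(
    "EpiRR,project,harmonized_biomaterial_type\n"
    "IHECRE00000001.4,CEEHRC,primary tissue\n"
)
records = load_metadata(sample)
print(records["IHECRE00000001.4"]["project"])
```

With a local checkout, the same helper could be given a file handle opened with `open("openrefine/v1.2/IHEC_metadata_harmonization.v1.2.csv", newline="")` instead of the inline sample.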

News

  • Added 63 entries that had erroneously been removed in v1.1.
  • The columns harmonized_donor_sex and harmonized_donor_life_stage have been complemented and corrected, based on
    the prediction of the EpiClass tool. For more information on this, please contact Pierre-Étienne Jacques.
  • Some minor changes to sample_disease and donor_health_status columns.
  • Added column epiATLAS_status which is equivalent to harmonized_EpiRR_status but referring to the reprocessed data
    rather than original submitted data, describing the status of the reference epigenome with the additional information
    of full epigenomes when using imputed data.
  • Extended version: Added one column per assay type (histone marks, WGBS, and RNA-Seq), named
    automated_experiments_${assay}, containing the uuid for observed data, or imputed if only imputed data is
    available.
  • Extended version: Added column harmonized_sample_ontology_term_high_order_fig1
  • Extended version: Columns sample_ontology_term_high_order_JeffreyHyacinthe
    and sample_ontology_term_high_order_JonathanSteif have been removed and replaced
    by harmonized_sample_ontology_term_high_order_fig1 containing the sample labels corresponding to the annotations in
    the overview figure.
  • Extended version: Added columns harmonized_sample_[...]_order_AnetaMikulasova containing labels manually assigned
    by Aneta Mikulasova, which contain information about organ, cell, and cancer (sub-)types.
  • Extended version: Removed columns automated_harmonized_($column)_($order)(_unique)?,
    e.g., automated_harmonized_sample_ontology_term_intermediate_order_unique, containing the automatically extracted
    higher-order terms as described in v0.9. These columns were used to derive the
    harmonized_sample_ontology_intermediate and harmonized_sample_disease_intermediate columns, but this was based on
    older versions of these columns. The columns are still generated internally for checking purposes, but they could
    confuse users and are not necessary for the metadata.
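
The automated_experiments_${assay} columns listed above hold either a uuid or the literal value imputed. The following sketch shows one way a row of the extended table could be summarized per assay. The assay names are taken from the column table in this release; the function names and the "missing" and "unrecognized" labels are assumptions made for illustration.

```python
import uuid

# Assay suffixes as they appear in the automated_experiments_* column names.
ASSAYS = [
    "H3K27ac", "H3K27me3", "H3K36me3", "H3K4me1",
    "H3K4me3", "H3K9me3", "WGBS", "RNA-Seq",
]


def experiment_status(value):
    """Classify one cell as 'observed', 'imputed', 'missing' or 'unrecognized'."""
    value = (value or "").strip()
    if not value:
        return "missing"  # assumption: empty cell means no data at all
    if value == "imputed":
        return "imputed"
    try:
        uuid.UUID(value)  # raises ValueError if the string is not a uuid
    except ValueError:
        return "unrecognized"
    return "observed"


def summarize_row(row):
    """Map each assay to its status for one row (a dict of column -> value)."""
    return {
        assay: experiment_status(row.get(f"automated_experiments_{assay}"))
        for assay in ASSAYS
    }


example_row = {
    "automated_experiments_H3K27ac": "f71ea030-5c25-4b10-8d23-afc537e49870",
    "automated_experiments_WGBS": "imputed",
}
summary = summarize_row(example_row)
```

Here summary reports H3K27ac as observed, WGBS as imputed, and every assay absent from the example row as missing.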

Diff

The overall diff between v1.1 and v1.2 can be found at openrefine/v1.2/diff_v1.1_v1.2.json

Metadata Standard

Please keep in mind that we try to stay as close to the IHEC Metadata Standard as possible.

Column descriptions:

The table below describes the columns included in the metadata table and the extended metadata table.

| Column | Examples | Explanation | # Not Null (%) |
| --- | --- | --- | --- |
| EpiRR | IHECRE00000001.4 | EpiRR identifier. The number after the dot (.) is the version. | 2279 (100.0%) |
| project | CEEHRC; BLUEPRINT | The project from which the epigenome originated. | 2279 (100.0%) |
| harmonized_biomaterial_type | cell line; primary cell; primary cell culture; primary tissue | One of primary cell, primary cell culture, cell line, primary tissue. | 2279 (100.0%) |
| harmonized_sample_ontology_intermediate | T cell; epithelial cell derived cell line | A manually refined higher level annotation describing the samples using ancestors in the ontology. | 2279 (100.0%) |
| harmonized_sample_disease_high | Healthy/None; Cancer; Disease | A manually refined higher level annotation describing the disease using only three categories: Healthy/None, Cancer, Disease. | 2279 (100.0%) |
| harmonized_sample_disease_intermediate | Carcinoma; Leukemia | A manually refined higher level annotation describing the disease for this particular sample using ancestors in the NCIT ontology. NCIM CURIEs were mapped to NCIT CURIEs; see version 0.9 for an explanation. | 2279 (100.0%) |
| harmonized_EpiRR_status | Complete; Partial | Whether this epigenome is Complete or Partial. | 2279 (100.0%) |
| epiATLAS_status | Complete; Partial; Complete_imputed | Equivalent to harmonized_EpiRR_status but referring to the reprocessed data rather than original submitted data, describing the status of the reference epigenome with the additional information of full epigenomes when using imputed data. | 2279 (100.0%) |
| harmonized_cell_type | myeloid cell; effector memory CD8-positive, alpha-beta T cell | The cell type and main sample ontology classification for entries where biomaterial_type is primary cell or primary cell culture. | 1561 (68.5%) |
| harmonized_cell_line | MCF 10A | The cell line and main sample ontology classification for entries where biomaterial_type is cell line. | 73 (3.2%) |
| harmonized_tissue_type | skeletal muscle tissue; amygdala | The tissue type and main sample ontology classification for entries where biomaterial_type is primary tissue. | 2008 (88.1%) |
| harmonized_sample_ontology_curie | CL:0000990; UBERON:0001876; EFO:0001200 | The CURIE identifying the sample ontology term. Different ontologies are used, depending on the biomaterial_type: 'CL' for primary cell or primary cell culture, 'EFO' for cell line and 'UBERON' for primary tissue. | 2279 (100.0%) |
| harmonized_cell_markers | CD3+ CD4+ CD45RA+; CD3- CD19- CD56- | Markers used to isolate and identify the cell type, when applicable. | 1144 (50.2%) |
| automated_harmonized_sample_ontology | CL; UBERON; EFO | Extended only. Automatic extraction from harmonized_sample_ontology_curie. The ontology corresponding to the CURIE, mostly used for other automatic extractions. | 2279 (100.0%) |
| automated_harmonized_sample_ontology_term | myeloid cell; MCF 10A; amygdala | Extended only. Automatic extraction from harmonized_sample_ontology_curie. The term corresponding to the CURIE, mostly used for detecting inconsistencies. | 2279 (100.0%) |
| harmonized_sample_ontology_term_high_order_fig1 | T lymphocyte epithelial stem cell | Extended only. Semi-manual merging of values from harmonized_sample_ontology_intermediate by Jonathan Steif. Was applied to a preliminary v1.2. | 2279 (100.0%) |
| harmonized_sample_organ_system_order_AnetaMikulasova | Immune System; Nervous | Extended only. Annotation of organ system by Aneta Mikulasova. Was applied to a preliminary v1.2. | 2279 (100.0%) |
| harmonized_sample_organ_order_AnetaMikulasova | blood-venous; brain; x | Extended only. Annotation of organ by Aneta Mikulasova. Was applied to a preliminary v1.2. x if not applicable. | 2279 (100.0%) |
| harmonized_sample_organ_part_or_lineage_order_AnetaMikulasova | Myeloid; Lymphoid; x; frontal-lobe-brodmann-area-9 | Extended only. Annotation of organ part or lineage by Aneta Mikulasova. Was applied to a preliminary v1.2. x if not applicable. | 2279 (100.0%) |
| harmonized_sample_cell_order_AnetaMikulasova | Tcell; Bcell; x | Extended only. Annotation of cell type by Aneta Mikulasova. Was applied to a preliminary v1.2. x if not applicable. | 2279 (100.0%) |
| harmonized_sample_cell_2_order_AnetaMikulasova | CD4; mature; x | Extended only. Annotation of cell subtype by Aneta Mikulasova. Was applied to a preliminary v1.2. x if not applicable. | 2279 (100.0%) |
| harmonized_sample_cell_3_order_AnetaMikulasova | alpha-beta; helper; x | Extended only. Annotation of cell sub-subtype by Aneta Mikulasova. Was applied to a preliminary v1.2. x if not applicable. | 2279 (100.0%) |
| harmonized_sample_cancer_type_order_AnetaMikulasova | CLL; AML; x | Extended only. Annotation of cancer type by Aneta Mikulasova. Was applied to a preliminary v1.2. x if not applicable. | 2279 (100.0%) |
| harmonized_sample_cancer_subtype_order_AnetaMikulasova | hepatocellular; anaplastic; x | Extended only. Annotation of cancer subtype by Aneta Mikulasova. Was applied to a preliminary v1.2. x if not applicable. | 2279 (100.0%) |
| harmonized_sample_disease | Breast Carcinoma; Acute Promyelocytic Leukemia with PML-RARA | This attribute reflects the disease for this particular sample, not the donor health condition. | 2142 (94.0%) |
| harmonized_sample_disease_ontology_curie | NCIM:C0678222; NCIM:C0023487 | The CURIE identifying the NCIM disease ontology term. | 2142 (94.0%) |
| automated_harmonized_sample_disease_ontology_curie_ncit | NCIT:C41132; NCIT:C4872 | Extended only. Automatic extraction from harmonized_sample_disease_ontology_curie, mostly used for other automatic extractions. | 2134 (93.6%) |
| harmonized_donor_type | Single donor; Composite; Pooled samples | Composite is a reference generated from analysis objects generated from multiple individuals, e.g., H3K27ac ChIP-seq is from subject A and RNA-seq is from subject B. Pooled samples are references generated from a biological pool, for example cord blood from 134 individual cords pooled together. | 2279 (100.0%) |
| harmonized_donor_id | CEMT0007; C07015 | Identifier for donors within their projects. | 2116 (92.8%) |
| harmonized_donor_age | 60-65; unknown; 46 | Age of donor. Can be an interval. | 2279 (100.0%) |
| harmonized_donor_age_unit | year; day; week; unknown | Age unit of donor. | 2279 (100.0%) |
| automated_harmonized_donor_age_in_years | 32.5; 67.5 | Age of donor converted to years (mean for intervals). | 1678 (73.6%) |
| harmonized_donor_life_stage | adult; child; embryonic; fetal; newborn; postnatal; unknown | Life stage of donor. Corrected and imputed using EpiClass. | 2279 (100.0%) |
| harmonized_donor_sex | female; male; mixed; unknown | Sex of donor. Corrected and imputed using EpiClass. | 2279 (100.0%) |
| harmonized_donor_health_status | Breast Carcinoma; Acute Promyelocytic Leukemia with PML-RARA | The health status of the donor that provided the sample. Does not describe the disease for this particular sample. | 982 (43.1%) |
| harmonized_donor_health_status_ontology_curie | NCIM:C0023487; NCIM:C0678222 | The CURIE identifying the NCIM donor health status ontology term. | 982 (43.1%) |
| automated_harmonized_donor_health_status_ontology_curie_ncit | | Extended only. Automatic extraction from harmonized_donor_health_status_ontology_curie, mostly used for other automatic extractions. | 961 (42.2%) |
| automated_experiments_H3K27ac | f71ea030-5c25-4b10-8d23-afc537e49870; imputed | Extended only. Contains the uuid for observed data, or imputed if only imputed data is available. | 1698 (74.5%) |
| automated_experiments_H3K27me3 | " | Extended only. Contains the uuid for observed data, or imputed if only imputed data is available. | 1698 (74.5%) |
| automated_experiments_H3K36me3 | " | Extended only. Contains the uuid for observed data, or imputed if only imputed data is available. | 1698 (74.5%) |
| automated_experiments_H3K4me1 | " | Extended only. Contains the uuid for observed data, or imputed if only imputed data is available. | 1698 (74.5%) |
| automated_experiments_H3K4me3 | " | Extended only. Contains the uuid for observed data, or imputed if only imputed data is available. | 1698 (74.5%) |
| automated_experiments_H3K9me3 | " | Extended only. Contains the uuid for observed data, or imputed if only imputed data is available. | 1698 (74.5%) |
| automated_experiments_WGBS | " | Extended only. Contains the uuid for observed data, or imputed if only imputed data is available. | 1898 (83.3%) |
| automated_experiments_RNA-Seq | " | Extended only. Contains the uuid for observed data, or imputed if only imputed data is available. | 1467 (64.4%) |
| epirr_id_without_version | IHECRE00000001 | EpiRR identifier without version. | 2279 (100.0%) |
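
Several columns in the table are derived from others by simple rules: epirr_id_without_version from EpiRR, the ontology prefix from harmonized_sample_ontology_curie, and automated_harmonized_donor_age_in_years from the age and age unit. The sketch below illustrates these rules as they are described in the table. It is not the code used to build the release: all function names are hypothetical, and the conversion factors for day and week (365.25 days per year) are an assumption, since the release notes do not state them.

```python
# Ontology prefix expected for each biomaterial type, as stated in the table.
ONTOLOGY_BY_BIOMATERIAL = {
    "primary cell": "CL",
    "primary cell culture": "CL",
    "cell line": "EFO",
    "primary tissue": "UBERON",
}

# Assumed conversion factors; the release does not specify them.
UNIT_IN_YEARS = {"year": 1.0, "week": 7 / 365.25, "day": 1 / 365.25}


def split_epirr(epirr):
    """Split 'IHECRE00000001.4' into the accession and the integer version."""
    accession, _, version = epirr.partition(".")
    return accession, int(version) if version else None


def curie_matches_biomaterial(curie, biomaterial_type):
    """Check that the CURIE prefix fits the ontology used for this biomaterial."""
    prefix = curie.split(":", 1)[0]
    return ONTOLOGY_BY_BIOMATERIAL.get(biomaterial_type) == prefix


def age_in_years(age, unit):
    """Convert an age or an interval such as '60-65' to years (interval mean).

    Returns None for unknown units or values that are not numeric.
    """
    factor = UNIT_IN_YEARS.get(unit)
    if factor is None:
        return None
    try:
        bounds = [float(part) for part in age.split("-")]
    except ValueError:
        return None
    return sum(bounds) / len(bounds) * factor


print(split_epirr("IHECRE00000001.4"))
print(curie_matches_biomaterial("CL:0000990", "primary cell"))
print(age_in_years("60-65", "year"))
```

A check such as curie_matches_biomaterial could be used to flag rows whose CURIE and biomaterial type disagree, in the same spirit as the automated_* columns that are described as being used for detecting inconsistencies.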