Welcome to the Standardized Metadata Collection for Omics Data repository! This repository is dedicated to the collection, standardization, and sharing of metadata associated with various omics data types, currently focusing on metagenomics but with plans to expand to metatranscriptomics, metaproteomics, and metabolomics.
Our goal is to present metadata in a standardized form, both semantically and syntactically, so that it is ready for analysis. We will provide guidance and examples to help others contribute. Preparing metadata takes significant effort; by pooling our efforts, we aim to avoid duplicating that work across individual metadata sets.
We are starting by providing the workflow we followed (see scripts/README.md) to obtain the metadata for the projects listed below, along with the final product: metagenomes.txt.
This section lists the metagenomes included in the current metadata curation effort, including the project name, (data) publication, date range, depth range, number of samples, number of runs, project accession number, and other relevant information about the projects and their metadata.
For details of the metadata curation effort covering the following datasets, and of the publications that originally describe them, see scripts/README.md.
Note
Please note that the current collection of curated metadata is limited to runs from the projects listed below that
- are metagenomes
- are paired-end
- are from samples collected at ≤100 m depth
- are from samples with, at a minimum, environmental metadata on temperature
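The four criteria in the note above can be sketched as a simple row filter. This is a minimal illustration, not the workflow in scripts/README.md: the column names (`run_accession`, `library_source`, `library_layout`, `depth`, `temperature`), the helper names, and the example rows are all assumptions made up for this sketch, and should be adapted to the actual header of the metadata file.

```python
import csv
import io
import math

MAX_DEPTH_M = 100.0  # depth cutoff taken from the note above


def keep_run(row):
    """Return True if a run passes all four criteria from the note.

    Column names are hypothetical; adjust them to the real file header.
    """
    if (row.get("library_source") or "").upper() != "METAGENOMIC":
        return False
    if (row.get("library_layout") or "").upper() != "PAIRED":
        return False
    try:
        depth = float(row.get("depth"))
        temperature = float(row.get("temperature"))
    except (TypeError, ValueError):
        # Missing or non-numeric depth/temperature: drop the run.
        return False
    if not (math.isfinite(depth) and math.isfinite(temperature)):
        return False
    return depth <= MAX_DEPTH_M


def filter_runs(tsv_text):
    """Parse tab-separated text and keep only the rows that pass keep_run."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if keep_run(row)]


# Made-up rows for illustration only; they are not real accessions.
header = ["run_accession", "library_source", "library_layout", "depth", "temperature"]
rows = [
    ["RUN_A", "METAGENOMIC", "PAIRED", "5", "18.2"],
    ["RUN_B", "METAGENOMIC", "SINGLE", "5", "18.2"],          # not paired-end
    ["RUN_C", "METAGENOMIC", "PAIRED", "250", "9.1"],         # deeper than 100 m
    ["RUN_D", "METAGENOMIC", "PAIRED", "50", ""],             # no temperature
    ["RUN_E", "METATRANSCRIPTOMIC", "PAIRED", "10", "20.0"],  # not a metagenome
    ["RUN_F", "METAGENOMIC", "PAIRED", "100", "15"],          # exactly at the cutoff
]
example_tsv = "\n".join("\t".join(r) for r in [header] + rows)

kept = filter_runs(example_tsv)
print([r["run_accession"] for r in kept])  # ['RUN_A', 'RUN_F']
```

The filter treats a missing or non-numeric temperature as a failed criterion, which mirrors the requirement that a sample carry at least temperature metadata.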
Observatory/Cruise | Project acronym | Accession number | Metadata citation | (Data) publication |
---|---|---|---|---|
Bermuda Atlantic Time-series Study | BATS | PRJNA385855 | European Nucleotide Archive (ENA). (2024). Sample Metadata for Project Accession PRJNA385855 [Data set]. Retrieved from https://www.ebi.ac.uk/ena <br> Bermuda Atlantic Time-series Study (BATS). (2024). BATS Oceanographic and Biogeochemical Data [bats_bottle.txt]. Retrieved from https://bats.bios.asu.edu/data/ | Biller, S., Berube, P., Dooley, K. et al. Marine microbial metagenomes sampled across space and time. Sci Data 5, 180176 (2018). https://doi.org/10.1038/sdata.2018.176 |
bioGEOTRACES | BGT | PRJNA385854 | European Nucleotide Archive (ENA). (2024). Sample Metadata for Project Accession PRJNA385854 [Data set]. Retrieved from https://www.ebi.ac.uk/ena <br> GEOTRACES Intermediate Data Product Group (2023). The GEOTRACES Intermediate Data Product 2021v2 (IDP2021v2). NERC EDS British Oceanographic Data Centre NOC. doi:10.5285/ff46f034-f47c-05f9-e053-6c86abc0dc7e | Biller, S., Berube, P., Dooley, K. et al. Marine microbial metagenomes sampled across space and time. Sci Data 5, 180176 (2018). https://doi.org/10.1038/sdata.2018.176 |
Bio-GO-SHIP | BGS | PRJNA656268 | European Nucleotide Archive (ENA). (2024). Sample Metadata for Project Accession PRJNA656268 [Data set]. Retrieved from https://www.ebi.ac.uk/ena | Larkin, A.A., Garcia, C.A., Garcia, N. et al. High spatial resolution global ocean metagenomes from Bio-GO-SHIP repeat hydrography transects. Sci Data 8, 107 (2021). https://doi.org/10.1038/s41597-021-00889-9 |
Hawaii Ocean Time-Series ALOHA (2003-2004; 2009) | HOT1 | PRJNA385855 | European Nucleotide Archive (ENA). (2024). Sample Metadata for Project Accession PRJNA385855 [Data set]. Retrieved from https://www.ebi.ac.uk/ena <br> Data obtained via the Hawaii Ocean Time-series HOT-DOGS application; University of Hawai'i at Mānoa. National Science Foundation Award # 1756517 <br> Hawaii Ocean Time-series (HOT). (2024). HOT-DOGS: Data Organization & Graphical System for the Hawaii Ocean Time-series [Bottle_Extraction]. Retrieved from https://hahana.soest.hawaii.edu/hot/hot-dogs/index.html | Biller, S., Berube, P., Dooley, K. et al. Marine microbial metagenomes sampled across space and time. Sci Data 5, 180176 (2018). https://doi.org/10.1038/sdata.2018.176 |
Hawaii Ocean Time-Series ALOHA (2010-2016) | HOT3 | PRJNA352737 | European Nucleotide Archive (ENA). (2024). Sample Metadata for Project Accession PRJNA352737 [Data set]. Retrieved from https://www.ebi.ac.uk/ena | Mende, D.R., Bryant, J.A., Aylward, F.O. et al. Environmental drivers of a microbial genomic transition zone in the ocean’s interior. Nat Microbiol 2, 1367–1373 (2017). https://doi.org/10.1038/s41564-017-0008-3 |
Malaspina Expedition | MAL | PRJEB52452 | European Nucleotide Archive (ENA). (2024). Sample Metadata for Project Accession PRJEB52452 [Data set]. Retrieved from https://www.ebi.ac.uk/ena | Sánchez, P., Coutinho, F.H., Sebastián, M. et al. Marine picoplankton metagenomes and MAGs from eleven vertical profiles obtained by the Malaspina Expedition. Sci Data 11, 154 (2024). https://doi.org/10.1038/s41597-024-02974-1 |
Ocean Sampling Day 2014 | OSD | PRJEB8682 | European Nucleotide Archive (ENA). (2024). Sample Metadata for Project Accession PRJEB8682 [Data set]. Retrieved from https://www.ebi.ac.uk/ena | Ocean Sampling Day Consortium, Participants (2015): Registry of samples and environmental context from the Ocean Sampling Day 2014 [dataset]. PANGAEA, https://doi.org/10.1594/PANGAEA.854419 |
Tara Oceans Project | TARA | PRJEB1787 | European Nucleotide Archive (ENA). (2024). Sample Metadata for Project Accession PRJEB1787 [Data set]. Retrieved from https://www.ebi.ac.uk/ena | |
Observatory/Cruise | Count in metadata | Count after metagenome and paired-end filter | Count after depth filter (≤100 m) | Count after metadata filter |
---|---|---|---|---|
Bermuda Atlantic Time-series Study | 62 | 62 | 40 | 40 |
bioGEOTRACES | 480 | 480 | 323 | 323 |
Bio-GO-SHIP | 996 | 971 | 969 | 969 |
Hawaii Ocean Time-Series ALOHA (2003-2004; 2009) | 68 | 68 | 33 | 28 |
Hawaii Ocean Time-Series ALOHA (2007-2009) | 54 | 0 | - | - |
Hawaii Ocean Time-Series ALOHA (2010-2016) | 773 | 597 | 230 | 230 |
Malaspina Expedition | 81 | 81 | 16 | 16 |
Ocean Sampling Day 2014 | 162 | 150 | 150 | 127 |
Tara Oceans Project | 136 | 136 | 95 | 92 |
Western Channel Observatory | 10 | 0 | - | - |
All | 2822 | 2545 | 1856 | 1825 |
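As a quick consistency check, the "All" row can be recomputed from the per-project rows. The sketch below copies the counts from the table above; treating the `-` entries as 0 is an assumption made here so the columns can be summed, and the variable names are illustrative only.

```python
# Per-project counts copied from the table above, in column order:
# (in metadata, metagenome + paired-end, depth <= 100 m, after metadata filtering)
# Assumption: "-" entries in the table are treated as 0.
counts = {
    "Bermuda Atlantic Time-series Study": (62, 62, 40, 40),
    "bioGEOTRACES": (480, 480, 323, 323),
    "Bio-GO-SHIP": (996, 971, 969, 969),
    "Hawaii Ocean Time-Series ALOHA (2003-2004; 2009)": (68, 68, 33, 28),
    "Hawaii Ocean Time-Series ALOHA (2007-2009)": (54, 0, 0, 0),
    "Hawaii Ocean Time-Series ALOHA (2010-2016)": (773, 597, 230, 230),
    "Malaspina": (81, 81, 16, 16),
    "Ocean Sampling Day 2014": (162, 150, 150, 127),
    "Tara Oceans Project": (136, 136, 95, 92),
    "Western Channel Observatory": (10, 0, 0, 0),
}

# Sum each column across projects to reproduce the "All" row.
totals = tuple(sum(column) for column in zip(*counts.values()))
print(totals)  # (2822, 2545, 1856, 1825)

# Share of entries that survive every filtering step.
retained = totals[-1] / totals[0]
print(f"{retained:.1%}")  # 64.7%
```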
To be added as the collection of metadata grows.