
# Bio Datasets

To contribute, edit `bio_datasets.csv` and run `python create_readme.py`.
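The contents of `create_readme.py` are not shown here. As a hypothetical sketch of what such a CSV-to-README step might look like, the function below renders CSV rows as a numbered Markdown table using only the Python standard library. The column names are an assumption (they mirror the table headers below), and `csv_to_markdown` and `escape_cell` are illustrative names, not the repository's actual code.

```python
import csv
import io

# Hypothetical column names, assumed to match the README table headers.
COLUMNS = ["Dataset name", "Link", "Short Description", "API available", "TAGS"]


def escape_cell(value: str) -> str:
    """Escape pipe characters so a cell cannot break the Markdown table."""
    return value.replace("|", "\\|").strip()


def csv_to_markdown(csv_text: str) -> str:
    """Render CSV rows as a numbered Markdown table (illustrative sketch)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = [
        "| # | " + " | ".join(COLUMNS) + " |",
        "|---" * (len(COLUMNS) + 1) + "|",
    ]
    for index, row in enumerate(reader, start=1):
        cells = []
        for col in COLUMNS:
            # Missing cells and pandas-style "nan" placeholders become empty.
            value = row.get(col) or ""
            if value.strip().lower() == "nan":
                value = ""
            cells.append(escape_cell(value))
        lines.append(f"| {index} | " + " | ".join(cells) + " |")
    return "\n".join(lines) + "\n"
```

Under the same assumptions, a script could call it as `Path("README.md").write_text(intro + csv_to_markdown(Path("bio_datasets.csv").read_text()))`, where `intro` is whatever heading text precedes the table. Treating `nan` as an empty cell avoids leaking a missing-value marker into the rendered table.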

| # | Dataset name | Link | Short Description | API available | Tags |
|---|---|---|---|---|---|
| 1 | RCSB Protein Data Bank (PDB) | https://www.rcsb.org/ | A comprehensive database of three-dimensional structural data for large biological molecules, including proteins and nucleic acids. | Yes | dna, proteins, rna, small_molecules |
| 2 | PubChem | https://pubchem.ncbi.nlm.nih.gov/ | A database of chemical molecules and their activities against biological assays, containing information on small molecules, nucleotides, and carbohydrates. | Yes | interactions, small_molecules |
| 3 | UniProt | https://www.uniprot.org/ | A comprehensive resource for protein sequence and annotation data, providing information about the function and structure of proteins. | Yes | proteins |
| 4 | The Human Protein Atlas | https://www.proteinatlas.org/ | An interactive database providing high-resolution insights into the spatial distribution of proteins in human tissues and cells. | No | proteins |
| 5 | BindingDB | https://www.bindingdb.org/rwd/bind/index.jsp | A public, web-accessible database of measured binding affinities, focusing chiefly on interactions between proteins considered to be candidate drug targets and small, drug-like ligands. | Yes | proteins, small_molecules |
| 6 | DrugBank | https://go.drugbank.com/ | A bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug-target information. | Yes (with registration) | drugs, interactions |
| 7 | KEGG: Kyoto Encyclopedia of Genes and Genomes | https://www.genome.jp/kegg/ | A collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances. | Limited | drugs, genes, glycans, human_diseases, interactions, pathways, proteins, small_molecules |
| 8 | STRING | https://string-db.org/ | A database of known and predicted protein-protein interactions, including direct (physical) and indirect (functional) associations. | Yes | interactions, proteins |
| 9 | NCBI Gene Expression Omnibus (GEO) | https://www.ncbi.nlm.nih.gov/geo/ | A public repository that archives and freely distributes comprehensive sets of microarray, next-generation sequencing, and other forms of high-throughput functional genomic data. | Yes | genes |
| 10 | ChEMBL | https://www.ebi.ac.uk/chembl/ | A manually curated database of bioactive molecules with drug-like properties, bringing together chemical, bioactivity, and genomic data. | Yes | interactions, small_molecules |
| 11 | Ensembl | https://www.ensembl.org/ | A comprehensive source of genomic information, integrating genomic, transcriptomic, proteomic, genetic, and other data. | Yes | genes |
| 12 | Reactome | https://reactome.org/ | A free, open-source, curated, and peer-reviewed pathway database that provides insights into molecular processes and pathways in human biology. | Yes | interactions, pathways |
| 13 | Gene Ontology Consortium | http://geneontology.org/ | A major bioinformatics initiative that provides a controlled vocabulary to describe gene and gene product attributes in any organism. | Yes | gene_ontology, pathways |
| 14 | Human Metabolome Database (HMDB) | https://hmdb.ca/ | A richly annotated resource that offers detailed information about small-molecule metabolites found in the human body. | Yes | small_molecules |
| 15 | InterPro | https://www.ebi.ac.uk/interpro/ | A database that provides predictive information about protein families, domains, and functional sites. | Yes |  |
| 16 | Pfam | https://pfam.xfam.org/ | A comprehensive database of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). | No | proteins |
| 17 | The Cancer Genome Atlas (TCGA) | https://www.cancer.gov/tcga | A comprehensive, coordinated effort to accelerate understanding of the molecular basis of cancer through genome analysis techniques. | Limited | cancer, drugs, genes, human_diseases, treatments |
| 18 | Allen Brain Atlas | https://www.brain-map.org/ | A growing collection of online public resources integrating extensive gene expression, connectivity, and histology data, with the aim of furthering our understanding of the brain. | Yes | cells, genes, rna |
| 19 | GlyTouCan | https://glytoucan.org/ | The international glycan structure repository, providing a platform for registering glycan (sugar chain) structure information. | Yes | glycans |
| 20 | The Zebrafish Information Network (ZFIN) | https://zfin.org/ | The premier database for zebrafish genetic, genomic, developmental, and physiological information. | Yes | gene_ontology, genes, human_diseases, proteins |
| 21 | FlyBase | http://flybase.org/ | A comprehensive database for information on the genetics and molecular biology of Drosophila (fruit flies). | Yes | gene_ontology, genes, human_diseases, pathways, proteins |
| 22 | WormBase | https://www.wormbase.org/ | A database of biological and genomic information for the nematode model organism C. elegans and related species. | Yes | cells, gene_ontology, genes, human_diseases, pathways, proteins, rnai |
| 23 | Mouse Genome Informatics (MGI) | http://www.informatics.jax.org/ | A comprehensive resource for data on the laboratory mouse, integrating genetic, genomic, and biological data. | Yes | genes, human_diseases, pathways, proteins |
| 24 | YeastMine | https://yeastmine.yeastgenome.org/yeastmine/begin.do | A data warehouse for the budding yeast Saccharomyces cerevisiae, providing access to gene, protein, and network data. | Yes | genes, human_diseases, interactions, pathways, proteins |
| 25 | BRENDA | https://www.brenda-enzymes.org/ | A comprehensive enzyme information system providing data on enzyme nomenclature, structure, function, and related properties. | Yes | genes, interactions, ligands, proteins |
| 26 | TAIR (The Arabidopsis Information Resource) | https://www.arabidopsis.org/ | A database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana. | Yes (with registration) | dna, genes, proteins |
| 27 | ArrayExpress | https://www.ebi.ac.uk/arrayexpress/ | A repository for functional genomics experiments, including gene expression, where you can query and download data collected to MIAME and MINSEQE standards. | Yes | experiments, genes, proteins |
| 28 | Europe PMC | https://europepmc.org/ | A free, comprehensive database of life science and biomedical literature. | Yes | articles |
| 29 | dbSNP (Database of Single Nucleotide Polymorphisms) | https://www.ncbi.nlm.nih.gov/snp/ | A central repository for single-nucleotide substitutions as well as short deletion and insertion polymorphisms. | Yes | genes |
| 30 | miRBase | http://www.mirbase.org/ | A database of published miRNA sequences and annotations, providing information on microRNA biology. | Yes | genes, rna |
| 31 | GTEx Portal | https://gtexportal.org/home/ | A resource providing data on gene expression and regulation across multiple human tissues, supporting studies of the relationship between genotype and phenotype. | Yes | cells, genes |
| 32 | BioGRID | https://thebiogrid.org/ | A resource for studying protein-protein and genetic interactions in multiple organisms, including humans, yeast, flies, and worms. | Yes | interactions, proteins |
| 33 | GenBank | https://www.ncbi.nlm.nih.gov/genbank/ | The NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. | Yes | genes |
| 34 | SILVA | https://www.arb-silva.de/ | A comprehensive database of high-quality ribosomal RNA sequence data, supporting research on the phylogeny and taxonomy of microbial and other organisms. | Yes | genes, rna |
| 35 | ENCODE (Encyclopedia of DNA Elements) | https://www.encodeproject.org/ | A project that aims to catalog all functional elements in the human genome, including regions of transcription, transcription factor association, chromatin structure, and histone modification. | Yes | functions, genes, pathways, rna |
| 36 | EMBL-EBI MetaboLights | https://www.ebi.ac.uk/metabolights/ | A resource for metabolomics experiments and derived information, hosting a wide range of metabolomics data, including raw and processed data, metabolite structures, and bioinformatics analyses. | Yes | experiments, pathways, reactions, small_molecules |
| 37 | PharmGKB | https://www.pharmgkb.org/ | A knowledge base that collects, curates, and disseminates information about the impact of human genetic variation on drug response. | Yes | drugs, genes, human_diseases, pathways, treatments |
| 38 | The Cancer Imaging Archive (TCIA) | https://www.cancerimagingarchive.net/ | A service providing access to a large archive of medical images of cancer, available for public download. | Yes | cancer, imaging |
| 39 | RxRx3 | https://www.rxrx.ai/rxrx3 | A publicly available map of biology representing a small subset (less than 1%) of Recursion’s total dataset. | Yes (with registration) | cells, genes, imaging |
| 40 | ImmPort (Immunology Database and Analysis Portal) | https://www.immport.org/ | A repository of data from diverse immunology studies, including vaccine trials, infectious disease research, and autoimmune diseases. | Yes | articles, experiments, human_diseases |
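Many of the datasets above are marked as having an API. As a minimal sketch, assuming Python 3.9+ and only the standard library, the helpers below build request URLs for three of them (RCSB PDB, UniProt, and PubChem) and fetch JSON over HTTP. The endpoint patterns follow those services' public REST interfaces as commonly documented, but they are assumptions here; check each service's current documentation before relying on them. The helper names (`rcsb_entry_url`, `uniprot_entry_url`, `pubchem_properties_url`, `fetch_json`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint patterns; verify against each service's REST documentation.
RCSB_ENTRY = "https://data.rcsb.org/rest/v1/core/entry/{pdb_id}"
UNIPROT_ENTRY = "https://rest.uniprot.org/uniprotkb/{accession}.json"
PUBCHEM_PROPERTIES = (
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/{name}"
    "/property/{properties}/JSON"
)


def rcsb_entry_url(pdb_id: str) -> str:
    """URL for an RCSB PDB entry summary (PDB IDs are case-insensitive)."""
    return RCSB_ENTRY.format(pdb_id=pdb_id.upper())


def uniprot_entry_url(accession: str) -> str:
    """URL for a UniProtKB entry in JSON format."""
    return UNIPROT_ENTRY.format(accession=accession.upper())


def pubchem_properties_url(name: str, properties: list[str]) -> str:
    """URL for PubChem compound properties looked up by compound name."""
    return PUBCHEM_PROPERTIES.format(
        # Percent-encode the name so spaces and slashes stay in one path segment.
        name=urllib.parse.quote(name, safe=""),
        properties=",".join(properties),
    )


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """GET a URL and decode the JSON body (requires network access)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(pubchem_properties_url("aspirin", ["MolecularFormula"]))` should return the decoded PubChem response as a dictionary. Keeping URL construction separate from the network call makes the builders easy to check offline. Services marked "Limited" or "with registration" above may require keys, accounts, or rate limiting that this sketch does not handle.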