A versatile software package for pan-genome analysis, including both GET_HOMOLOGUES and GET_HOMOLOGUES-EST. It includes algorithms designed for:
- Clustering coding sequences into homologous (possibly orthologous) groups, based on sequence similarity. By default GET_HOMOLOGUES compares protein sequences, while GET_HOMOLOGUES-EST aligns nucleotide sequences (CDS or transcripts).
- Definition of pan- and core-genomes by calculation of overlapping sets of protein or nucleotide sequences.
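
As an illustration of the "overlapping sets" idea, here is a minimal Python sketch. It is not the GET_HOMOLOGUES implementation (the package itself is a set of Perl scripts with several clustering algorithms); it only assumes that each genome has already been reduced to the set of cluster identifiers it contains. The function name `pan_and_core` and the toy genome and cluster names are hypothetical.

```python
from typing import Dict, Set, Tuple


def pan_and_core(clusters_by_genome: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (pan, core) cluster sets for a collection of genomes.

    pan  = clusters present in at least one genome (set union)
    core = clusters present in every genome (set intersection)
    """
    genome_sets = list(clusters_by_genome.values())
    if not genome_sets:
        return set(), set()
    pan = set().union(*genome_sets)
    core = set.intersection(*genome_sets)
    return pan, core


# Hypothetical toy input: genome name -> cluster ids found in that genome
toy = {
    "genomeA": {"c1", "c2", "c3"},
    "genomeB": {"c1", "c2", "c4"},
    "genomeC": {"c1", "c3", "c4", "c5"},
}

pan, core = pan_and_core(toy)
print("pan-genome clusters:", sorted(pan))
print("core-genome clusters:", sorted(core))
```

In this toy case only `c1` occurs in all three genomes, so it is the sole core cluster, while the pan-genome contains all five clusters.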
GET_HOMOLOGUES has been used mostly with bacterial data (see citing papers).
GET_HOMOLOGUES-EST, in contrast, has been used mostly with plants (see citing papers). It was originally benchmarked with genomes and transcriptomes of Arabidopsis thaliana and Hordeum vulgare, and with the pan-genomes of Brachypodium distachyon and Brachypodium hybridum (press release).
Installation instructions, including the bioconda package, are available in the manual and the README.txt file.
Check also the Docker image.
Manuals are available at:
version | HTML |
---|---|
original, for the analysis of bacterial pan-genomes | manual |
EST, for the analysis of intra-species eukaryotic pan-genomes | manual-est |
In addition, two tutorials are available:
- Pangenome analysis of plant transcripts and coding sequences, published in 2022.
- From genomes to pangenomes: understanding variation among individuals and species, which includes step-by-step instructions for both bacterial and plant data, first released in 2017.
The original GET_HOMOLOGUES, suitable for bacterial genomes, was described in:
Contreras-Moreira B, Vinuesa P (2013) Appl. Environ. Microbiol. 79:7696-7701
Vinuesa P, Contreras-Moreira B (2015) Methods in Molecular Biology Volume 1231, 203-232
GET_HOMOLOGUES-EST, adapted to the study of intra-specific eukaryotic pan-genomes and pan-transcriptomes, was described in:
Contreras-Moreira B, Cantalapiedra CP et al (2017) Front. Plant Sci. 10.3389/fpls.2017.00184
GET_HOMOLOGUES is designed, created and maintained at the Computational and Structural Biology group at Estación Experimental de Aula Dei, Consejo Superior de Investigaciones Científicas (EEAD-CSIC) and at the Center for Genomic Sciences of Universidad Nacional Autónoma de México (CCG/UNAM).
The program was written mostly by Bruno Contreras-Moreira and Pablo Vinuesa, with contributions from Carlos P Cantalapiedra, Alvaro Rodríguez del Rio, Rubén Sancho, Roland Wilhelm, David A Wilkinson and many others (see CHANGES.txt). It also includes code and binaries from other authors:
- OrthoMCL v1.4 (PubMed=12952885)
- mcl v14-137 (PubMed=11917018)
- COGtriangles v2.1 (PubMed=20439257)
- NCBI Blast-2.16.0+ (PubMed=9254694,20003500)
- BioPerl v1.5.2 (PubMed=12368254)
- HMMER 3.1b2
- Pfam (PubMed=19920124)
- PHYLIP 3.695
- Transdecoder r20140704 (PubMed=23845962)
- MVIEW 1.60.1 (PubMed=9632837)
- diamond 0.8.25 (PubMed=25402007)
GET_PHYLOMARKERS uses twin nucleotide & peptide clusters produced by GET_HOMOLOGUES to compute robust multi-gene and pangenome phylogenies. Check the manual, the tutorial, and the Docker image.
A related piece of software, GET_PANGENES, was released in 2023. It takes FASTA and GFF files as input and explicitly considers gene collinearity by computing whole-genome alignments.
The code is regularly patched with each release (see CHANGES.txt). We kindly ask you to report errors or bugs as GitHub issues and to acknowledge the use of the software in scientific publications.
Fundación ARAID, Consejo Superior de Investigaciones Científicas, DGAPA-PAPIIT UNAM, CONACyT, FEDER, MINECO, DGA-Obra Social La Caixa.
GET_HOMOLOGUES is part of the INB/ELIXIR-ES resources portfolio: