Section 2 CLI Config Files

Gavin Huttley edited this page Jul 7, 2024 · 3 revisions

The Command Line Interface

Installing EnsemblLite creates an elt command. In the terminal, type

elt

This will initialise the package and then display the list of available subcommands. Try typing elt exportrc

The Config File

The data you want from Ensembl is currently specified in a plain-text config file. The command line tool can generate a template for you, so let's do that now and then discuss the files and the config options.

Just for kicks, let's use the TUI. (A TUI is a terminal user interface.) This provides a point-and-click interface for a command-line application. (It makes it easier to explore the commands, but it also has some shortcomings.)

elt tui

You can use your mouse, or the tab and arrow keys, to navigate. The main thing to look for now is the exportrc subcommand. We will enter sample as the value for the --outpath option. Note that as you type, elt completes the terminal command for you. Click on the "Close & Run" button.

(The caveat with this interface is that the command is not recorded in your shell's history, so pressing the up arrow will not recall it.)
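
Since the TUI does not leave the command in your shell history, it can help to write it down yourself. Going by the subcommand and option named above, the command the TUI assembles should be equivalent to elt exportrc --outpath sample. Here is a minimal Python sketch that builds that command line; the helper name exportrc_command is ours for illustration and is not part of elt.

```python
import shlex


def exportrc_command(outpath):
    """Build the argument list for `elt exportrc --outpath <outpath>`."""
    return ["elt", "exportrc", "--outpath", outpath]


# Print the command as you would type it in the terminal.
# To actually run it, pass the list to subprocess.run(...) with elt installed.
print(shlex.join(exportrc_command("sample")))
```

Keeping the command in a script like this is one way to get the reproducibility that the TUI does not give you.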

The config files have distinct sections for the different types of data.

remote path

This is where you specify the FTP address for the Ensembl server containing the genomes of interest to you. You can pick any server you like as long as it matches exactly the one that's in this file. (Sorry, that's a lame joke, but at present we don't support any other Ensembl FTP servers.)

local path

Here the staging_path is the name of the directory where you want the downloaded data to be put. The install_path is where you want the installation to go.

release

The version of Ensembl that you want data from.
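
To make the layout of these three sections concrete, here is a rough sketch that reads them with Python's standard configparser module. The section names, staging_path and install_path come from the description above. The key names host, path and release, and the directory values, are our assumptions for illustration, so check them against the template that exportrc writes for you.

```python
import configparser

# Illustrative excerpt only. Key names other than staging_path and
# install_path are assumptions, not confirmed elt option names.
SAMPLE = """\
[remote path]
host = ftp.ensembl.org
path = pub

[local path]
staging_path = ensembl/staging
install_path = ensembl/install

[release]
release = 112
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

local = parser["local path"]
staging_path = local["staging_path"]
install_path = local["install_path"]
release = parser["release"].getint("release")

print(f"release {release}: stage in {staging_path}, install to {install_path}")
```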

compara

At present, we allow more options under the compara section than we truly support. So let me focus on the two that really matter.

The align_names option is the name of a directory on the Ensembl FTP server containing the alignments that you are interested in. This can be a comma-separated list of names. Those names must exactly match the names listed at the following location on the FTP site: https://ftp.ensembl.org/pub/release-112/maf/ensembl-compara/multiple_alignments/

The homology option indicates whether or not you want homology data for the genomes that you are going to be selecting. It takes no value: all that is required is the word homology followed by an equals sign. If you have specified an alignment, you do not need to specify the homology option, as it will be added automatically.

Note: We currently don't support pairwise alignments. If this feature is important to you, create a discussion topic and tell us. Of course, if somebody else has already created that topic, make sure to vote for it.
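
As a sketch of how those two options could be read, the snippet below parses a compara section with Python's configparser. The alignment names are placeholders written in the style of entries in that FTP listing, so confirm the real names there before using them. The rule "homology is implied by an alignment" is our reading of the paragraph above, not code taken from elt.

```python
import configparser

# Placeholder alignment names: check the FTP listing for the real ones.
SAMPLE = """\
[compara]
align_names = 10_primates.epo, 21_murinae.epo
homology =
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
compara = parser["compara"]

# Split the comma-separated list, dropping surrounding whitespace.
align_names = [
    name.strip()
    for name in compara.get("align_names", "").split(",")
    if name.strip()
]

# The option has no value, so its presence is what matters.
# Specifying an alignment is described as implying homology too.
wants_homology = "homology" in compara or bool(align_names)

print(align_names, wants_homology)
```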

[species names]

At present, you indicate which species you want by putting either its Latin name or its common name inside square brackets, followed by the line db = core. To see which names are supported, take a look at the species.tsv file, which was also created by the exportrc subcommand (see below).

There is a shortcut to naming species. At present, if you want all of the species that are included in one of the whole genome alignment sets, then you only need to specify that alignment.
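
Here is a small sketch of what species sections look like and how they can be told apart from the other sections, again using Python's configparser. The two species shown are just examples (one Latin name, one common name), and treating "has db = core" as the marker of a species section is our assumption based on the description above.

```python
import configparser

# Example species only. Use names that appear in species.tsv.
SAMPLE = """\
[release]
release = 112

[homo sapiens]
db = core

[mouse]
db = core
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# A section counts as a species section here if it contains db = core.
species = [
    name for name in parser.sections() if parser[name].get("db") == "core"
]

print(species)
```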

The species.tsv file

This is a tab-delimited file that contains the Latin names and common names of the species present at ensembl.org. (At the moment it also includes a column for the species prefix of Ensembl identifiers; however, this will be discontinued.)

The contents of this file are used to validate the species names entered into the config file, and in other operations that are executed by elt.

WARNING: At present this file is included in the repository, but we will be changing this so that it is downloaded and always up to date.
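
To illustrate the kind of validation described, here is a minimal sketch that collects every name from a tab-delimited table and checks requested names against it. The header names and rows are made up for the example, and the case-insensitive matching is our assumption. This is not how elt itself does the check.

```python
import csv
import io

# Hypothetical excerpt: the real column headers in species.tsv may differ.
SPECIES_TSV = (
    "species\tcommon_name\n"
    "homo sapiens\thuman\n"
    "mus musculus\tmouse\n"
)


def known_names(handle):
    """Collect every Latin and common name, lower-cased, from a TSV handle."""
    names = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        names.update(value.strip().lower() for value in row.values() if value)
    return names


names = known_names(io.StringIO(SPECIES_TSV))
for requested in ["Human", "mus musculus", "unicorn"]:
    status = "ok" if requested.lower() in names else "not found"
    print(f"{requested}: {status}")
```

With a real file you would pass open("species.tsv", newline="") in place of the io.StringIO object.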

The exercise

Edit the sample.cfg file so that it will download the genomes for yeast and C. elegans, along with homology data.

Make sure you specify a sensible destination for the staging data and for the installation.

When you have done this we will proceed to the next step.