GfaViz is a graphical interactive tool for the visualization of sequence graphs in GFA format. It supports both the GFA1 and GFA2 formats.
GFA (Graphical Fragment Assembly) is an emerging standard format for representing sequence graphs. Although it was originally conceived as a format for sequence assembly (hence the name), and this remains its core application, it is more general, and able to represent many different types of sequence graphs, including scaffolding graphs, alignment graphs, variant graphs and splicing graphs.
The specification of GFA is available here: https://github.com/GFA-spec/GFA-spec. Two versions of the format were developed. GFA1 was the first released specification and is more focused on sequence assembly. GFA2 is a more powerful and extensible version of the format. New applications should implement the GFA2 format.
GfaViz can be compiled and installed on Linux and macOS. On macOS, version 10.12 or newer is required.
GfaViz is implemented in C++, using the Qt framework. For this reason, Qt5 needs to be installed on your system: for more information, please see the Qt installation manual.
GfaViz has been successfully compiled using GCC version 7.1.0 or newer and clang version 3.8.0 or newer. Warning: using an older compiler version can result in compilation errors.
The graph computation in GfaViz is performed by the OGDF library, which comes prebundled with GfaViz.
The following code will install GfaViz:
git clone https://github.com/ggonnella/gfaviz
cd gfaviz
qmake-qt5
make
Some systems have Qt preinstalled, but without the libraries necessary for Qt SVG support. If this is the case, you may either install the missing headers or disable SVG support in GfaViz by passing the NOSVG parameter to the qmake-qt5 call:
qmake-qt5 NOSVG=true
The result will be a version of GfaViz unable to export the graph as SVG. All other functionality will be available.
GfaViz is run by executing the binary file gfaviz.
GfaViz has two user interfaces: a command line interface (CLI) and a graphical user interface (GUI).
Unlike the GUI, the command line interface is not interactive, but it is useful e.g. for very large graphs, for batch processing or for including GfaViz in software pipelines.
The default interface is the GUI. Use the option --no-gui or -n to run in CLI mode.
When running in GUI mode, one or more filenames can be provided as command line arguments. If none is provided, files can be loaded from inside the GUI, using "Open GFA file" in the File menu. When running in CLI mode, a single filename must be provided. Both the GFA1 and GFA2 formats are accepted. To make sure that the input files are valid GFA, one can use format checkers such as GfaLint. To modify GFA files, we suggest using the Python library GfaPy.
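As a quick check before opening a file, the GFA version can often be guessed from its content. The following is a minimal Python sketch and not the way GfaViz itself detects the format: the function name is hypothetical, and the heuristic only relies on the VN header tag and on the record types that exist in just one of the two versions of the specification. It is no substitute for a format checker.

```python
def guess_gfa_version(lines):
    """Return 'GFA1', 'GFA2', or None if the version cannot be guessed."""
    gfa1_only = {"L", "C", "P"}            # links, containments, paths
    gfa2_only = {"E", "F", "G", "O", "U"}  # edges, fragments, gaps, groups
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        record_type = fields[0]
        if record_type == "H":
            # A version tag in a header line settles the question.
            for tag in fields[1:]:
                if tag.startswith("VN:Z:"):
                    return "GFA2" if tag[5:].startswith("2") else "GFA1"
        elif record_type in gfa1_only:
            return "GFA1"
        elif record_type in gfa2_only:
            return "GFA2"
    # Only segments (or nothing) seen: S lines exist in both versions.
    return None
```

The function accepts any iterable of lines, so an open file object can be passed directly.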
Two algorithms are available for computing the layout, both implemented by the OGDF library: "Stress Minimization" (SM) and "Fast Multipole Multilevel Method" (FMMM). Each algorithm has its advantages and disadvantages. In general, FMMM is faster, while SM produces layouts of better quality and is the default.
To compute the layout using FMMM, use the --fmmm option. In CLI mode, this applies to the rendered graph. In GUI mode, it applies when the graph is first rendered; the algorithm can then be changed from inside the GUI.
In the GUI the layout options are located in the lower section of the right pane. Currently some parameters for the two layout algorithms can be set in the GUI (these cannot be set from the CLI in the current implementation).
Multiple options are available for changing the visual representation of specific kinds of elements. This section does not cover label styles (see the following section for them).
To change the style of specific element types, use the options in the element type tab in the Style widget. To change the style of specific elements, select the elements and use the options in the element type tab, making sure that the checkbox "apply changes to selection only" is on.
Several options can be used for changing the representation style of elements.
Option names consist of the type of object to which the option applies (seg for segments; edge; dovetail and internal for particular kinds of edges, i.e. dovetail overlaps and internal overlaps; group; gap; fragment), followed by the kind of option, e.g. width, outline width, color, etc. The option names, combined with the help text obtained using the --help command line option, should make the function of each option clear.
A segment is represented by a polygon whose length aims to be proportional to the length of the DNA segment it represents. The polygon has a color (--seg-color) and a width (--seg-width). By default it also has a colored outline (--seg-outline-color), which can be disabled by setting its width to zero (--seg-outline-width). Optionally, segments can be drawn as arrows (--seg-as-arrow), which is appropriate in some kinds of graphs to indicate the directionality of the DNA strand.
A weight factor (--weight-factor), applied to segments and fragments, controls the drawn length of a segment in proportion to the DNA length. To keep very small segments visible, a minimal length is set by the --minweight option.
Segments are internally represented by computing the positions of some points, e.g. their end points, and connecting them. If only the end points were connected, all segments would be straight. For a visually better representation, a segment is instead divided into subsegments, and the internal points between them are connected to each other. The number of subsegments depends on the segment length. The option --seg-max-sub controls this behaviour. Setting it to 1 draws all segments as straight lines, which is desirable in some applications.
The representation of edges (GFA2) and of the equivalent links and containments (GFA1) is controlled by the following options. The width of the line representing the edge is set by --edge-width, its color by --edge-color.
Further options differentiate between internal edges (these options also affect containments) and dovetail edges. The options --internal-width, --internal-color, --dovetail-width and --dovetail-color set the width and color of the two kinds of edges independently. The options --internal-length and --dovetail-length set the length of the two kinds of edges. By default, internal edges are longer, so that they affect the layout of the segments less.
The position of the alignment of edges on the segments can be visualized as colored polygons on the segment. These are called "highlights" in GfaViz. To turn on highlights, use the --edge-highlights-show option. The color can be set using the --edge-highlights-color option. To use a different random color for each highlight, use --edge-highlights-color-random.
Gaps are represented by dotted lines. By default, they connect segments, but their presence does not affect the layout. This can be changed by using the flag --gaps-as-edges. If so, the length of the lines can be set by using --gap-length.
Furthermore, the color of the line is set using --gap-color. This, however, only affects positive-sized gaps (missing sequences). In some cases, scaffolders add negative-sized gaps (overlaps of contigs found during the scaffolding process). Their color can be changed using --neg-gap-color.
Fragments represent alignments of external sequences to segments.
In order to represent the information in a fragment, the external sequence must be represented. As the sequence is external, its complete length is not available in the GFA. For this reason, the external sequence is drawn similarly to a segment (a rectangle in this case, whose width is set by --fragment-width and color by --fragment-color), but its length is taken to be the length of the alignment on the external sequence. This is not always appropriate, so a length multiplier option is available (--fragment-multlength). The minimum length of the fragment representation can be set using --fragment-minlength, which keeps very short fragments visible.
The color of external sequences can be made dependent on the direction of the alignment to the segment. Different colors are set by using the --fwd-fragment-color and --rev-fragment-color options.
The second element of the representation is the alignment itself. This is similar to an edge, so it is represented by a thin connecting line. The distance of the external sequence representation from the segments can be set using the --fragment-dist option. The color of the connecting line can be changed by --fragment-conn-color and its width by --fragment-conn-width.
As for edges, the aligned portion of the segment can be highlighted on the segment. Highlights are turned on by setting --fragment-highlights-show, and their color can be changed using the --fragment-highlights-color option.
Groups are represented in GfaViz by additional colored outlines for the segments in the group. Multiple outlines can be nested, as a segment can belong to multiple groups. The width of the outline is set by --group-width. The colors are set by --group-colors, as a comma-separated list for the groups, in the order in which they are specified in the GFA file.
To select an entire group from the GUI, double-click on an element of the group while keeping the CTRL key pressed (i.e. adding to the current selection).
Most elements in GFA can have a name, which can be visualized in GfaViz as a text label. The label style options control the visualization. Labels are not visualized by default.
Labels can be turned on by using the --labels option. To turn on only the labels of specific kinds of elements, use instead the --[seg|edge|gap|group|fragment]-labels options.
The representation of the labels can be changed using options specific to a single kind of element, or options for all kinds of elements. The latter start with --label-. In particular, the font, size, color, outline width and outline color can be set. The same can be set for a single kind of element by adding the prefix seg|gap|group|frag|edge. An example would be --frag-label-outline-color.
Segment labels are usually their IDs. Optionally, the length of the segment can also be shown using --seg-label-showlength. Alternatively, the segment sequence itself can be used as the label, using --seg-label-seq.
To enable labels and change the visualization style, use the Label tab in the Style widget. To change the style of specific element types, use the label options in the element type tab in the Style widget. To change the label of specific elements, select the element and use the label options in the element type tab, making sure that the checkbox "apply changes to selection only" is on.
By default, all elements of the GFA file are visualized. Specific kinds of elements or single elements can be hidden.
In the GUI, elements of a given type can be hidden using the Style widget, located in the lower section of the right pane. For each kind of element, a "hide element" checkbox is available.
The GUI also allows hiding a single specific element (instead of all elements of a kind). To do this, select an element (by clicking on it in the graph visualization, using the search box or the graph navigation widget) and click on "hide element" in the Style widget, making sure that the checkbox "apply changes to selection only" is on.
In the CLI, elements of a specific type can be hidden using the options --no-gaps, --no-fragments and --no-groups. NOTE: --no-segments and --no-edges seem to be missing.
In the CLI, single elements cannot be hidden directly. However, it is possible to specify in the GFA file that an element shall not be visualized, by adding the "no-X":true value to the JSON tag vo (which must be added if not yet present), where X is one of segments, fragments, groups, edges, gaps. For example, to hide a specific segment which has no vo tag yet, add the tag vo:J:{"no-segments":true} to its line.
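Editing the vo tag by hand is error-prone when many elements are involved. The following Python sketch shows one way to set the value on a single GFA line. The function name is hypothetical, and the merging of the new value into an existing vo tag is an assumption based on the description above, not code taken from GfaViz.

```python
import json

def hide_element(gfa_line, kind):
    """Return gfa_line with "no-<kind>": true set in its vo:J tag.

    kind is one of: segments, fragments, groups, edges, gaps.
    A trailing newline, if present, is dropped from the result.
    """
    fields = gfa_line.rstrip("\n").split("\t")
    key = "no-" + kind
    for i, field in enumerate(fields):
        if field.startswith("vo:J:"):
            # Merge into the existing JSON object instead of replacing it.
            options = json.loads(field[len("vo:J:"):])
            options[key] = True
            fields[i] = "vo:J:" + json.dumps(options, separators=(",", ":"))
            break
    else:
        # No vo tag yet: append a new one at the end of the line.
        fields.append("vo:J:" + json.dumps({key: True}, separators=(",", ":")))
    return "\t".join(fields)
```

For a segment line without a vo tag, hide_element(line, "segments") appends the tag vo:J:{"no-segments":true}, as in the example above.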
The graph can be rendered to vector graphics and raster bitmap formats, using the following options:
-r, --render                  Render graph(s) into file(s).
-o, --output <filename>       Render graph(s) into <filename>
-f, --output-format <format>  File format for the output. If no value
                              is specified, format will be inferred
                              from the file suffix specified in the
                              --output option. Possible values: BMP,
                              PNG, JPG, JPEG, PBM, XBM, XPM, SVG.
                              Default: PNG
-W, --width <width>           Width of the output file in pixels.
-H, --height <height>         Height of the output file in pixels.
-t, --transparency            Transparent background in rendered
                              images (only png).
    --bg-color <value>        Background color.
Note that these options do not prevent the GUI from starting. To run GfaViz from the command line only, also use the --no-gui option.
Stylesheets make it possible to apply the same style to different graphs. Usually style files are created using the GUI from an existing graph (in a following version of this manual, the syntax for style files will be specified in detail, to support creating stylesheets from scratch). Examples of stylesheets are in the directory "style" of the repository.
To apply a stylesheet, use the --usestyle option:
--usestyle <filename>         Use the style options represented by the
                              stylesheet <filename>.
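For batch processing, the rendering and stylesheet options can be assembled programmatically. The following Python sketch only builds the argument list and does not run anything. The function and its parameter names are hypothetical. Passing --render together with --output, and placing the input file last, are assumptions; check the output of ./gfaviz --help before relying on them.

```python
import shlex

def render_command(gfa_file, output, *, fmt=None, width=None, height=None,
                   transparent=False, bg_color=None, stylesheet=None,
                   fmmm=False, binary="./gfaviz"):
    """Build the argument list for a non-interactive GfaViz rendering run."""
    cmd = [binary, "--no-gui", "--render", "--output", output]
    if fmt is not None:
        cmd += ["--output-format", fmt]
    if width is not None:
        cmd += ["--width", str(width)]
    if height is not None:
        cmd += ["--height", str(height)]
    if transparent:
        cmd.append("--transparency")
    if bg_color is not None:
        cmd += ["--bg-color", bg_color]
    if stylesheet is not None:
        cmd += ["--usestyle", stylesheet]
    if fmmm:
        cmd.append("--fmmm")
    cmd.append(gfa_file)  # CLI mode expects a single input file
    return cmd

# File names are placeholders for illustration.
cmd = render_command("mygraph.gfa2", "mygraph.png", width=1200, height=800)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True), and calling the function in a loop over several GFA files with the same stylesheet argument gives all graphs the same style.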
The tree navigation pane (top right in the GUI) shows the content of the GFA file in form of a navigable tree.
The functions of the tree are:
- obtain more information about an element, e.g. the length of a segment or a tag of some element, without opening the text file
- select an element, by clicking on it in the tree
- explore the connections of an element to other elements (e.g. the edges connected to a segment)
In the search box, the ID of an element or multiple IDs (separated by spaces) can be entered. This allows searching for named elements. Some elements always have an ID (segments), while for other elements the ID is optional (e.g. edges, gaps, groups). Unnamed elements cannot be searched for with this method.
To see a list of all available commands use:
./gfaviz --help
The current version of TwoPaCo outputs non-standard GFA1 and GFA2 files. As the GFA2 output has fewer issues (the edge lines lack one field), it is easier to use (for GFA1 you need to fix the C lines, remove duplicated lines and remove the path lines, which are non-standard).
As long as the bugs in the TwoPaCo output are not fixed, you can correct the GFA2 files using the following:
sed 's/^E/E\t*/' mygraph.gfa2 > mygraph.fixed.gfa2
The fixed file can then be correctly visualized using GfaViz.
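Where GNU sed is not available, the same fix can be applied in Python. This is a minimal sketch with hypothetical function names. It is slightly stricter than the sed command, as it only touches lines in which the E is followed by a tab. Like the sed command, it must be applied only once, since a second pass would insert a second placeholder.

```python
def add_missing_edge_id(lines):
    """Yield lines with a '*' placeholder ID inserted into every E line."""
    for line in lines:
        if line.startswith("E\t"):
            # "E\t<rest>" becomes "E\t*\t<rest>", as the sed command does.
            yield "E\t*" + line[1:]
        else:
            yield line

def fix_file(src_path, dst_path):
    """Write a corrected copy of src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(add_missing_edge_id(src))
```

Calling fix_file("mygraph.gfa2", "mygraph.fixed.gfa2") corresponds to the sed call above.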
The graph computation in GfaViz is performed by the OGDF library. Their excellent work can be found here: http://www.ogdf.net/
The software is released under the ISC licence. Please see LICENSE.txt for details.
If you use GfaViz in your research, please cite:
Giorgio Gonnella, Niklas Niehus, Stefan Kurtz. GfaViz: Flexible and interactive visualization of GFA sequence graphs. Bioinformatics, bty1046 (2018). DOI: 10.1093/bioinformatics/bty1046
The GUI offers a choice between GFA1 and GFA2 as the format for saving the graph. However, the file is currently always saved in its original format, regardless of the selection. The format selection option will be removed from the GUI in a following version. To convert between GFA1 and GFA2, please use GfaPy.