Usage details (advanced)

Roan LaPlante edited this page Mar 21, 2014 · 27 revisions

This page is a somewhat disjointed smattering of documentation about how the program runs. Most of the details here document features and design choices about data representation that may not be immediately obvious. Some of these features -- mainly the semantics of ordering files and the handling of max edges and thresholding -- have a considerable effect on the program's behavior and performance.

Command line arguments

In order to display connectivity from your data, cvu requires three things:

  • A Freesurfer parcellation (freesurfer .annot files, GIFTI support coming soon)
  • An ordering file (a text file with ROI names)
  • A matrix (common matrix formats such as .npy, .mat, or .txt)

The recommended program operation is to run the program with no arguments. That way, the program will automatically display some sample data and you will be able to specify all of these items via the GUI (see Getting Started).

However, it is also possible to load these arguments, and more, using the command line.

  • -d greg, --subjects-dir=greg The location of SUBJECTS_DIR, which is used to determine the location of the annotation, surface, and segmentation files. The default value of this argument is path_to_cvu/cvu, which is meant to be used with fsaverage5. If you specify this argument, you will also have to provide a parcellation, surface, and subject name, an appropriate ordering file (specifying visualization order; you can also optionally specify adjmat order: see Getting Started), and an appropriate adjacency matrix.
  • -p greg, --parc=greg The parcellation stem. The parcellations will be located at ${SUBJECTS_DIR}/${SUBJECT}/label/lh.greg.annot and ${SUBJECTS_DIR}/${SUBJECT}/label/rh.greg.annot (freesurfer format). If these files do not exist, ${SUBJECTS_DIR}/${SUBJECT}/label/*h.greg.gii (GIFTI format) will be checked. The default value of this argument is sparc, a custom parcellation for the sample data. If this argument is provided on the command line, you will have to provide at least an ordering file and an appropriate adjacency matrix.
  • -s greg, --surf=greg The surface type to be used. The surfaces will be located at ${SUBJECTS_DIR}/${SUBJECT}/surf/lh.greg and ${SUBJECTS_DIR}/${SUBJECT}/surf/rh.greg (freesurfer format). If these files do not exist, ${SUBJECTS_DIR}/${SUBJECT}/surf/lh.greg.gii and ${SUBJECTS_DIR}/${SUBJECT}/surf/rh.greg.gii (GIFTI format) will be checked. The default value of this argument is pial. You can safely change this to an appropriate surface, such as white, without specifying additional arguments. Using the inflated surface is not recommended, but is okay for visualizing just a single hemisphere.
  • -o greg.txt, --order=greg.txt Location of the ordering file. This ordering file specifies visualization (or parcellation) order, not matrix order (see Getting Started). You should make sure this ordering file corresponds to your choice of parcellation and matrix. The default value is an ordering file consisting of the entries of the sparc parcellation in alphabetical order.
  • -a greg.mat, --adjmat=greg.mat Location of the adjacency matrix. The default value is the sample data, a 138x138 matrix registered to the sparc parcellation.
  • --adj-order greg.txt Location of a second ordering file used to specify the matrix order of the provided matrix. The default value is None.
  • -q Quiet, turns off some of the messages that are printed to the console during program operation. Does not affect the errors shown in the GUI.
  • -v Verbose (currently does nothing)
  • --max-edges 46000 Discards all but the strongest ~46000 connections. See the section on max edges below
  • -h --help Display a description of the allowed arguments.
  • --script test_script.txt Runs a script. See Scripting
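
For example, a typical invocation that overrides several of the defaults might look like the following (the paths, file names, and parcellation stem are hypothetical, and the executable is written here simply as cvu):

```
cvu -d /data/subjects -p myparc -s white -o myparc_order.txt -a my_adjmat.mat --max-edges 30000
```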

It is important to note that everything you can do using the command line, you can also do from within the GUI, with one exception: specifying the verbosity or quietness level. Running an external script can be done from within the program as well as from the command line. Verbosity only affects how much information is printed to stdout or stderr, and is mostly useful for debugging.

ordering files

This information is described within the Getting started tutorial, but as it is somewhat complex it is reproduced here as a more technical reference that explains the details of how ordering files are used and the program's behavior.

Ordering files are text files that contain entries with region names such as lh_frontalpole_2.

All ordering files must have entries that look like this, with a hemisphere and an underscore before the region name. All entries in ordering files are case insensitive.
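
As a concrete illustration, a small ordering file (with hypothetical region names) might look like this; the special keyword delete is described below:

```
lh_frontalpole_2
lh_superiorfrontal_1
rh_frontalpole_2
rh_hippocampus
delete
```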

There are two types of ordering files: visualization ordering files, which specify the intended order of ROIs in the visualization and are provided when loading a parcellation, and matrix ordering files, which specify the order of ROIs as they actually appear in the matrix and are provided when loading a matrix. These two types have slightly different semantics, as described below.

###Visualization ordering files

Every entry in a visualization ordering file should correspond either to a label present in the annotation file for the parcellation, or to a known subcortical structure. Any other entry will be ignored and a warning will be printed to the console (except in quiet mode), unless the entry is the special keyword delete, in which case no warning is printed.

Annotation files are specific to a single hemisphere, and have labels with names like frontalpole_2. The inclusion of the hemisphere in the ordering file is thus needed because many of the labels would otherwise overlap (MNE python takes care of this by appending the hemisphere to the label name). Each label in the annotation that does not correspond to an entry in the visualization ordering file will not be used in the parcellation, and when the annotation is loaded a warning will be printed to the console stating that the label was not used (except in quiet mode).
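
The matching just described can be sketched roughly as follows. This is a simplified model, not cvu's actual code, and the label names are made up:

```python
# Simplified sketch: match case-insensitive, hemisphere-prefixed ordering
# entries against per-hemisphere annotation labels (which are unprefixed).
# Not cvu's actual implementation; all names are illustrative.

def match_ordering_to_annot(ordering, lh_labels, rh_labels):
    lh = {l.lower() for l in lh_labels}
    rh = {l.lower() for l in rh_labels}
    matched = []
    for entry in ordering:
        e = entry.lower()
        if e == 'delete':
            continue  # special keyword, silently skipped
        hemi, _, name = e.partition('_')
        pool = lh if hemi == 'lh' else rh if hemi == 'rh' else None
        if pool is not None and name in pool:
            matched.append(e)
        else:
            print('Warning: %s not found in annotation' % entry)
    return matched

labels_lh = ['frontalpole_2', 'superiorfrontal_1']
labels_rh = ['frontalpole_2']
order = ['lh_frontalpole_2', 'RH_FRONTALPOLE_2', 'lh_fishblag', 'delete']
print(match_ordering_to_annot(order, labels_lh, labels_rh))
# warns about lh_fishblag, returns ['lh_frontalpole_2', 'rh_frontalpole_2']
```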

Subcortical structures are specified a little differently. In this case, a segmentation file (located at ${SUBJECTS_DIR}/${SUBJECT}/mri/aseg.mgz) will be used in order to find the locations of the structures. Due to the nature of the segmentation, only certain accepted keywords are considered valid subcortical structures. They are as follows:

  • hippocampus
  • amygdala
  • thalamus
  • caudate
  • putamen
  • pallidum or globus pallidus
  • insula
  • nucleus accumbens or accumbens or accumbens_area

These keywords will automatically be translated into subcortical structures when they appear in a visualization ordering file. If the segmentation file does not exist and one of these structures is specified in the visualization ordering, the operation will return an error.
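
The keyword translation can be sketched as follows. The keyword synonyms come from the list above; the 'Left-Hippocampus'-style segmentation structure names are an assumption for illustration, not read from cvu:

```python
# Sketch of translating the accepted subcortical keywords into
# segmentation structure names. The keyword list follows the wiki;
# the aseg-style output names are assumed, not taken from cvu.

SUBCORTICAL_SYNONYMS = {
    'hippocampus': 'Hippocampus',
    'amygdala': 'Amygdala',
    'thalamus': 'Thalamus',
    'caudate': 'Caudate',
    'putamen': 'Putamen',
    'pallidum': 'Pallidum',
    'globus pallidus': 'Pallidum',
    'insula': 'Insula',
    'nucleus accumbens': 'Accumbens-area',
    'accumbens': 'Accumbens-area',
    'accumbens_area': 'Accumbens-area',
}

def translate_subcortical(entry):
    """Map an ordering entry like 'lh_hippocampus' to a structure name."""
    hemi, _, keyword = entry.lower().partition('_')
    side = {'lh': 'Left', 'rh': 'Right'}.get(hemi)
    structure = SUBCORTICAL_SYNONYMS.get(keyword)
    if side is None or structure is None:
        return None  # not a recognized subcortical entry
    return '%s-%s' % (side, structure)

print(translate_subcortical('lh_hippocampus'))     # Left-Hippocampus
print(translate_subcortical('rh_accumbens_area'))  # Right-Accumbens-area
```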

###Matrix ordering files

Matrix ordering files specify the order of ROIs within a matrix. When a parcellation is loaded successfully, the labnam attribute of the corresponding dataset is populated with the names of items in the parcellation that were successfully used. Each entry in the matrix ordering file must match a corresponding entry in the labnam attribute -- unless that entry is the keyword delete. If an entry is not found in the parcellation currently loaded in the program, an error will be returned.

If an entry is delete, it denotes that the entry in the matrix is not in the parcellation and should be skipped (unless the ignore deletes option is selected, in which case it is ignored but does not cause a row of the matrix to be skipped). The presence of any delete entries (except with ignore deletes) requires that the matrix be of larger size than the parcellation.
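
The reordering described above can be sketched like so (a simplified model assuming NumPy, not cvu's actual code):

```python
import numpy as np

# Simplified sketch of applying a matrix ordering file: rows/columns
# whose entry is 'delete' are dropped, and the rest are permuted into
# the parcellation's labnam order. Not cvu's actual implementation.

def apply_matrix_order(adj, matrix_order, labnam):
    keep = [(i, e.lower()) for i, e in enumerate(matrix_order)
            if e.lower() != 'delete']
    for _, e in keep:
        if e not in labnam:
            raise ValueError('%s not found in parcellation' % e)
    # indices of surviving matrix rows, sorted into labnam order
    idx = [i for i, e in sorted(keep, key=lambda t: labnam.index(t[1]))]
    return adj[np.ix_(idx, idx)]

labnam = ['lh_a', 'lh_b']                 # 2-ROI parcellation
matrix_order = ['lh_b', 'delete', 'lh_a'] # matrix is 3x3, one row skipped
adj = np.arange(9).reshape(3, 3)
print(apply_matrix_order(adj, matrix_order, labnam))
# [[8 6]
#  [2 0]]
```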

Matrix ordering files can be used for node-specific scalars or community affiliation vectors as well.

###Tips about ordering files

In practice these semantics have a few notable consequences, with the intention that you should be able to widely reuse ordering files for a particular parcellation.

The most inconvenient feature is that each entry in an ordering file except delete must correspond to an ROI in the parcellation. For instance, if your ordering file contained an entry fishblag, and you tried to use this file as both a visualization and a matrix ordering file, it would fail, because a visualization ordering prints a warning and ignores this entry, while a matrix ordering would fail to find the corresponding entry in the parcellation. These same semantics would apply if your ordering file contained an incorrect entry such as, say, lh_fronatlpole (note the misspelling).

This considerable inconvenience aside, there are two very good features about ordering file semantics.

  1. You can use the same ordering file for both visualization and matrix order, for instance to get rid of unwanted regions, such as lh_unknown from Freesurfer's aparc or the Lausanne parcellations, that are automatically inserted into the matrix by tools that operate on annotations (such as mri_segstats).
  2. You can create a matrix in an order that is convenient to compute (such as alphabetical order), but display the data in a more principled order that has something to do with the spatial organization and/or other properties of the ROIs themselves. cvu comes with adaptations of the Lausanne and Freesurfer aparc parcellations organized into "cmp order." This order is named after the connectomemapper, which uses a principled ordering that starts at frontal pole and wraps around the brain in a circle, proceeding to motor cortex, parietal cortex, occipital areas, and temporal areas before finishing near temporal pole. This is not necessarily the best ordering, but it is a start (see the tutorial in Getting started for a demonstration).

max-edges and thresholds

A little bit of explanation is required to understand the purpose of the max edges option, which can be specified on the command line or upon loading an adjacency matrix from within the GUI.

When the number of connections is very large, several things happen. One result is that the visualization becomes cluttered and uninformative -- there are so many connections that it is impossible to see anything or usefully make sense of the data. The point at which the data becomes too cluttered depends on the strength distribution of the network, but in most situations it becomes very difficult to make sense of a visualization with more than 1000-5000 connections.

Another problem has to do purely with performance. When cvu has to repeatedly process a very large number of connections, it can become very slow -- so slow that the program becomes pretty much unusable. The bottleneck is actually not the 3D brain (which handles large numbers of connections efficiently by storing them all in the same object) but the circle plot. Each edge in the 3D brain is represented by slightly enlarging a few shared matrices that store the vectors and the nodes connected by each edge. Each edge in the circle plot, by contrast, is represented with its own path object, which is costly. Disabling rendering of the circle plot can alleviate this problem somewhat, but still only goes so far, because adjusting the connections shown on the 3D brain is still O(n^2) in the number of nodes. Realistically, at around roughly 40,000 connections the program is much too slow to use (this might be relaxed slightly if circle rendering is disabled).

Addressing these two different problems requires different approaches. The first problem is a visualization problem, which can be fixed by adjusting the number of connections that are shown on the screen at any one time. The user can adjust this by modifying the threshold.

The second problem is a technical problem. Because the visualization problem typically supersedes the technical problem, the strategy cvu uses is to throttle the number of edges that the program is willing to process/store at all, such that any edges beyond the N largest edges (where N is the max edges parameter) are completely ignored and can't be recovered without reloading the matrix. Again, the user has control over this parameter, but in a way that is a little less interactive.

Note that these problems only affect the 3D brain and circle visualizations and representations. The matrix representation, by contrast, is able to easily store and visualize an arbitrarily large number of edges. So neither the threshold nor the max edges option has any effect on the matrix view.

###Thresholds

There are two types of thresholds, only one of which is active at any given time: a proportional threshold and an absolute threshold. When the program is first started, the proportional threshold is chosen initially, with a value of .95. This means that 95% of the connections are not currently displayed. For sparse datasets, such as diffusion data where many entries are zero, this offers a poor visualization with only a few connections. In a future release cvu will automatically try to adjust this parameter upon loading a matrix, such that between 500 and 1000 connections are always shown regardless of what proportion of edges this is, but this isn't done quite yet.

The absolute threshold allows you to specify a specific cutoff point, such that all connections with a value greater than the threshold are shown. This depends on the nature of the connectivity data, which can be anything. For instance, phase-locking value data ranges between 0 and 1, so an appropriate threshold might be 0.1. But other types of data might have different or potentially unbounded ranges. It is up to the user to specify a threshold of an appropriate quantity to use this option, which is never enabled by default. As a rule, I would use proportional thresholds for most purposes (including general exploration of the data), and absolute thresholds for the generation of statistics and figures for which precision is important.
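
Both threshold types can be illustrated on a toy symmetric matrix. This is a sketch assuming NumPy, not cvu's actual code:

```python
import numpy as np

# Sketch of the two threshold types on an undirected adjacency matrix.
# Proportional: hide the weakest fraction of edges (the default of .95
# shows only the top 5%). Absolute: show edges above a fixed cutoff.
# Not cvu's actual implementation.

def shown_edges(adj, proportional=None, absolute=None):
    iu = np.triu_indices_from(adj, k=1)   # unique undirected edges
    w = adj[iu]
    if proportional is not None:
        cutoff = np.quantile(w, proportional)
        mask = w > cutoff
    else:
        mask = w > absolute
    return list(zip(iu[0][mask], iu[1][mask], w[mask]))

# toy phase-locking-style data in [0, 1]
adj = np.array([[0., .1, .9, .2],
                [.1, 0., .3, .4],
                [.9, .3, 0., .8],
                [.2, .4, .8, 0.]])
print(len(shown_edges(adj, absolute=0.5)))      # 2 edges exceed 0.5
print(len(shown_edges(adj, proportional=0.5)))  # 3 edges in the top half
```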

###Max edges

When max edges is 0, or left unspecified on the command line (in which case it defaults to 0), cvu takes this as a placeholder default value and actually discards all but the 20,000 largest edges. These discarded edges are completely gone: they are not included at all in the 3D brain or circle visualizations. The only way to bring them back is to reload the matrix with a higher value of max edges.

Note that the number 20,000 is not exact; it is an approximation. Suppose that you had 5 edges tied for 20,000th place: how would you choose one? You might assume it doesn't really matter which one you choose -- and if you have 5 edges, you'd be right. But suppose you had 5,000 or 35,000 edges tied for 20,000th place -- in this case it is a big deal. Whether or not it is a big deal, cvu handles this edge case as follows: all of the edges in the tied position are included, so that the 20,000 largest edges are actually the 20,005 largest edges. No big deal, from a performance perspective.

However, what if the number of edges tied for 20,000th place is 35,000? This is actually not unrealistic; I have had it happen in real data with the following parameters: suppose you had a parcellation with around 40,000 possible edges but the matrix only had 5,000 data points. In this case, a huge number of edges with value 0, which you wouldn't ever want to visualize, would be tied at the cutoff. cvu handles this as follows: if the number of tied edges exceeds 200, instead of being kept, they are all thrown away. This will work in most situations, but it is possible to run into a matrix where this happens to be the wrong value. In that case, you can adjust this behavior by simply increasing the value of max edges.
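
The tie-handling rule can be sketched as follows (a simplified model of the behavior described above, not cvu's actual code):

```python
import numpy as np

# Sketch of the max-edges cutoff with tie handling: keep the N strongest
# edges; edges tied exactly at the cutoff value are all kept if there are
# few of them (<= 200 here, per the description above), otherwise all of
# the tied edges are discarded. Not cvu's actual implementation.

def apply_max_edges(weights, max_edges, tie_limit=200):
    w = np.asarray(weights, dtype=float)
    if len(w) <= max_edges:
        return np.ones(len(w), dtype=bool)  # nothing to discard
    order = np.argsort(w)[::-1]             # strongest first
    cutoff = w[order[max_edges - 1]]        # value of the Nth edge
    above = w > cutoff
    tied = w == cutoff
    if tied.sum() <= tie_limit:
        return above | tied                 # few ties: keep them all
    return above                            # many ties: drop them all

# toy example: 10 edges, max_edges=4, two edges tied at the cutoff value
w = [9, 8, 7, 6, 6, 3, 2, 1, 0, 0]
print(apply_max_edges(w, 4).sum())  # 5 edges kept (the tie is included)
```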

There may be purposes for which users want to work with far more connections than I have envisioned. There is one good reason I can imagine: generation of extremely high-resolution images. Increasing the max edges parameter affords this flexibility.

IMPORTANT TL;DR: If you do nothing, and ignore max edges whenever you load data, only the 20,000 strongest connections will be displayed!

Note also that edge thresholding and the max edges parameter affect the 3D brain and circle plot visualizations only, which become cluttered with a huge number of edges. They do not affect the connectivity matrix view in any way, nor do they affect the underlying data held in matrix form. Consequently they are completely unrelated to calculating network statistics. When calculating network statistics, you must specify a threshold to apply to the matrix so that the statistical measure can be generated. This threshold is totally unrelated to the threshold used for visualization; it is applied to the full matrix, which contains all the edges, not just those left by max edges. See Statistics.

Datasets and View manager

cvu is divided into multiple "datasets." A dataset consists of a parcellation, a matrix, and all the settings and metadata that apply to the current visualizations described by that dataset. When the program starts, only one dataset is shown. It is called sample data. This is true even if command line arguments were used to load some data other than the sample data upon loading the program.

Each operation in almost every window operates on a specific dataset. The dataset it refers to is selected using a drop down menu (that is initially populated with only the sample dataset). Different data can be loaded on top of each dataset -- for example, you can start the program with the normal sample data, and then load some other parcellation and matrix without creating a new dataset. In this case, you have replaced the sample dataset -- but it is still called "sample dataset". You can rename this or any other dataset in the manage views dialog by clicking on the entry with the dataset name.

When you load new parcellations, you have the option to select "new dataset." If you do, a new window will be spawned containing the new dataset and program control will proceed as normal. The program now keeps track of multiple datasets, and different dialogs should be able to operate on different datasets arbitrarily based on the drop down menus. Each menu keeps track of what dataset it refers to separately, so make sure to adjust these before applying operations (this is quite annoying, and I will definitely try to improve it at some point, but it works currently and I have other priorities in development).

I suggest loading data on top of the sample dataset unless you have multiple matrices you would like to specifically visually compare.

Note also that closing a window does not cause cvu to lose track of a dataset. You can still reopen that dataset in the view manager, and it will pop up in a new window. In order to cause cvu to really forget a dataset, you have to delete it from the view manager.

fsaverage5 or subject-specific morphology?

Most of the time individual morphology is not necessary to show in cvu, nor is it relevant to visualizing the network. In fact, it can be (slightly) detrimental to performance, because the average brain fsaverage5 which is used by default is heavily downsampled and has an order of magnitude fewer vertices to encode. For this reason, unless there is a specific reason to do otherwise, I recommend using fsaverage5 for most network visualizations rather than subject brains.

The one exception is when you wish to use cvu to visualize tractography. The benefit of doing this, as compared to using a more specialized tractography viewer such as TrackVis, is that you can overlay the tractography with abstract connectivity and compare the two in the same visualization. In this case, you will need to use the subject's individual morphology.

If you are using a customized parcellation other than the Lausanne parcellations (see Strategies for parcellating and matrix creation), you might need to morph this parcellation to fsaverage5 using freesurfer, either via the slower process of mri_label2label and mris_label2annot, or by running mris_ca_label with the parcellation atlas directly on the fsaverage5 brain. You only have to do this once, but it is a long and annoying process; avoiding it is also a pretty good reason to use individual subject morphology.

circle plot labels

The circle plot needs to be readable. In high resolution parcellations, the default behavior of displaying all labels caused them to be anything but readable. As such, cvu uses a very complex algorithm to omit many labels from the plot in order to ensure readability of certain labels and to try to annotate many important regions. Most of the time this is completely automated, and will not require any input from the user or cause any problems -- you probably won't notice anything other than that the circle plot is nice and awesome and the labels can be read, yay!

However, there is a situation in which this is a serious problem: suppose that you wanted to use the circle plot in order to analyze connectivity around a particular region R. You might want to make a figure showing the connections emanating from this region, and then publish it in a paper. You are particularly interested in whether that region has strong connectivity to other regions A, B, and C. In this case, you might not care about most of the labels on the plot, but the plot should absolutely have labels for R, A, B, and C. But they might have been removed by the algorithm which ensures that the plot is readable.

In this case, you can specify a number of labels that need to be included in the plot. These labels are stems of the region names -- for instance, a label name in the ordering file might be rh_superiorfrontal6, but only rh_superiorfrontal would appear on the circle plot. This is mostly a manual task -- the input is not sanitized and you must spell the label stems exactly as they are written in the ordering file. These labels all have to be specified at the time when the matrix is loaded, in the relevant field of the "load adjmat" dialog.

cvu guarantees that any labels you specify here will be included in the plot, possibly at the expense of other labels that have not been specified. It is possible in principle to overspecify -- imagine the extreme case where you required every label in the parcellation to be included for a high resolution parcellation. If this happens, cvu will crash and tell you that it could not find a suitable arrangement including all the labels you asked for. This has never happened to me except when I was explicitly testing this functionality. If you legitimately run into this problem without trying to, please let me know, because I will be stunned.

Regardless of which labels are omitted from the plot, they can always be seen by mousing over a region which will display a tooltip with the label.