Network statistics and modular structure

Roan LaPlante edited this page Apr 30, 2014 · 4 revisions

cvu was built with the intention of incorporating network statistics into the visualization. cvu distinguishes between two types of network statistics: modular structure, and scalars.

Modular structure is the partitioning of the network into related (i.e., interconnected) communities. Finding the optimal partition of a network into communities is known to be NP-hard; however, there are numerous approximation algorithms available in the scientific literature on graph theory to rapidly estimate community structure. Most of these algorithms are global optimization algorithms focusing on the maximization of modularity, which is the proportion of edges falling within community boundaries minus the proportion that would be expected if the same edges were placed at random. There are a variety of other algorithms, but the algorithms included in cvu to automatically estimate modular structure are of this type, and in practice they generally do a good job. Modular structure has its own specialized visualization.
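To make the quantity being maximized concrete, here is a minimal sketch in plain Python of the standard Newman modularity score for an undirected network. It is not the code that cvu or bctpy runs; the function names and the toy two-triangle network are made up for illustration.

```python
def modularity(adjacency, labels):
    """Newman modularity Q of a partition of an undirected network.

    adjacency: symmetric N x N list of lists of edge weights.
    labels: length-N list giving the module of each node.
    """
    degrees = [sum(row) for row in adjacency]
    two_m = sum(degrees)  # twice the total edge weight
    q = 0.0
    for i, row in enumerate(adjacency):
        for j, weight in enumerate(row):
            if labels[i] == labels[j]:
                # Observed weight minus the weight expected by chance.
                q += weight - degrees[i] * degrees[j] / two_m
    return q / two_m


def two_triangles():
    """Toy network: triangles 0-1-2 and 3-4-5 joined by the edge 2-3."""
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    adjacency = [[0] * 6 for _ in range(6)]
    for a, b in edges:
        adjacency[a][b] = adjacency[b][a] = 1
    return adjacency


adjacency = two_triangles()
q_split = modularity(adjacency, [1, 1, 1, 2, 2, 2])   # one module per triangle
q_lumped = modularity(adjacency, [1, 1, 1, 1, 1, 1])  # everything in one module
print(q_split, q_lumped)
```

Splitting the network along the single bridging edge scores higher than lumping everything together, which is the kind of comparison a modularity-maximizing algorithm makes many times over.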

Scalar statistics are almost anything that is not modular structure. Specifically, in cvu scalar statistics are any scalar nodewise measure of the network -- that is, every node in the network has assigned to it a scalar value. For instance, node degree is a scalar measure. (It is worth noting that modular structure can also be coded as a scalar measure -- for instance, any node in module 6 takes value 6 and so on. This is in fact exactly how most algorithms treat it. That said, the discrete value has no quantitative meaning, and a specialized, discrete visualization of modular structure is more suitable.)
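As a small illustration of that distinction, the sketch below builds two length-N vectors for a toy 6-node network: node degree, and module membership coded as integers. The network and variable names are invented for the example.

```python
# Toy undirected network as a binary adjacency matrix (6 nodes).
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
n = len(adjacency)

# Node degree: a genuine scalar measure, one number per node.
degree = [sum(1 for w in row if w != 0) for row in adjacency]

# Module membership coded the same way: node i belongs to module labels[i].
labels = [1, 1, 1, 2, 2, 2]

# Swapping the label values (1 <-> 2) describes exactly the same partition,
# which is why the numbers themselves carry no quantitative meaning.
relabelled = [{1: 2, 2: 1}[m] for m in labels]
same_partition = all(
    (labels[i] == labels[j]) == (relabelled[i] == relabelled[j])
    for i in range(n)
    for j in range(n)
)
print(degree, labels, same_partition)
```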

Calculating statistics

cvu provides a basic array of network statistics that can be calculated natively, in addition to modular structure. These statistics are calculated using bctpy. To calculate either modular structure or scalar statistics, click on calculate stats.

The calculation window provides a few options -- notably, it either generates statistics, or modules. If modules, it will overwrite any existing modules in the dataset; if statistics, it will overwrite any existing statistics in the dataset. If it calculates statistics, the specific statistics that are calculated are those listed in the corresponding dataset's options menu:

Note that a few of these measures (namely, modularity, participation coefficient, and within-module degree) are dependent upon modular structure. The within-module degree, for example, measures a node's degree counting only its connections to other nodes in its own module. In order to calculate this, the modular structure must be already specified -- that is, the dataset must already have a modular structure attached to it (typically by having just calculated it beforehand). If these measures are specified but no modular structure exists, the program will return an error.
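The dependency on modular structure can be sketched as follows. This is an illustrative plain-Python version, not cvu's or bctpy's implementation: the function name and the error message are hypothetical, and it returns raw within-module edge counts, whereas Brain Connectivity Toolbox style implementations normally report this measure as a z-score within each module.

```python
def within_module_degree(adjacency, labels):
    """Count, for each node, its connections to nodes in its own module."""
    if labels is None:
        # Mirrors the situation described above: no modules, no measure.
        raise ValueError("no modular structure: calculate or load modules first")
    n = len(adjacency)
    if len(labels) != n:
        raise ValueError("need exactly one module label per node")
    return [
        sum(
            1
            for j in range(n)
            if j != i and adjacency[i][j] != 0 and labels[j] == labels[i]
        )
        for i in range(n)
    ]


# Two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
within = within_module_degree(adjacency, [1, 1, 1, 2, 2, 2])
print(within)
```

Nodes 2 and 3 have total degree 3, but the bridging edge between them crosses a module boundary and so is not counted here.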

There is also an optional threshold, which will threshold some of the connections in the adjacency matrix before doing the calculation. This threshold works the same as the threshold in the options menu, except that it only applies to the calculation of modules or scalar statistics. To not enforce a threshold, set it to 0.
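A rough sketch of what thresholding before a calculation looks like is given below. It assumes an absolute-weight cutoff, which may not match how cvu's threshold is actually defined (this page does not say whether it is a weight or a proportion); the function name is hypothetical.

```python
def apply_threshold(adjacency, threshold):
    """Return a copy of the matrix with weak connections set to zero.

    Assumption: connections whose absolute weight is below `threshold`
    are dropped. A threshold of 0 therefore keeps every connection.
    """
    return [
        [w if abs(w) >= threshold else 0 for w in row]
        for row in adjacency
    ]


weights = [
    [0, 0.2, 0.9],
    [0.2, 0, 0.5],
    [0.9, 0.5, 0],
]
for_calculation = apply_threshold(weights, 0.5)
unthresholded = apply_threshold(weights, 0)
print(for_calculation)
```

Working on a copy keeps the thresholded matrix local to the calculation, so the matrix used for display is left alone.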

Loading scalars

The number of network measures that can be calculated in bctpy is limited by the number of network measures that have been implemented, documented, and discovered. Furthermore, they are limited by the algorithms that have been implemented, which are sometimes approximations of the true values of interest.

For this reason, network statistics -- both modules and scalars -- can additionally be loaded as common matrix files. For instance, if you were developing an improvement on node degree called the pairwise-mega-awesome-degree, you could calculate that measure in MATLAB (or Python, or FORTRAN, or whatever you like), store the result in a common matrix format (MATLAB, numpy, or text file) as an Nx1 matrix, and then load the measure into cvu.
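As one possible way to produce such a file, the sketch below writes an Nx1 vector to a plain text file, one value per line, using only the Python standard library. The values and file name are invented, and whether cvu's text loader expects exactly this layout is an assumption; `numpy.save` or `numpy.savetxt` would be the usual choice if the measure is already a numpy array.

```python
import os
import tempfile


def write_column(path, values):
    """Write one value per line, i.e. an N x 1 matrix in text form."""
    with open(path, "w") as handle:
        for value in values:
            handle.write(repr(float(value)) + "\n")


def read_column(path):
    """Read the file back to check what was stored."""
    with open(path) as handle:
        return [float(line) for line in handle if line.strip()]


# Hypothetical per-node values for a 4-node network.
pairwise_mega_awesome_degree = [0.5, 1.25, 3.0, 2.75]

path = os.path.join(tempfile.mkdtemp(), "pairwise_mega_awesome_degree.txt")
write_column(path, pairwise_mega_awesome_degree)
round_trip = read_column(path)
print(round_trip)
```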

The process to do this is almost identical to that of loading an NxN adjacency matrix, except that the scalar matrix is of size Nx1 and not NxN. The ordering file specifies the ordering of nodes in the scalar matrix, and if no ordering file is specified then the scalar matrix is assumed to already be in the same order as the parcellation (for more information see ordering files).
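Conceptually, applying an ordering amounts to the realignment sketched below. The region names, the idea that an ordering is a list of one region name per row, and the function name are all assumptions made for illustration; the real ordering file format is described on the ordering files page.

```python
def reorder_to_parcellation(values, ordering, parcellation):
    """Rearrange scalar values from file order into parcellation order.

    values: the N scalar values as stored in the file.
    ordering: region name for each row of `values` (assumed format).
    parcellation: region names in the order the parcellation uses.
    """
    if len(values) != len(ordering):
        raise ValueError("ordering must name exactly one region per value")
    by_region = dict(zip(ordering, values))
    return [by_region[region] for region in parcellation]


# Hypothetical region names.
parcellation = ["lh_cuneus", "lh_insula", "rh_cuneus", "rh_insula"]
ordering = ["rh_insula", "lh_insula", "rh_cuneus", "lh_cuneus"]
values = [4.0, 2.0, 3.0, 1.0]

aligned = reorder_to_parcellation(values, ordering, parcellation)
print(aligned)
```

Without an ordering, the equivalent step is the identity: the values are taken to be in parcellation order already.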

If the measure is a scalar, you will need to give it a name. The reason why will be shown shortly.

Viewing scalars

If you click on show scalars, you will see something like this:

For each of five possible visualizations of the scalar values, you will see a choice of what scalar to place on that visualization, or None to not use that method of conveying scalars.

The scalars are populated from both the scalar statistics that have been calculated and any scalars that have been loaded. So in the above dialog, clustering coefficient, average strength, eigenvector centrality and binary k-core are measures that are calculated natively in cvu using bctpy. arbitrary scalar quantity is an Nx1 matrix (in this case, containing random numbers) that I created using a random number generator in Python and loaded into the program.

The reason why it is necessary to name your scalars when loading them is that entries in this dialog use their name as the key. So if you loaded a new set of scalars and used the name clustering coefficient, the old scalars stored under that name would be replaced.
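The replacement behaviour is what you would get from any name-keyed mapping. The sketch below shows it with a plain Python dictionary, plus a hypothetical guard function (`add_scalar` is not part of cvu) that a script could use to avoid clobbering an existing entry by accident.

```python
def add_scalar(scalars, name, values, overwrite=False):
    """Store values under `name`, refusing to replace unless asked to."""
    if name in scalars and not overwrite:
        raise KeyError("a scalar named %r already exists" % name)
    scalars[name] = list(values)


scalars = {"clustering coefficient": [0.4, 0.1, 0.7]}

# Plain assignment under an existing name silently replaces the old values.
scalars["clustering coefficient"] = [9.0, 9.0, 9.0]

# The guarded version refuses unless overwrite=True is passed.
try:
    add_scalar(scalars, "clustering coefficient", [0.0, 0.0, 0.0])
    blocked = False
except KeyError:
    blocked = True

add_scalar(scalars, "arbitrary scalar quantity", [0.3, 0.8, 0.5])
print(sorted(scalars), blocked)
```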

Four of the five loci of scalar visualization are shown below:

No scalars were projected onto the surface. Projecting the scalar values onto a transparent glass surface is not generally very useful.

Module visualization

There are two modes of module visualization: viewing all modules, and viewing one module. They are shown side by side:

Hopefully, this is fairly self-explanatory: you click on view modules, and it pops up a box that allows you to pick either all modules or one. But there are some interesting tidbits. In the options, you can specify an important option for viewing a single module -- you can examine intramodular (within-module) connectivity within a single module, intermodular (between-module) connectivity, or both.
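The three single-module choices correspond to three ways of filtering the edge list. The following is a minimal sketch of that filtering, assuming edges are given as node-index pairs; the function name and the mode strings are hypothetical and are not cvu's option names.

```python
def module_edges(edges, labels, module, mode="both"):
    """Select the edges to draw when viewing a single module.

    mode "intra": both endpoints are in the module.
    mode "inter": exactly one endpoint is in the module.
    mode "both":  either of the above.
    """
    if mode not in ("intra", "inter", "both"):
        raise ValueError("mode must be 'intra', 'inter' or 'both'")
    selected = []
    for a, b in edges:
        in_a = labels[a] == module
        in_b = labels[b] == module
        is_intra = in_a and in_b
        is_inter = in_a != in_b
        if (is_intra and mode != "inter") or (is_inter and mode != "intra"):
            selected.append((a, b))
    return selected


# Two triangles joined by the edge 2-3, one module per triangle.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = [1, 1, 1, 2, 2, 2]

intra = module_edges(edges, labels, 1, "intra")
inter = module_edges(edges, labels, 1, "inter")
both = module_edges(edges, labels, 1, "both")
print(intra, inter, both)
```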

The colors chosen when viewing all modules are randomly chosen from a set of approximately 20 colors (more intermediate colors are randomly generated if the number of modules is greater than 20). You can repeatedly view all modules in order to get a better color scheme, until you are satisfied. There will probably be some functionality to adjust the colors in this mode more finely at some point.

Graph stats panel

If you click on the show statistics button, you will see a panel that looks like this.

I think this panel is not very useful. Even a plain spreadsheet would be more useful. Traitsui doesn't make that as easy as it would be if I were building more of the interface myself. I might try to make the panel into something more useful, even if that is just exporting an actual spreadsheet.

Implementation detail

This section is important mostly for purposes of scripting.

Each dataset has its own set of scalars, modules, and statistics. Critically, the scalars shown in the scalar view window are held in a different data structure (dataset.node_scalars) than those shown in the stats panel (dataset.graph_stats). Whenever statistics are calculated natively in bctpy, those statistics are saved immediately to graph_stats, and then each measure is saved to the node_scalars dictionary (a Python dictionary, that is, a hash table with no type restrictions). When measures are loaded directly, they bypass graph_stats and are inserted directly into the dictionary. This reflects my desire to see the graph panel as a window to navigate an automatically generated set of statistics, not as a viewer to interpret any and all statistics (at which it would do a very bad job anyway).

This also means that name collisions can happen in node_scalars: a measure saved under a name that is already in use replaces the earlier one. Conversely, an entry whose name is not reused stays where it is. One statistic that takes a long time (a few minutes) to calculate is the local efficiency. If you calculate the local efficiency in one subject, and then turn it off so that it is not recomputed while calculating some other measures, it will still be there.
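For scripting purposes, the flow described in this section can be pictured with the toy model below. Only the attribute names graph_stats and node_scalars come from cvu; the class, its methods, and the choice to model graph_stats as a dictionary are assumptions, and the real dataset object has a different interface.

```python
class DatasetSketch:
    """Toy stand-in for a dataset's two statistics containers."""

    def __init__(self):
        self.graph_stats = {}   # what the stats panel would show
        self.node_scalars = {}  # what the scalar view window would offer

    def calculate(self, results):
        """Store natively calculated measures (name -> per-node values).

        `results` stands in for the output of a bctpy calculation.
        Whether old graph_stats entries are dropped is not modelled here.
        """
        self.graph_stats.update(results)
        # Entries under other names are left untouched, so they persist.
        self.node_scalars.update(results)

    def load_scalar(self, name, values):
        """Loaded measures bypass graph_stats entirely."""
        self.node_scalars[name] = list(values)


dataset = DatasetSketch()
dataset.calculate({"local efficiency": [0.2, 0.6, 0.4]})  # the slow one
dataset.calculate({"clustering coefficient": [0.4, 0.1, 0.7]})
dataset.load_scalar("arbitrary scalar quantity", [0.3, 0.8, 0.5])

print(sorted(dataset.node_scalars))
print("arbitrary scalar quantity" in dataset.graph_stats)
```

In this model the earlier local efficiency values are still available after the second calculation, and the loaded measure never appears among the panel statistics.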