One of our methods of evaluating reliability will be to compare ICA components across random seeds. From this we can look at the impact of convergence on the results and consistency of classification for equivalent components. I'm trying to figure out how we should do this.
Here are some proposed steps with potential pros/cons:
(Prerequisite) Run tedana with two seeds.
Load ICA mixing matrix and ICA component table from each run. These will have the components sorted in the same order (descending Kappa, I believe).
Correlate the mixing matrices across the two runs, resulting in an n_comps × n_comps correlation matrix.
For each row of the correlation matrix, identify the index of the maximum correlation coefficient (see the sketch after this list).
Under optimal circumstances, each column would be represented exactly once among these indices, with no duplicates. In reality, that does not seem to happen (see the example correlation matrix below): the extremely high correlations (yellow squares) largely disappear further down the matrix.
How do we resolve duplicates, where a given component's highest correlation from one run is with more than one component from the other run?
To compare between convergence and non-convergence, compare distributions of these maximum correlation coefficients from converged/converged run pairs to converged/didn't-converge pairs.
We'll get an n_comps array of correlation coefficients from each pair, so to compare across all runs we'll need to use the full distributions.
As with all comparisons of convergence, a problem we'll have to deal with is that convergence failure doesn't happen randomly. Some subjects fail a lot of the time, while others never fail.
To evaluate consistency of classification, we'll need some metric summarizing cross-run comparability of components. Then we can build a contingency table for each pair of runs (see the example, and the sketch after it, below), and we can look at the average of that across all runs, I think.
We still have the duplicates issue here.
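To make the correlation/matching steps above concrete, here is a minimal sketch. This is not tedana's actual API: the file names are hypothetical placeholders for the two runs' mixing matrices, and taking the absolute correlation (to ignore ICA's arbitrary sign flips) is my assumption rather than something specified above.

```python
# Minimal sketch of the cross-run matching steps above.
# File names are hypothetical placeholders for the two runs' mixing matrices.
import numpy as np
import pandas as pd

mixing_a = pd.read_csv("run-seed1_mixing.tsv", sep="\t").values  # (n_vols, n_comps)
mixing_b = pd.read_csv("run-seed2_mixing.tsv", sep="\t").values

n_a = mixing_a.shape[1]

# Cross-correlate every run-A component time series with every run-B one.
# np.corrcoef stacks the two inputs, so the off-diagonal block is the
# n_comps x n_comps cross-run correlation matrix. The absolute value ignores
# ICA's arbitrary sign flips (an assumption, not specified in the original).
corr = np.abs(np.corrcoef(mixing_a.T, mixing_b.T)[:n_a, n_a:])

# For each run-A component, the index and value of its best match in run B.
best_idx = corr.argmax(axis=1)
best_r = corr.max(axis=1)  # n_comps-length array of maximum coefficients

# Duplicates: run-B components claimed by more than one run-A component.
claimed, counts = np.unique(best_idx, return_counts=True)
duplicates = claimed[counts > 1]
print(f"{duplicates.size} run-B components are the best match for multiple run-A components")

# The per-pair arrays of maximum coefficients (best_r) could then be pooled and
# compared between converged/converged and converged/non-converged run pairs,
# e.g. with a two-sample KS test: scipy.stats.ks_2samp(pooled_conv, pooled_nonconv)
```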
Example correlation matrix from real data
Example confusion matrix
Note that I'm ignoring the duplicates issue described above. That means that 8 components in run2 are reflected 2-3 times below, and 10 components are not reflected at all.
| run1/run2 | accepted | ignored | rejected |
| --- | --- | --- | --- |
| accepted | 40 | 10 | 8 |
| ignored | 0 | 0 | 1 |
| rejected | 4 | 1 | 8 |
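Here is a minimal sketch of how such a table could be built from the matching above. The file names and the `classification` column name are assumptions about the component table outputs, and it deliberately reproduces the duplicates issue by using each run-1 component's best match regardless of how many times a run-2 component is reused.

```python
# Minimal sketch of the cross-run classification contingency table, reusing
# best_idx from the matching sketch above. File and column names are assumed.
import pandas as pd

comptable_a = pd.read_csv("run-seed1_comptable.tsv", sep="\t")
comptable_b = pd.read_csv("run-seed2_comptable.tsv", sep="\t")

labels_a = comptable_a["classification"].values            # run-1 labels, in kappa order
labels_b = comptable_b["classification"].values[best_idx]  # best-matched run-2 labels

# Rows: run-1 classification; columns: classification of the best-matched
# run-2 component. Averaging these tables over all run pairs would give the
# summary described above.
confusion = pd.crosstab(labels_a, labels_b, rownames=["run1"], colnames=["run2"])
print(confusion)
```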
I also looked at correlations between the beta maps, either as a substitute for or in conjunction with the correlations between the time series, but that doesn't do anything to reduce duplicates in the test runs I'm using.
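For reference, a beta-map version of the comparison might look like the following sketch; the file names and mask are hypothetical, and the resulting matrix would feed into the same matching logic as the time-series version above.

```python
# Minimal sketch of correlating component beta maps instead of (or alongside)
# the mixing time series. File names and mask are hypothetical placeholders.
import nibabel as nib
import numpy as np

mask = nib.load("mask.nii.gz").get_fdata().astype(bool)

# Component beta maps as 4D images (x, y, z, n_comps); masking flattens them
# to (n_voxels, n_comps).
betas_a = nib.load("run-seed1_betas.nii.gz").get_fdata()[mask]
betas_b = nib.load("run-seed2_betas.nii.gz").get_fdata()[mask]

n_a = betas_a.shape[1]
corr_maps = np.abs(np.corrcoef(betas_a.T, betas_b.T)[:n_a, n_a:])
```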
One thing to note as I think about this: if a component correlates highly with several other components, those components are probably not actually independent, which means we are in a sense failing to create truly independent components. That should be regarded as undesirable ICA behavior (I'm reluctant to call it an outright failure of the ICA). However, the threshold at which we decide something is too highly correlated is tricky to set in the absence of the data itself. I think we'll have to take a dataset and inspect it manually to see whether there are scenarios in which components are actually independent but still highly correlated. Which dataset is the example above?