Method for identifying comparable components across runs #3

Open

tsalo opened this issue Feb 3, 2019 · 2 comments

tsalo commented Feb 3, 2019

One of our methods for evaluating reliability will be to compare ICA components across random seeds. From this, we can look at the impact of convergence on the results and at the consistency of classification for equivalent components. I'm trying to figure out how we should do this.

Here are some proposed steps with potential pros/cons:

  1. (Prerequisite) Run tedana with two seeds.
  2. Load the ICA mixing matrix and ICA component table from each run. These will have the components sorted in the same order (descending Kappa, I believe).
  3. Correlate the mixing matrices across the two runs, resulting in an n_comps X n_comps correlation matrix (see the sketch after this list).
  4. For each row in the correlation matrix, identify the index of the maximum correlation coefficient.
    • Under optimal circumstances, each column would be selected exactly once, with no duplicates. In reality, that does not seem to happen (see the correlation matrix I've added below): the extremely high correlations (yellow squares) largely disappear for the later, lower-Kappa components.
    • How do we resolve duplicates, where the same component from one run is the best match for more than one component from the other run?
  5. To compare convergence and non-convergence, compare the distributions of these maximum correlation coefficients from converged/converged run pairs against those from converged/non-converged pairs (also illustrated in the sketch below).
    • We'll get an n_comps array of correlation coefficients from each pair, so to compare across all runs we'll need to use the full distributions.
    • As with all comparisons of convergence, a problem we'll have to deal with is that convergence failure doesn't happen randomly. Some subjects fail a lot of the time, while others never fail.
  6. To evaluate consistency of classification, we'll need some metric summarizing cross-run comparability of components. Then we can build a contingency table for each pair of runs (see the example below, and the sketch after it) and, I think, look at the average of that table across all runs.
    • We still have the duplicates issue here.
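
Here is a minimal sketch of steps 3-5, assuming the two mixing matrices have already been loaded as (n_timepoints, n_comps) numpy arrays; all function and variable names are hypothetical, and I take absolute correlations because ICA component signs are arbitrary:

```python
import numpy as np
from scipy import stats

def match_components(mixing_a, mixing_b):
    """Match each run-A component to its best-correlated run-B component."""
    n_comps = mixing_a.shape[1]
    # Step 3: n_comps x n_comps cross-run correlation matrix. np.corrcoef
    # treats rows as variables, so transpose and keep the off-diagonal block.
    corr = np.abs(np.corrcoef(mixing_a.T, mixing_b.T)[:n_comps, n_comps:])
    # Step 4: index of the maximum correlation coefficient in each row.
    matches = corr.argmax(axis=1)
    max_corrs = corr.max(axis=1)
    # The duplicates issue: run-B components that are the best match for
    # more than one run-A component (other run-B components go unmatched).
    picked, counts = np.unique(matches, return_counts=True)
    duplicated = picked[counts > 1]
    return matches, max_corrs, duplicated

# Step 5, given max_corrs arrays from a converged/converged pair and from a
# converged/non-converged pair (a two-sample KS test is just one option):
# stat, pval = stats.ks_2samp(max_corrs_conv, max_corrs_nonconv)
```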

Example correlation matrix from real data:

[image: example_correlation_matrix]

Example confusion matrix

Note that I'm ignoring the duplicates issue described above. That means that 8 components in run2 are reflected 2-3 times below, and 10 components are not reflected at all.

run1/run2   accepted   ignored   rejected
accepted          40        10          8
ignored            0         0          1
rejected           4         1          8
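
Here is a sketch of how such a contingency table could be built from the matched pairs, assuming classification labels have been read from each run's component table; the names are hypothetical, and pandas.crosstab is just one convenient way to tabulate:

```python
import pandas as pd

# labels_run1 / labels_run2 hold each component's classification
# ("accepted", "ignored", or "rejected"); matches comes from the
# match_components() sketch above, so duplicates are still ignored here.
def contingency_table(labels_run1, labels_run2, matches):
    matched_run2 = [labels_run2[j] for j in matches]
    return pd.crosstab(
        pd.Series(labels_run1, name="run1"),
        pd.Series(matched_run2, name="run2"),
    )
```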

tsalo commented Feb 4, 2019

I also looked at correlations between the beta maps, as a substitute for or in conjunction with the correlations between the time series, but that doesn't do anything to reduce duplicates in the test runs I'm using.
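
For reference, this is the same matching applied to spatial maps; a sketch, assuming the beta maps from each run have been masked and flattened to (n_voxels, n_comps) arrays (beta_a and beta_b are hypothetical placeholders):

```python
# match_components() above only assumes an (n_samples, n_comps) layout,
# so it works unchanged on spatial beta maps instead of time series.
matches_sp, max_corrs_sp, dup_sp = match_components(beta_a, beta_b)
```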

@tsalo tsalo changed the title Method for comparable component identification across runs Method for identifying comparable components across runs Feb 9, 2019

jbteves commented Apr 15, 2019

One thing to note is that if a component correlates highly with several other components, those components are likely no longer truly independent, so in a sense the ICA has failed to produce independent components. This should be regarded as undesirable ICA behavior (I'm reluctant to call it an outright failure of the ICA). However, deciding on the threshold above which something counts as too highly correlated is tricky in the absence of the data itself. I think we will have to take a data set and inspect it manually to see whether there are scenarios where components are actually independent but still highly correlated. Which data set is the above example from?
