-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] update ICA to sklearn from mdp #44
Conversation
I don't know if backwards compatibility is a requirement for new versions before |
Yes, agreed ! This is a |
Codecov Report
@@ Coverage Diff @@
## master #44 +/- ##
=========================================
- Coverage 48.72% 48.7% -0.03%
=========================================
Files 32 32
Lines 2079 2080 +1
=========================================
Hits 1013 1013
- Misses 1066 1067 +1
Continue to review full report at Codecov.
|
No big surprise here: from the most recent Circle build it looks like the first file saved out post-ICA is different. It might be worth doing some manual inspection to see how different these are, and if it's to a tolerance we're comfortable with... |
Yes, agreed. @prantikk specifically asked that we
which I think is a good starting point for manual inspection ! |
50591b1
to
652a795
Compare
38071b1
to
694be16
Compare
Now that we've committed to merging this, I wanted to check in about it. Other than dealing with the conflicts, is there anything that needs to be done before this one can be merged? |
Thanks for checking in on this, @tsalo !! I think I just need to resolve the conflicts (which I'm happy to do later today !), but I wanted to give a little more time for feedback on the roadmap. Also, I realized I never did the promised comparison 😞 Do you think it would still be useful to do ? |
I agree that varying the seed will probably result in variability on par with the differences between the |
That's ok by me, unless @handwerkerd or @KirstieJane disagree ! Either way, I'll fix these merge conflicts later today and wait until we've given a full week for the roadmap RFC (in #151) before merging in :) |
I know we've probably discussed this before, but I can't see where- Does it matter that FastICA whitens the data? |
MDP was already whitening the data -- we had not supplied a |
Is it okay that MDP was doing it too though? |
Yes, it should be fine. We actually want whitened data, especially if (as previously) we weren't selecting principal components based on descending eigenvalues. Even when we are, though, we still want to account for the fact that descending eigenvalues explain differential amounts of variance (since, in whitened data, they should all be equal amounts). I just found this review and think it explains the idea well ! |
If the whitening is performed by FastICA, and we aren't using a decision tree to select PCA components during that whitening stage, do we still need the PCA step? |
@tsalo and I chatted off-line and got a better handle on this point (since it is quite confusing !). We do need the PCA step. The reason this is confusing is because if we allow for whitening within the ICA we are introducing a second PCA, which makes the first seem redundant. But it's not ! The first PCA allows us to dimensionally reduce the data, by taking dimensions that meet some criteria -- we can (and I think should !) keep discussing what those criteria are in #101. But the second PCA, performed inside the There's a CrossValidated answer making exactly this point, and might be a useful reference ! |
Merging this, since #151 is merged in ! Thanks everyone for all of your feedback, here ! |
* Added flow charts and some text * Finished flow charts and text. Co-authored-by: marco7877 <[email protected]> --------- Co-authored-by: marco7877 <[email protected]>
* Decision tree refactor with minimal and kundu * Fix commented-out tedana workflow * Appease the style checker * All tremble before the mighty linter * Actually fix incorrect style checker issue * Unfix another style checker error * Attempt to make Black happy, even though it does not actually say what's wrong * ran black * Added elbows to reports * fixing kundu tree and added calc_median * kundu.json added comment * kundu kappa_elbow is GTE not GT * kundu dtm matches main and minimal updated * flake8 style fixes * fixed linting * fixed report elbow warning * removed unneeded second d_table calc function * Links building decision trees to index * Adds ComponentSelector to API docs * Set language to English * Fix dead nilearn link * Add load_config and ComponentSelector to API docs * Fix mixing matrix over-save bug * Separately modularized kappa & rho elbow calcs and created liberal rho elbow (#15) * kundu tree provisionalreject to unclassified * calc_rho_elbow progress * calc_rho_elbow done * Removed calc_varex_upper_p * Removed kappa_rho_elbow tests * both decision trees running * linting fixes * Enable tedana_reclassify as console script * No errors if no xcomp but also no decide_comps (#16) * Update tedana/io.py Co-authored-by: Taylor Salo <[email protected]> * Appease style checker * Appease the style checker? * Force to use up to date setuptools; installation bug otherwise * Remove out of date make entry * Create functional reclassify CLI * Replace blanks with n/a * Maybe appease black * Fix typo Co-authored-by: Eneko Uruñuela <[email protected]> * BIDSify some outputs * Appease black * Heavily revise ComponentSelector module docs * Fixing mid kappa A inconsistency (#17) * Output codes in kundu.json * fixed kappa ratio * Update tedana/selection/selection_nodes.py Co-authored-by: Joshua Teves <[email protected]> * minimal tree keep kappa>2rho Co-authored-by: Joshua Teves <[email protected]> * Drops 3.6 support * Remove 3.6 support from CircleCI tests * Reformat comment * Reduce line length * Update lint in Makefile * Correctly collect API submodule doc * Fix errors * Fix more sphinx * working on selector init documentation * Breaking up outputs.rst * partially updated output_file_descriptions.rst * changed n_bold_comps to n_accepted_comps * n_bold_comps to n_accepted_comps * ComponentSelector.py API docs cleaned up * selection_nodes decision_docs updated * selection_nodes docstrings cleaned up * Fixed a test for selection_nodes * Updated faq for tedana_reclassify and tree options * docstrings in tedica and other small updates * Updated docstrings in selection_utils.py * Update docs/output_file_descriptions.rst * Working on improving selector documentation (#18) * working on selector init documentation * Breaking up outputs.rst * partially updated output_file_descriptions.rst * changed n_bold_comps to n_accepted_comps * n_bold_comps to n_accepted_comps * ComponentSelector.py API docs cleaned up * selection_nodes decision_docs updated * selection_nodes docstrings cleaned up * Fixed a test for selection_nodes * Updated faq for tedana_reclassify and tree options * docstrings in tedica and other small updates * Updated docstrings in selection_utils.py * Update docs/output_file_descriptions.rst Co-authored-by: Joshua Teves <[email protected]> * Remove manual selection * Force user to pick a tree * Fix CLI test * Revert "Force user to pick a tree" This reverts commit 4fc656f. * Revert "Fix CLI test" This reverts commit 4038336. * Make kundu default tree * Attempt to fix error * Adds input data to registry * Revert "Adds input data to registry" This reverts commit c7349bd. * Adds input registration * Appease linter * Add class template start * Add previous workflow registry into new one * Fix failure to update tags and classifications in manual * Fix missing less likely BOOLD tag * Adds more useful reporting for unused metrics * Create generated metrics * Update line terminator * Force black to run before flake8 * Updates percentile call * more doc updates * fixed meica to v2.5 in docstrings * docs building again * more updates to building decision trees * improved docs (#19) * working on selector init documentation * Breaking up outputs.rst * partially updated output_file_descriptions.rst * changed n_bold_comps to n_accepted_comps * n_bold_comps to n_accepted_comps * ComponentSelector.py API docs cleaned up * selection_nodes decision_docs updated * selection_nodes docstrings cleaned up * Fixed a test for selection_nodes * Updated faq for tedana_reclassify and tree options * docstrings in tedica and other small updates * Updated docstrings in selection_utils.py * Update docs/output_file_descriptions.rst * more doc updates * fixed meica to v2.5 in docstrings * docs building again * more updates to building decision trees Co-authored-by: Joshua Teves <[email protected]> * Get rid of optional method keyword * Revert "Get rid of optional method keyword" This reverts commit e5fdec1. * Revert "Updates percentile call" This reverts commit 9d6a487. * Revert "Update line terminator" This reverts commit 8cf697c. * Autodocument ComponentSelector methods/attributes (#20) * Rename ComponentSelector module. * Document the ComponentSelector directly. * fixed rename of component_selector * Fixed remaining transition to component_selector (#21) * working on selector init documentation * Breaking up outputs.rst * partially updated output_file_descriptions.rst * changed n_bold_comps to n_accepted_comps * n_bold_comps to n_accepted_comps * ComponentSelector.py API docs cleaned up * selection_nodes decision_docs updated * selection_nodes docstrings cleaned up * Fixed a test for selection_nodes * Updated faq for tedana_reclassify and tree options * docstrings in tedica and other small updates * Updated docstrings in selection_utils.py * Update docs/output_file_descriptions.rst * more doc updates * fixed meica to v2.5 in docstrings * docs building again * more updates to building decision trees * fixed rename of component_selector Co-authored-by: Joshua Teves <[email protected]> * more doc updates * mostly classification_output_descriptions * Fixed io API and selector API warnings * message message * key parts of docs all updated * output_file_descriptions fully updated * filled testing gaps for component_selector * Updates integration test fnames * Try a numpy fix * Try again * Remove dead code * full selector coverage (#23) * Add tedana_reclassify tests * Actually add test to circle workflow * Maybe actually add it * Change o to outdir * Fix noreports maybe * Fix tedort * CircleCI are you okay? * Circle if you keep this up I will switch to Actions * Revert "Circle if you keep this up I will switch to Actions" This reverts commit ad29c0d. * Maybe silence duecredit and re-trigger Circle * Try something else * Guess that wasn't legal * Switch main to _main * Add to pyproject.toml * Force it to be editable * Add references to resources package * Dispose of sanity check * Add more reclassify tests * Adaptive mask is not a bool * Add label for setup.cfg * Revert "Adaptive mask is not a bool" This reverts commit f7db360. * Add resource files * Clarify variables * Update date and weep * Fixed NoLikelyBOLDBug (#24) * Fixed NoLikelyBOLDBug * Updated docs for Likely BOLD * Added note for when ICA will rerun * updated message * New verbose tag for more detailed logging. * at_least_num_exist to classification_doesnt_exist * Cleaned up selector logging output * fixed debug logging * Temporarily turn on force overwrite for redo ICA * Fixed I007 divergence * calc_varex_thresh now has num_highest_var_comps * fixed linting errors * Update integration test data * Adds csv and text file reading for manual acc/rej * Add tests for CustomEncoder * Adds bibtex warning check test * Appease linter * Fix unused metrics warning * Add reclassify tests and patches to test failures * Make stylistic changes. * Remove trailing whitespace. * Spacing in io. * More minor changes. * Add custom napoleon section "Generated Files" * Replace numTrue/numFalse with n_true/n_false. * Replace ifTrue/ifFalse with if_true/if_false. * Use fill_doc. * Style fixes. * more int32 * more int32 fun * Appease linter * Fixed style issues * Add RICA to Approach section of docs * Fixed CI style check failure * DTM documentation review (#30) * Standardization of usage descriptions * Minor grammar edits * Minor grammar/spelling edits * Update docs/faq.rst --------- * Rename reclassify force (#32) * changed tedana_reclassify and force * Added default messages to CLI workflows * clean up CLI default messages * added t2smap to function from CLI * style fix * Add defaults to --help output (#31) * added ica_reclassify to setup.cfg * Using a more persistent cache for the testing data (#33) * Cleans up how testing datasets are downloaded within test_integration.py. In Main & the current JT_DTM each dataset is downloaded in a slightly different way and the five-echo data are downloaded twice. * Added `data_for_testing_info` which gives the file hash location and local directory name for each of the four files we download. All tests are updated to use this function. * The local copy of testing data will now go into the `.testing_data_cache` subdirectory * The downloaded testing data will be in separate directories from the outputs so the downloaded directories can be completely static * When `download_test_data` is called, it will first download the metadata json to see if the last updated copy on osf.io is newer than the downloaded version and will only download if osf has a newer file. Downloading the metadata will happen frequently, but it will hopefully be fast. * The logger is now used to give a warning if osf.io cannot be accessed, but it will still run using cached data * Change to TestLGR.info * Fixing high variance classification mess (#34) * Added dec_reclassify_high_var_comps plus * clarified diff btwn rho_kundu and _liberal thresh * Clarified docs for minimal tree * Replace versioneer with hatch (#35) * Update gitignore. * Delete _version.py * Adopt new packaging. * Ignore the _version.py file. * Fix CI (#36) * Base the cache on pyproject.toml, not setup.cfg. * Also drop use of setup.py in publishing action. * Add flake8-pyproject as a requirement. (#37) * Try fixing coverage. (#38) * Improving ica_reclassify (#39) * ica_reclassify docs now rendering in usage.html * moves file parsing to ica_reclassify_workflow * added error checks and tests * Ica reclassify registry fixes (#42) * add pandas version check >= 1.5.2 and mod behavior (#938) * add version check and mod behavior if pandas >= 1.5.2 to prevent error in writing csv * formatting * adding P. Molfese --------- Co-authored-by: Molfese <[email protected]> * readded InputHarvester and expanduser * fixed handler base_dir path * mixing matrix file always in registry --------- Co-authored-by: Peter J. Molfese <[email protected]> Co-authored-by: Molfese <[email protected]> * Drop Python 3.6 and 3.7 support (#40) * Drop Python 3.6 and 3.7 support. * line_terminator --> lineterminator * added mixm to 4echo test (#43) * Updating Contributor Information (#41) * Some contributor updates * Added doc to Marco * Added flow charts and some text (#44) * Added flow charts and some text * Finished flow charts and text. Co-authored-by: marco7877 <[email protected]> --------- Co-authored-by: marco7877 <[email protected]> * RTDfix (#45) * Update documentation (#46) * Update docs. * Update docs/building_decision_trees.rst Co-authored-by: Dan Handwerker <[email protected]> --------- Co-authored-by: Dan Handwerker <[email protected]> * Output docs on one page (#47) * Output docs on one page * added new multi-echo lectures --------- Co-authored-by: Joshua Teves <[email protected]> Co-authored-by: handwerkerd <[email protected]> Co-authored-by: Taylor Salo <[email protected]> Co-authored-by: Eneko Uruñuela <[email protected]> Co-authored-by: handwerkerd <[email protected]> Co-authored-by: Taylor Salo <[email protected]> Co-authored-by: Eneko Uruñuela <[email protected]> Co-authored-by: Neha Reddy <[email protected]> Co-authored-by: Peter J. Molfese <[email protected]> Co-authored-by: Molfese <[email protected]> Co-authored-by: marco7877 <[email protected]> Co-authored-by: Taylor Salo <[email protected]>
Pending discussion in #14.
This is a breaking change, since the implementation of FastICA is slightly different across MDP and sklearn.