[ENH] update ICA to sklearn from mdp #44

Merged
merged 15 commits into from
Nov 13, 2018

emdupre
Member

emdupre commented May 13, 2018

Pending discussion in #14.

This is a breaking change, since the implementation of FastICA is slightly different across MDP and sklearn.

tedana/workflows/tedana.py (outdated review thread, resolved)
tedana/cli/run_tedana.py (outdated review thread, resolved)
@tsalo
Member

tsalo commented May 18, 2018

I don't know if backwards compatibility is a requirement for new versions before 1.0.0, but could this PR be added to a 0.0.2 or 1.0.0 milestone? I don't think it belongs in 0.0.1.

@emdupre
Member Author

emdupre commented May 18, 2018

Yes, agreed! This is a 0.1.0 feature in my mind; the main one, I think.
I don't think I have it on the 0.0.1 milestone, but I'll go ahead and make the 0.1.0 milestone now!

@codecov

codecov bot commented May 18, 2018

Codecov Report

Merging #44 into master will decrease coverage by 0.02%.
The diff coverage is 9.09%.

@@            Coverage Diff            @@
##           master     #44      +/-   ##
=========================================
- Coverage   48.72%   48.7%   -0.03%     
=========================================
  Files          32      32              
  Lines        2079    2080       +1     
=========================================
  Hits         1013    1013              
- Misses       1066    1067       +1
| Impacted Files | Coverage Δ | |
|---|---|---|
| tedana/info.py | 100% <ø> (ø) | ⬆️ |
| tedana/workflows/tedana.py | 12% <0%> (+0.09%) | ⬆️ |
| tedana/decomposition/eigendecomp.py | 10.81% <11.11%> (-0.15%) | ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 87cb2e2...1025ace.

@emdupre added this to the 0.1.0 milestone on May 18, 2018
@rmarkello
Member

No big surprise here: from the most recent Circle build it looks like the first file saved out post-ICA is different. It might be worth doing some manual inspection to see how different these are, and if it's to a tolerance we're comfortable with...
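
For illustration only (this is not code from the PR), a minimal sketch of how such a comparison might be set up. ICA components come out in arbitrary order and with arbitrary sign, so an element-wise diff of the saved files would overstate the difference; pairing components by absolute correlation is one common workaround. The function name `match_components` and the synthetic matrices standing in for the MDP and sklearn mixing matrices are hypothetical.

```python
import numpy as np


def match_components(mix_a, mix_b):
    """Pair columns of two (time x component) mixing matrices.

    Hypothetical helper: components are compared by absolute Pearson
    correlation because ICA leaves order and sign undetermined.
    """
    n_a = mix_a.shape[1]
    # np.corrcoef treats rows as variables, so pass the transposes and
    # keep only the cross-correlation block (columns of a vs. columns of b).
    corr = np.corrcoef(mix_a.T, mix_b.T)[:n_a, n_a:]
    abs_corr = np.abs(corr)
    best_match = abs_corr.argmax(axis=1)  # most similar column of mix_b
    best_corr = abs_corr.max(axis=1)      # strength of that match
    return best_match, best_corr


# Synthetic stand-ins (assumption): the "sklearn" matrix is the "MDP" one
# with columns permuted, some signs flipped, and a little noise added.
rng = np.random.default_rng(0)
mix_mdp = rng.standard_normal((200, 4))
perm = np.array([2, 0, 3, 1])
signs = np.array([1, -1, 1, -1])
mix_sklearn = mix_mdp[:, perm] * signs + 0.01 * rng.standard_normal((200, 4))

best_match, best_corr = match_components(mix_mdp, mix_sklearn)
print(best_match, np.round(best_corr, 3))
```

A low value in `best_corr` would flag a component with no close counterpart in the other decomposition, which is where manual inspection would be most useful.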

@emdupre
Member Author

emdupre commented May 18, 2018

Yes, agreed. @prantikk specifically asked that we

> try to make sure final SNR and contrast stay comparable.

which I think is a good starting point for manual inspection!

@emdupre force-pushed the sklearn-ica branch 5 times, most recently from 50591b1 to 652a795, on May 28, 2018 03:28
@emdupre force-pushed the sklearn-ica branch 7 times, most recently from 38071b1 to 694be16, on May 28, 2018 21:53
@tsalo
Member

tsalo commented Nov 6, 2018

Now that we've committed to merging this, I wanted to check in about it. Other than dealing with the conflicts, is there anything that needs to be done before this one can be merged?

@emdupre
Member Author

emdupre commented Nov 6, 2018

Thanks for checking in on this, @tsalo! I think I just need to resolve the conflicts (which I'm happy to do later today!), but I wanted to give a little more time for feedback on the roadmap.

Also, I realized I never did the promised comparison 😞 Do you think it would still be useful to do?

@tsalo
Member

tsalo commented Nov 6, 2018

I agree that varying the seed will probably result in variability on par with the differences between the mdp and sklearn implementations. I personally don't think we need those comparisons to merge. We'll need to run the new version and inspect the results when we update the integration tests anyway, right?

@emdupre
Member Author

emdupre commented Nov 6, 2018

That's ok by me, unless @handwerkerd or @KirstieJane disagree! Either way, I'll fix these merge conflicts later today and wait until we've given a full week for the roadmap RFC (in #151) before merging :)

tedana/workflows/tedana.py (outdated review thread, resolved)
@tsalo
Member

tsalo commented Nov 11, 2018

I know we've probably discussed this before, but I can't see where: does it matter that FastICA whitens the data?

@emdupre
Member Author

emdupre commented Nov 11, 2018

MDP was already whitening the data: we had not supplied the `whitened` parameter, so by default we were whitening the data!

@tsalo
Member

tsalo commented Nov 11, 2018

Is it okay that MDP was doing it too though?

@emdupre
Member Author

emdupre commented Nov 11, 2018

Yes, it should be fine. We actually want whitened data, especially if (as previously) we weren't selecting principal components based on descending eigenvalues. Even when we are, though, we still want to account for the fact that components with descending eigenvalues explain different amounts of variance (whereas in whitened data every component has equal variance). I just found this review and think it explains the idea well!

@tsalo
Member

tsalo commented Nov 11, 2018

If the whitening is performed by FastICA, and we aren't using a decision tree to select PCA components during that whitening stage, do we still need the PCA step?

@emdupre
Member Author

emdupre commented Nov 11, 2018

@tsalo and I chatted off-line and got a better handle on this point (since it is quite confusing!).

We do need the PCA step. This is confusing because allowing whitening within the ICA introduces a second PCA, which makes the first seem redundant. But it's not! The first PCA lets us reduce the dimensionality of the data by keeping only the dimensions that meet some criteria; we can (and I think should!) keep discussing what those criteria are in #101.

But the second PCA, performed inside the FastICA call, does not reduce the dimensionality of the data. Instead, it just orthogonalizes the components, which can then be statistically whitened to help the ICA converge.

There's a CrossValidated answer making exactly this point, which might be a useful reference!
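
To make the distinction concrete, here is a rough NumPy sketch (an illustration under stated assumptions, not tedana's or sklearn's actual code). The data are synthetic, and "keep the three largest eigenvalues" merely stands in for whatever selection criteria are settled on in #101. The first step drops dimensions; the second keeps every dimension and only rescales them to unit variance.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features, n_keep = 500, 10, 3

# Synthetic data (assumption): three strong latent signals plus weak noise.
latent = rng.laplace(size=(n_samples, n_keep)) * np.array([5.0, 3.0, 2.0])
data = latent @ rng.standard_normal((n_keep, n_features))
data += 0.1 * rng.standard_normal((n_samples, n_features))
data -= data.mean(axis=0)

# Step 1: PCA for dimensionality reduction (10 features -> 3 components).
eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
order = np.argsort(eigvals)[::-1][:n_keep]  # largest-variance directions
reduced = data @ eigvecs[:, order]

# Step 2: whitening, analogous to what happens inside the ICA call.
# No dimensions are dropped; each one is rescaled to unit variance.
w_vals, w_vecs = np.linalg.eigh(np.cov(reduced, rowvar=False))
whitened = (reduced @ w_vecs) / np.sqrt(w_vals)

print(reduced.shape, whitened.shape)             # same number of components
print(np.round(np.var(reduced, axis=0), 2))      # unequal variances
print(np.round(np.cov(whitened, rowvar=False), 2))  # approximately identity
```

After step 1 the retained components still carry very different variances; after step 2 their covariance is the identity, which is the starting point the ICA iterations work from.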

@emdupre
Member Author

emdupre commented Nov 13, 2018

Merging this, since #151 is merged in! Thanks, everyone, for all of your feedback here!

@emdupre merged commit 0641061 into ME-ICA:master on Nov 13, 2018
@jbteves added the "breaking change" label (Will make a non-trivial change to outputs) and removed the "output-change" label on Apr 19, 2021
handwerkerd added a commit to handwerkerd/tedana that referenced this pull request May 4, 2023
* Added flow charts and some text

* Finished flow charts and text.

Co-authored-by: marco7877 <[email protected]>

---------

Co-authored-by: marco7877 <[email protected]>
tsalo added a commit that referenced this pull request May 11, 2023
* Decision tree refactor with minimal and kundu

* Fix commented-out tedana workflow

* Appease the style checker

* All tremble before the mighty linter

* Actually fix incorrect style checker issue

* Unfix another style checker error

* Attempt to make Black happy, even though it does not actually say what's wrong

* ran black

* Added elbows to reports

* fixing kundu tree and added calc_median

* kundu.json added comment

* kundu kappa_elbow is GTE not GT

* kundu dtm matches main and minimal updated

* flake8 style fixes

* fixed linting

* fixed report elbow warning

* removed unneeded second d_table calc function

* Links building decision trees to index

* Adds ComponentSelector to API docs

* Set language to English

* Fix dead nilearn link

* Add load_config and ComponentSelector to API docs

* Fix mixing matrix over-save bug

* Separately modularized kappa & rho elbow calcs and created liberal rho elbow (#15)

* kundu tree provisionalreject to unclassified

* calc_rho_elbow progress

* calc_rho_elbow done

* Removed calc_varex_upper_p

* Removed kappa_rho_elbow tests

* both decision trees running

* linting fixes

* Enable tedana_reclassify as console script

* No errors if no xcomp but also no decide_comps (#16)

* Update tedana/io.py

Co-authored-by: Taylor Salo <[email protected]>

* Appease style checker

* Appease the style checker?

* Force to use up to date setuptools; installation bug otherwise

* Remove out of date make entry

* Create functional reclassify CLI

* Replace blanks with n/a

* Maybe appease black

* Fix typo

Co-authored-by: Eneko Uruñuela <[email protected]>

* BIDSify some outputs

* Appease black

* Heavily revise ComponentSelector module docs

* Fixing mid kappa A  inconsistency (#17)

* Output codes in kundu.json

* fixed kappa ratio

* Update tedana/selection/selection_nodes.py

Co-authored-by: Joshua Teves <[email protected]>

* minimal tree keep kappa>2rho

Co-authored-by: Joshua Teves <[email protected]>

* Drops 3.6 support

* Remove 3.6 support from CircleCI tests

* Reformat comment

* Reduce line length

* Update lint in Makefile

* Correctly collect API submodule doc

* Fix errors

* Fix more sphinx

* working on selector init documentation

* Breaking up outputs.rst

* partially updated output_file_descriptions.rst

* changed n_bold_comps to n_accepted_comps

* n_bold_comps to n_accepted_comps

* ComponentSelector.py API docs cleaned up

* selection_nodes decision_docs updated

* selection_nodes docstrings cleaned up

* Fixed a test for selection_nodes

* Updated faq for tedana_reclassify and tree options

* docstrings in tedica and other small updates

* Updated docstrings in selection_utils.py

* Update docs/output_file_descriptions.rst

* Working on improving selector documentation (#18)

* working on selector init documentation

* Breaking up outputs.rst

* partially updated output_file_descriptions.rst

* changed n_bold_comps to n_accepted_comps

* n_bold_comps to n_accepted_comps

* ComponentSelector.py API docs cleaned up

* selection_nodes decision_docs updated

* selection_nodes docstrings cleaned up

* Fixed a test for selection_nodes

* Updated faq for tedana_reclassify and tree options

* docstrings in tedica and other small updates

* Updated docstrings in selection_utils.py

* Update docs/output_file_descriptions.rst

Co-authored-by: Joshua Teves <[email protected]>

* Remove manual selection

* Force user to pick a tree

* Fix CLI test

* Revert "Force user to pick a tree"

This reverts commit 4fc656f.

* Revert "Fix CLI test"

This reverts commit 4038336.

* Make kundu default tree

* Attempt to fix error

* Adds input data to registry

* Revert "Adds input data to registry"

This reverts commit c7349bd.

* Adds input registration

* Appease linter

* Add class template start

* Add previous workflow registry into new one

* Fix failure to update tags and classifications in manual

* Fix missing less likely BOOLD tag

* Adds more useful reporting for unused metrics

* Create generated metrics

* Update line terminator

* Force black to run before flake8

* Updates percentile call

* more doc updates

* fixed meica to v2.5 in docstrings

* docs building again

* more updates to building decision trees

* improved docs (#19)

* working on selector init documentation

* Breaking up outputs.rst

* partially updated output_file_descriptions.rst

* changed n_bold_comps to n_accepted_comps

* n_bold_comps to n_accepted_comps

* ComponentSelector.py API docs cleaned up

* selection_nodes decision_docs updated

* selection_nodes docstrings cleaned up

* Fixed a test for selection_nodes

* Updated faq for tedana_reclassify and tree options

* docstrings in tedica and other small updates

* Updated docstrings in selection_utils.py

* Update docs/output_file_descriptions.rst

* more doc updates

* fixed meica to v2.5 in docstrings

* docs building again

* more updates to building decision trees

Co-authored-by: Joshua Teves <[email protected]>

* Get rid of optional method keyword

* Revert "Get rid of optional method keyword"

This reverts commit e5fdec1.

* Revert "Updates percentile call"

This reverts commit 9d6a487.

* Revert "Update line terminator"

This reverts commit 8cf697c.

* Autodocument ComponentSelector methods/attributes (#20)

* Rename ComponentSelector module.

* Document the ComponentSelector directly.

* fixed rename of component_selector

* Fixed remaining transition to component_selector (#21)

* working on selector init documentation

* Breaking up outputs.rst

* partially updated output_file_descriptions.rst

* changed n_bold_comps to n_accepted_comps

* n_bold_comps to n_accepted_comps

* ComponentSelector.py API docs cleaned up

* selection_nodes decision_docs updated

* selection_nodes docstrings cleaned up

* Fixed a test for selection_nodes

* Updated faq for tedana_reclassify and tree options

* docstrings in tedica and other small updates

* Updated docstrings in selection_utils.py

* Update docs/output_file_descriptions.rst

* more doc updates

* fixed meica to v2.5 in docstrings

* docs building again

* more updates to building decision trees

* fixed rename of component_selector

Co-authored-by: Joshua Teves <[email protected]>

* more doc updates

* mostly classification_output_descriptions

* Fixed io API and selector API warnings

* message
message

* key parts of docs all updated

* output_file_descriptions fully updated

* filled testing gaps for component_selector

* Updates integration test fnames

* Try a numpy fix

* Try again

* Remove dead code

* full selector coverage (#23)

* Add tedana_reclassify tests

* Actually add test to circle workflow

* Maybe actually add it

* Change o to outdir

* Fix noreports maybe

* Fix tedort

* CircleCI are you okay?

* Circle if you keep this up I will switch to Actions

* Revert "Circle if you keep this up I will switch to Actions"

This reverts commit ad29c0d.

* Maybe silence duecredit and re-trigger Circle

* Try something else

* Guess that wasn't legal

* Switch main to _main

* Add to pyproject.toml

* Force it to be editable

* Add references to resources package

* Dispose of sanity check

* Add more reclassify tests

* Adaptive mask is not a bool

* Add label for setup.cfg

* Revert "Adaptive mask is not a bool"

This reverts commit f7db360.

* Add resource files

* Clarify variables

* Update date and weep

* Fixed NoLikelyBOLDBug (#24)

* Fixed NoLikelyBOLDBug

* Updated docs for Likely BOLD

* Added note for when ICA will rerun

* updated message

* New verbose tag for more detailed logging.

* at_least_num_exist to classification_doesnt_exist

* Cleaned up selector logging output

* fixed debug logging

* Temporarily turn on force overwrite for redo ICA

* Fixed I007 divergence

* calc_varex_thresh now has num_highest_var_comps

* fixed linting errors

* Update integration test data

* Adds csv and text file reading for manual acc/rej

* Add tests for CustomEncoder

* Adds bibtex warning check test

* Appease linter

* Fix unused metrics warning

* Add reclassify tests and patches to test failures

* Make stylistic changes.

* Remove trailing whitespace.

* Spacing in io.

* More minor changes.

* Add custom napoleon section "Generated Files"

* Replace numTrue/numFalse with n_true/n_false.

* Replace ifTrue/ifFalse with if_true/if_false.

* Use fill_doc.

* Style fixes.

* more int32

* more int32 fun

* Appease linter

* Fixed style issues

* Add RICA to Approach section of docs

* Fixed CI style check failure

* DTM documentation review (#30)

* Standardization of usage descriptions

* Minor grammar edits

* Minor grammar/spelling edits

* Update docs/faq.rst

---------

* Rename reclassify force (#32)

* changed tedana_reclassify and force

* Added default messages to CLI workflows

* clean up CLI default messages

* added t2smap to function from CLI

* style fix

* Add defaults to --help output (#31)

* added ica_reclassify to setup.cfg

* Using a more persistent cache for the testing data (#33)

* Cleans up how testing datasets are downloaded within test_integration.py. In Main & the current JT_DTM each dataset is downloaded in a slightly different way and the five-echo data are downloaded twice.
* Added `data_for_testing_info` which gives the file hash location and local directory name for each of the four files we download. All tests are updated to use this function.
* The local copy of testing data will now go into the `.testing_data_cache` subdirectory
* The downloaded testing data will be in separate directories from the outputs so the downloaded directories can be completely static
* When `download_test_data` is called, it will first download the metadata json to see if the last updated copy on osf.io is newer than the downloaded version and will only download if osf has a newer file. Downloading the metadata will happen frequently, but it will hopefully be fast.
* The logger is now used to give a warning if osf.io cannot be accessed, but it will still run using cached data

* Change to TestLGR.info

* Fixing high variance classification mess (#34)

* Added dec_reclassify_high_var_comps plus

* clarified diff btwn rho_kundu and _liberal thresh

* Clarified docs for minimal tree

* Replace versioneer with hatch (#35)

* Update gitignore.

* Delete _version.py

* Adopt new packaging.

* Ignore the _version.py file.

* Fix CI (#36)

* Base the cache on pyproject.toml, not setup.cfg.

* Also drop use of setup.py in publishing action.

* Add flake8-pyproject as a requirement. (#37)

* Try fixing coverage. (#38)

* Improving ica_reclassify (#39)

* ica_reclassify docs now rendering in usage.html

* moves file parsing to ica_reclassify_workflow

* added error checks and tests

* Ica reclassify registry fixes (#42)

* add pandas version check >= 1.5.2 and mod behavior (#938)

* add version check and mod behavior if pandas >= 1.5.2 to prevent error in writing csv

* formatting

* adding P. Molfese

---------

Co-authored-by: Molfese <[email protected]>

* readded InputHarvester and expanduser

* fixed handler base_dir path

* mixing matrix file always in registry

---------

Co-authored-by: Peter J. Molfese <[email protected]>
Co-authored-by: Molfese <[email protected]>

* Drop Python 3.6 and 3.7 support (#40)

* Drop Python 3.6 and 3.7 support.

* line_terminator --> lineterminator

* added mixm to 4echo test (#43)

* Updating Contributor Information (#41)

* Some contributor updates

* Added doc to Marco

* Added flow charts and some text (#44)

* Added flow charts and some text

* Finished flow charts and text.

Co-authored-by: marco7877 <[email protected]>

---------

Co-authored-by: marco7877 <[email protected]>

* RTDfix (#45)

* Update documentation (#46)

* Update docs.

* Update docs/building_decision_trees.rst

Co-authored-by: Dan Handwerker <[email protected]>

---------

Co-authored-by: Dan Handwerker <[email protected]>

* Output docs on one page (#47)

* Output docs on one page

* added new multi-echo lectures

---------

Co-authored-by: Joshua Teves <[email protected]>
Co-authored-by: handwerkerd <[email protected]>
Co-authored-by: Taylor Salo <[email protected]>
Co-authored-by: Eneko Uruñuela <[email protected]>
Co-authored-by: handwerkerd <[email protected]>
Co-authored-by: Taylor Salo <[email protected]>
Co-authored-by: Eneko Uruñuela <[email protected]>
Co-authored-by: Neha Reddy <[email protected]>
Co-authored-by: Peter J. Molfese <[email protected]>
Co-authored-by: Molfese <[email protected]>
Co-authored-by: marco7877 <[email protected]>
Co-authored-by: Taylor Salo <[email protected]>
Labels
breaking change: Will make a non-trivial change to outputs
enhancement: issues describing possible enhancements to the project

6 participants