Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] BIDS Derivatives-compatible outputs #574

Merged
merged 34 commits into from
Feb 24, 2021

Conversation

tsalo
Copy link
Member

@tsalo tsalo commented May 22, 2020

Closes #146 and closes #649.

Changes proposed in this pull request:

  • Rename outputs to be as BIDS Derivatives-compatible as possible.
  • Reorganize metric and decomposition files. Metrics are now stored in TSVs, with associated JSONs describing the metrics, and decomposition files should match BEP012 convention.

TODO

  • T1gs.nii.gz (from gscontrol_raw())
  • desc-[PCA|ICA]R2ModelPredictions_X.nii.gz
  • desc-[PCA|ICA]S0ModelPredictions_X.nii.gz
  • desc-[PCA|ICA]AveragingWeights_X.nii.gz
  • Outputs of tedana page
  • Tedana workflow page
  • desc-ICA_metrics.tsv vs. desc-[tedana|AROMA]_metrics.tsv

EDIT: Whoops, I think desc-tedana_metrics.tsv isn't specific enough since we'll have metrics from ICA and PCA.... 🤔 Circumvented by just having desc-PCA for the TEDPCA metrics and desc-tedana for the final TEDICA metrics.

Comment on lines +515 to +519
io.filewrite(
data_oc,
op.join(out_dir, 'desc-optcom_bold.nii.gz'),
ref_img
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write OC data earlier than before. Much more transparent this way.

tedana/metrics/kundu_fit.py Outdated Show resolved Hide resolved
fout = filewrite(ts, op.join(out_dir, 'ts_OC'), ref_img)
LGR.info('Writing optimally combined time series: {}'.format(op.abspath(fout)))

write_split_ts(ts, mmix, mask, comptable, ref_img, out_dir=out_dir, suffix='OC')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just moved down. Not removed.

"""
acc = comptable[comptable.classification == 'accepted'].index.values

fout = filewrite(ts, op.join(out_dir, 'ts_OC'), ref_img)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to main workflow.

tedana/gscontrol.py Outdated Show resolved Hide resolved
tedana/decomposition/pca.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented May 22, 2020

Codecov Report

Merging #574 (cc9296b) into main (3897363) will decrease coverage by 0.06%.
The diff coverage is 93.57%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #574      +/-   ##
==========================================
- Coverage   93.64%   93.58%   -0.07%     
==========================================
  Files          26       26              
  Lines        2029     2041      +12     
==========================================
+ Hits         1900     1910      +10     
- Misses        129      131       +2     
Impacted Files Coverage Δ
tedana/workflows/tedana.py 89.66% <89.58%> (-0.16%) ⬇️
tedana/metrics/kundu_fit.py 95.87% <92.85%> (+0.29%) ⬆️
tedana/selection/tedica.py 93.37% <92.85%> (+0.37%) ⬆️
tedana/decomposition/pca.py 87.23% <94.11%> (+0.87%) ⬆️
tedana/gscontrol.py 100.00% <100.00%> (ø)
tedana/io.py 97.08% <100.00%> (-0.76%) ⬇️
tedana/reporting/dynamic_figures.py 100.00% <100.00%> (ø)
tedana/reporting/html_report.py 100.00% <100.00%> (ø)
tedana/selection/tedpca.py 83.33% <100.00%> (+0.52%) ⬆️
tedana/workflows/t2smap.py 93.90% <100.00%> (+0.31%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3897363...cc9296b. Read the comment docs.

@tsalo tsalo requested a review from emdupre May 22, 2020 21:08
@tsalo
Copy link
Member Author

tsalo commented Aug 6, 2020

For transparency, here is what the ICA decomposition json will look like, with two example components:

{
    "Method": "Independent components analysis with FastICA algorithm implemented by sklearn. Components are sorted by Kappa in descending order. Component signs are flipped to best match the data.",
    "ica_00": {
        "Description": "ICA fit to dimensionally-reduced optimally combined data.",
        "classification": "rejected",
        "countnoise": 1119,
        "countsigFR2": 6128,
        "countsigFS0": 3778,
        "d_table_score": 18.4,
        "d_table_score_scrub": "n/a",
        "dice_FR2": 0.5568785491848324,
        "dice_FS0": 0.243486809765689,
        "kappa": 35.24615192099909,
        "kappa ratio": 2.408554759865709,
        "normalized variance explained": 0.017924586667012896,
        "original_classification": "rejected",
        "original_rationale": "I005",
        "rationale": "I001",
        "rho": 18.026996948546344,
        "signal-noise_p": 0.28063083660822047,
        "signal-noise_t": -1.0960543047701905,
        "variance explained": 9.670718133027448
    },
    "ica_01": {
        "Description": "ICA fit to dimensionally-reduced optimally combined data.",
        "classification": "accepted",
        "countnoise": 1277,
        "countsigFR2": 4097,
        "countsigFS0": 3496,
        "d_table_score": 14.8,
        "d_table_score_scrub": 11.8,
        "dice_FR2": 0.5252645699806231,
        "dice_FS0": 0.3942153186930905,
        "kappa": 33.957022187827526,
        "kappa ratio": 0.13136512366806724,
        "normalized variance explained": 0.0009146448952444163,
        "original_classification": "accepted",
        "original_rationale": "",
        "rationale": "",
        "rho": 33.23361928836013,
        "signal-noise_p": 0.7822498063437031,
        "signal-noise_t": 0.2769873372743333,
        "variance explained": 0.508159640579781
    },
}

tedana/gscontrol.py Outdated Show resolved Hide resolved
@tsalo
Copy link
Member Author

tsalo commented Oct 22, 2020

Do we want to output sidecar jsons for these files as well? To specify Sources and Units at least?

@emdupre
Copy link
Member

emdupre commented Oct 26, 2020

Do we want to output sidecar jsons for these files as well? To specify Sources and Units at least?

Sorry, which files ?

@tsalo
Copy link
Member Author

tsalo commented Oct 26, 2020

We'd probably want jsons for all of the files. At least the core niftis.

EDIT: And a top-level derivatives description json?

@stale stale bot added the stale label Jan 24, 2021
Base automatically changed from master to main February 1, 2021 23:57
@stale stale bot removed the stale label Feb 1, 2021
@ME-ICA ME-ICA deleted a comment from stale bot Feb 6, 2021
@tsalo
Copy link
Member Author

tsalo commented Feb 10, 2021

I think the only things left are:

  • Decide between desc-ICA and desc-tedana for the metrics files. We want to support multiple classification algorithms on the same decomposition. Going with desc-tedana.
  • Add standard derivatives files, like dataset_description.json. Done!

@handwerkerd
Copy link
Member

handwerkerd commented Feb 18, 2021

I was just talking with @jbteves about this and I have one non-trivial concern. Sorry for not bringing this up earlier. This is going to be a major breaking change on outputs. Right now it seems to be close, but not quite matching the BIDS derivatives format and BIDS deviates isn't totally fixed either. That means we'll have another breaking change for output names in the near future (probably within the coming year).

You've put so much work into this that I don't want it sitting stagnate until we agree on perfect file naming. One middle ground might be to add a --BIDSoutput flag that would use the new names when true & the current names when false. The default would currently be false. Doing that would allow for incremental changes in BIDS derivatives output names within main for the people using those outputs (and this would allow for issue and fixes that come from active use) while not breaking things for people using the current output names. Knowing that file naming happens all over the code, this will be a bit of a headache & this whole topic is worth discussion (probably at tomorrows dev call). I figured I'd comment now instead of just speaking up during the dev call so that others could think about this in advance.

I also have a tangential suggestion. If we are making breaking changes to file names, is it time to finally get ride of 'Optimally Combined' / 'OptCom' / 'OC' in file names? There are several weighted summation methods in the code and @jsheunis has just proposed a new one. Having several methods all being saved as 'optimally combined' creates unnecessary confusion. Josh also noted 'Com' can mean 'components' in other situations, which adds to the confusion. Perhaps 'Weighted Sum' / 'WSum' / 'WS' is another option???

@emdupre
Copy link
Member

emdupre commented Feb 18, 2021

This is going to be a major breaking change on outputs. Right now it seems to be close, but not quite matching the BIDS derivatives format and BIDS deviates isn't totally fixed either. That means we'll have another breaking change for output names in the near future (probably within the coming year).

I think we're well-timed for a breaking change as we approach our first minor-version release, which usually signals backwards-incompatible API changes :) I'd suggest that we pin ourselves to a set BIDS version for a while, and bump at the next minor-version release. But I agree with your broader point that this is worth discussing !

If we are making breaking changes to file names, is it time to finally get ride of 'Optimally Combined' / 'OptCom' / 'OC' in file names? There are several weighted summation methods in the code and @jsheunis has just proposed a new one. Having several methods all being saved as 'optimally combined' creates unnecessary confusion.

I'm fine with that change, though it's a good push for us to think about where we can explicitly denote the combination method. I'd argue that this is something that should be more transparently documented to end-users as they can call more methods.

@tsalo
Copy link
Member Author

tsalo commented Feb 19, 2021

Per today's call, @handwerkerd and @jbteves will look into a name-swapping option next week.

@jbteves
Copy link
Collaborator

jbteves commented Feb 19, 2021

Quick thing: I noticed that this actually isn't quite compliant because we don't keep the subject and session for the outputs. Should we maybe also add a --prefix option so that this could be specified?

@tsalo
Copy link
Member Author

tsalo commented Feb 19, 2021

Quick thing: I noticed that this actually isn't quite compliant because we don't keep the subject and session for the outputs. Should we maybe also add a --prefix option so that this could be specified?

This won't make tedana a full-on BIDS App. Instead, the goal is to make the suffixes and formats BIDS-compliant. That way, a BIDS App that uses tedana can take the subject, session, task, run, etc. entities and add them to the filenames we produce.

@eurunuela
Copy link
Collaborator

Just so we don't forget it, we discussed that updating the file names can be a burden if BIDS guidelines keep changing.

A solution could be to have a json file with the directory tree of tedana outputs. This would allow us to update the json file and forget about where in the code these files are being written.

docs/outputs.rst Outdated Show resolved Hide resolved
@jbteves
Copy link
Collaborator

jbteves commented Feb 23, 2021

Alright, I have a significant draft of what's basically a BIDS derivatives enabler + major modularization of io.filewrite. I can either open a PR to this branch, which is merged in to my branch, or open a separate one to the repository where we can discuss the changes there. What's your preference @tsalo ?

@tsalo
Copy link
Member Author

tsalo commented Feb 24, 2021

@jbteves I worry about keeping eyes on a PR to my fork. In my experience PRs like that tend to get ignored.. What if we change the target of this PR to a new branch and then merged? Then you could open your PR to that branch and we could have a draft PR from that branch to main simultaneously?

@jbteves
Copy link
Collaborator

jbteves commented Feb 24, 2021

I'm not quite sure I follow. In my branch, everything in this one is merged already.

@tsalo
Copy link
Member Author

tsalo commented Feb 24, 2021

I can either open a PR to this branch

This would mean opening a PR to tsalo/tedana. My concern is that PRs to individuals' branches are often overlooked.

or open a separate one to the repository where we can discuss the changes there

This should work fine, but it won't let us evaluate your changes in isolation. They'll be mixed in with my changes.

My alternative was to (1) change the target of this PR to a new branch (e.g., bids-derivs), (2) merge this PR as-is into bids-derivs, (3) open your PR to bids-derivs, so that the diff will focus on your proposed changes, and (4) open a draft PR from bids-derivs to main, which we would merge once your PR to the new branch is merged.

@jbteves
Copy link
Collaborator

jbteves commented Feb 24, 2021

My alternative was to (1) change the target of this PR to a new branch (e.g., bids-derivs), (2) merge this PR as-is into bids-derivs, (3) open your PR to bids-derivs, so that the diff will focus on your proposed changes, and (4) open a draft PR from bids-derivs to main, which we would merge once your PR to the new branch is merged.

Oh, okay, I follow now. Sorry, I don't know how I misunderstood that. Yes, let's retarget to a branch on this repo that I can open a PR to it and shows up here for everyone to see. That sounds good, thanks!

@tsalo tsalo changed the base branch from main to bids-derivatives February 24, 2021 21:59
@tsalo tsalo merged commit 0789f81 into ME-ICA:bids-derivatives Feb 24, 2021
@tsalo
Copy link
Member Author

tsalo commented Feb 24, 2021

Done!

EDIT: You can target bids-derivatives with your PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants