Single-cell representation learning #153

ziw-liu · 2024-08-31T13:23:35Z

Accumulated changes for single-cell representation learning.

@edyoshikun this PR includes breaking API changes for image translation (#145).

Pending before merging this to main:

* notes on standard report * Add code for generating figures --------- Co-authored-by: Alishba Imran <[email protected]>

…y and features learned by embeddings (#140) * notes on standard report * add lib of computed features * correlates PCA with computed features * compute for all timepoints * compute correlation * remove cv library usage * remove edge detection * convert to dataframe * for entire well * add std_dev feature * fix patch size --------- Co-authored-by: Soorya Pradeep <[email protected]>
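The commits above correlate PCA components of the embeddings with computed image features. A minimal sketch of that kind of correlation table follows, using only the standard library. It assumes the PCA step has already produced per-cell component scores; the feature names (`std_dev`, `mean_intensity`), the helper names, and all numbers are hypothetical and are not taken from the PR.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlate_components(components, features):
    """Return {(component, feature): r} for every component/feature pair."""
    return {
        (c_name, f_name): pearson(c_vals, f_vals)
        for c_name, c_vals in components.items()
        for f_name, f_vals in features.items()
    }


# Hypothetical per-cell values (one entry per cell).
pc_scores = {"PC1": [0.1, 0.4, 0.9, 1.3], "PC2": [1.0, -1.0, 1.0, -1.0]}
computed = {"std_dev": [1.0, 2.0, 3.0, 4.0], "mean_intensity": [4.0, 3.0, 2.0, 1.0]}

table = correlate_components(pc_scores, computed)
for (pc, feat), r in sorted(table.items()):
    print(f"{pc} vs {feat}: r = {r:+.2f}")
```

A component that tracks a hand-computed feature closely (|r| near 1) suggests the embedding encodes that feature; a weak correlation says little on its own.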

* remove obsolete training and prediction scripts * lint contrastive scripts

* draft projection head per Update the projection head (normalization and size). #139 * reorganize comments in example fit config * configurable stem stride and projection dimensions * update type hint and docstring for ContrastiveEncoder * clarify embedding_dim * use the forward method directly for projected * normalize projections only when fitting the projected features saved during prediction is now *not* normalized * remove unused logger * refactor training code into translation and representation modules * extract image logging functions * use AdamW instead of Adam for contrastive learning * inline single-use argument * fix normalization * fix MLP layer order * fix output dimensions * remove L2 normalization before computing loss * compute rank of features and projections * documentation --------- Co-authored-by: Shalin Mehta <[email protected]>
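For readers unfamiliar with the normalization these commits toggle: L2 normalization rescales a vector to unit length, which discards its magnitude. The sketch below illustrates only the operation itself in plain Python; the function name and the `eps` guard are assumptions for illustration, not code from this PR.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / max(norm, eps) for v in vector]


projection = [3.0, 4.0]
unit = l2_normalize(projection)
print(unit)  # [0.6, 0.8]
```

Saving projections without this step, as the commit message describes for prediction, keeps the magnitude available to downstream analysis.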

* docstring * move scripts from contrastive_scripts to viscy/scripts * organize files in applications/contrastive_phenotyping * delete unused evaluation code * more cleanup * refactor evaluation metrics for translation task * refactor viscy.evaluation -> viscy.translation.evaluation_metrics and viscy.representation.evaluation * WIP: representation evaluation module * WIP: representation eval - docstrings in numpy format * WIP: more documentation * refactor: feature_extractor moved to viscy.representation.evaluation * lint * bug fix * refactored common computations and dataset * add imbalance-learn dependecy to metrics * refactor classification of embeddings * organize viscy.representation.evaluation * ruff * Soorya's plotting script * WIP: combine two versions of plot_embeddings.py * simplify representation.viscy.evaluation - move LCA to its own module * refactor of viscy.representation.evaluation * refactored and tested PCA and UMAP plots --------- Co-authored-by: Soorya Pradeep <[email protected]>

…et contrastive task (#154) * wip: sample positive and negative samples from another time point * configure time interval in triplet data module * vectorized anchor filtering * conditional augmentation for anchor anchor is augmented if the positive is another time point * example training script for the CTC dataset this is optimized to run on MPS * add example CTC prediction config for MPS
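The time-offset sampling described in these commits can be sketched roughly as follows. This is an illustration under stated assumptions, not the data module from the PR: tracks are modeled as a dict from a track ID to the set of time points where the cell was observed, the positive is the same track at `t + time_interval`, and the negative is taken from a different track at that same offset time point. All names here are hypothetical.

```python
import random


def valid_anchors(tracks, time_interval):
    """Keep (track_id, t) pairs whose track also has a frame at t + time_interval."""
    return [
        (track_id, t)
        for track_id, times in sorted(tracks.items())
        for t in sorted(times)
        if t + time_interval in times
    ]


def sample_triplet(tracks, time_interval, rng):
    """Draw an (anchor, positive, negative) triplet of (track_id, t) pairs."""
    anchor = rng.choice(valid_anchors(tracks, time_interval))
    track_id, t = anchor
    positive = (track_id, t + time_interval)
    # Assumption: negatives come from other tracks at the positive's time point.
    candidates = [
        (other_id, t + time_interval)
        for other_id, times in sorted(tracks.items())
        if other_id != track_id and t + time_interval in times
    ]
    negative = rng.choice(candidates)
    return anchor, positive, negative


# Hypothetical tracks: track "c" has a single frame, so it can never be an anchor.
tracks = {"a": {0, 1, 2, 3}, "b": {1, 2, 3}, "c": {0}}
anchors = valid_anchors(tracks, time_interval=1)
anchor, positive, negative = sample_triplet(tracks, 1, random.Random(0))
print(len(anchors), anchor, positive, negative)
```

Filtering anchors up front means every sampled anchor is guaranteed to have a positive, which is the purpose of the anchor filtering step mentioned in the commit list.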

* refactor linear probing with lightning * test convenience function * always convert to long before onehot * use onehot only during training * supply trainer through argument to avoid wrapping * only log per epoch * example script for linear probing * add comment about loss curve * fix sample filtering order for select tracks * add script to visualize integrated gradients * plot integrated gradients over time * Use sklearn's logistic regression for linear probing (#169) * use binary logistic regression to initialize the linear layer * plot integrated gradients from a binary classifier * add cmap to 'visual' requirements * move model assembling to lca * rename init argument * disable feature scaling * update test and evaluation scripts to use new API * add docstrings to LCA

* add maplotlib style sheet for figure making * add cell division attribution * add matplotlib style sheet * move attribution computation to lca * tweak contrast limits and text * add captum to optional dependencies * move attribution function to a method of the classifier * add script to show organelle dynamics * add occlusion attribution * more generic save path * add uninfected cell * tweak subplot spacing

* updated files * format fixed for tests * updated scripts * umap dist code * bug fixes and linting * logistic regression script * add infection figure script * Add script for generating infection figure and perform prediction on the June dataset * Format code * Black format evaluation module and fix import in figure_cell_infection script * Refactor scatterplot colors and markers * Calculate model accuracy * Add script for appendix video * formatted code * updated displacement funcs for full embeddings * script for displacement computation * fix style * fix docstring format --------- Co-authored-by: Shalin Mehta <[email protected]> Co-authored-by: Soorya Pradeep <[email protected]> Co-authored-by: Ziwen Liu <[email protected]>

ziw-liu · 2024-09-27T21:50:40Z

#159 and #168 were still a bit broken. I'm merging now to prepare the branch point, but they should be fixed before merging to main.

mattersoflight · 2024-10-08T17:57:32Z

To merge this branch and release candidate 0.3.0-rc1, we need to test the following:

Demos and notebooks that illustrate robust virtual staining: @ziw-liu @edyoshikun
Try training models with updated example configs: @ziw-liu
Fix any remaining bugs in the representation learning code path.

We decided that the configs and checkpoints posted for the preprint will continue to depend on release 0.2.0.

* fix docstrings and type hint for the ContrastiveEncoder * refactor the representation evaluation module into submodules * move shared image logging into utils * fix line end * fix import paths in example notebooks

ziw-liu · 2024-10-10T01:05:32Z

Things I have tested with the current HEAD of this branch:

Train VS model with:

/hpc/projects/intracellular_dashboard/viral-sensor/infection_classification/models/phase-to-sensor/2024_08_14_ZIKV_pal17_48h/fit.yml

Predict with VSCyto2D (reported in the VS preprint) with:

/hpc/projects/intracellular_dashboard/ops/2024_09_19_tracking_accuracy_test/2-VS/tta/predict.yml

Import paths in the example VS notebooks and configs are correct.

@mattersoflight I'm still working on #181 which will also introduce user interface changes. Should we do comprehensive release candidate testing after that?

Things that need to be tested before release:

@edyoshikun or @mattersoflight:

End-to-end testing of the VS example notebooks.
(Optional) update the HF demo

@Soorya19Pradeep:

Training of new contrastive model
Prediction using model checkpoint we report in the paper.

mattersoflight · 2024-10-10T14:57:01Z

> @mattersoflight I'm still working on #181 which will also introduce user interface changes. Should we do comprehensive release candidate testing after that?

@ziw-liu I suggest merging #181 (CLI interface) in this branch and then doing the tests you outlined so that everyone builds familiarity with the revised CLI.

Since these two PRs make multiple breaking (and welcome) changes to the codebase, I suggest tagging the current head of main as 0.2.1-rc1 or similar. We don't need to push this to PyPI; it is just for us to check out the current state of main if the need arises.

ziw-liu · 2024-10-10T16:47:01Z

@mattersoflight Compared to the latest stable release (v0.2.1), the current HEAD of main adds a visualization script (#144) and a link to the demo (#172), so there should be no behavior change. If you still think we need a tag, I'm comfortable with just tagging v0.2.2 stable.

Soorya19Pradeep · 2024-10-10T22:18:46Z

I have done one round of testing. The training is underway and the prediction using an earlier model checkpoint was completed.

* remove obsolete metrics script for translation * move cellpose annotation script * consolidate CLI documentation * remove old CLI help * move translation CLI to its own module * move contrastive CLI to its own module * remove old CLI module * remove global entry script * share trainer class between tasks * move cli from init to main * inherit base CLI class for tasks * improve type hint and docstring * restore global CLI entry point * special case subclass mode for preprocessing * remove separate entry points * add CLI description message * make the setup function private * fix subclass mode detection * remove unused arguments from custom subcommands * use generic path in example * fix docstring style * update virtual staining example configs * update CTC SSL example configs * update infection SSL example configs
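The general shape of a single entry point that dispatches to per-task subcommands can be sketched with the standard library's `argparse`. This is a stand-in for illustration only: the PR does not show which CLI framework the project uses, and the program name, subcommand names, and `--config` flag below are assumptions.

```python
import argparse


def build_parser():
    """Build one top-level parser with a subcommand per task stage."""
    parser = argparse.ArgumentParser(
        prog="example-cli", description="One entry point, several subcommands."
    )
    subparsers = parser.add_subparsers(dest="command", required=True)
    # Hypothetical subcommands; each takes a path to a config file.
    for name in ("fit", "predict", "preprocess"):
        sub = subparsers.add_parser(name, help=f"run the {name} stage")
        sub.add_argument("--config", "-c", required=True, help="path to a config file")
    return parser


args = build_parser().parse_args(["fit", "--config", "fit.yml"])
print(args.command, args.config)  # fit fit.yml
```

Keeping the shared options in one place is what lets separate per-task entry points be removed without changing how users invoke each stage.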

edyoshikun · 2024-10-17T18:50:52Z

As discussed, the HF model is pinned to versions <0.3 and the gradio code is not exposed to the user, so we don't need to update this for now. The model weights are posted separately and point to the GitHub repository.

edyoshikun · 2024-10-17T19:19:09Z

@ziwen, I'm done testing the virtual staining end-to-end. I didn't run any of the representation learning. I really appreciate the new config file structure and CLI. This will work well with any type of custom dataloaders and models. Thank you.

edyoshikun

LGTM!

* extract function for computing umap * specific return type for predict step * write umap in prediction * raise log level for umap computation * fix key conversion

* draft readme * direct link dynaCLR schematic * add DynaCLR schemetic figure * add static schematic and link to video --------- Co-authored-by: Ziwen Liu <[email protected]> Co-authored-by: Ziwen Liu <[email protected]>

* Merging code related to figures (#146) * notes on standard report * Add code for generating figures --------- Co-authored-by: Alishba Imran <[email protected]> * produce a report of useful visualizations to assess the dimensionality and features learned by embeddings (#140) * notes on standard report * add lib of computed features * correlates PCA with computed features * compute for all timepoints * compute correlation * remove cv library usage * remove edge detection * convert to dataframe * for entire well * add std_dev feature * fix patch size --------- Co-authored-by: Soorya Pradeep <[email protected]> * Remove obsolete scripts for contrastive phenotyping (#150) * remove obsolete training and prediction scripts * lint contrastive scripts * SSL: fix MLP head and remove L2 normalization (#145) * draft projection head per Update the projection head (normalization and size). #139 * reorganize comments in example fit config * configurable stem stride and projection dimensions * update type hint and docstring for ContrastiveEncoder * clarify embedding_dim * use the forward method directly for projected * normalize projections only when fitting the projected features saved during prediction is now *not* normalized * remove unused logger * refactor training code into translation and representation modules * extract image logging functions * use AdamW instead of Adam for contrastive learning * inline single-use argument * fix normalization * fix MLP layer order * fix output dimensions * remove L2 normalization before computing loss * compute rank of features and projections * documentation --------- Co-authored-by: Shalin Mehta <[email protected]> * created and updated classify_feb_embeddings.py * Module and scripts for evaluating representations (#156) * docstring * move scripts from contrastive_scripts to viscy/scripts * organize files in applications/contrastive_phenotyping * delete unused evaluation code * more cleanup * refactor evaluation metrics for 
translation task * refactor viscy.evaluation -> viscy.translation.evaluation_metrics and viscy.representation.evaluation * WIP: representation evaluation module * WIP: representation eval - docstrings in numpy format * WIP: more documentation * refactor: feature_extractor moved to viscy.representation.evaluation * lint * bug fix * refactored common computations and dataset * add imbalance-learn dependecy to metrics * refactor classification of embeddings * organize viscy.representation.evaluation * ruff * Soorya's plotting script * WIP: combine two versions of plot_embeddings.py * simplify representation.viscy.evaluation - move LCA to its own module * refactor of viscy.representation.evaluation * refactored and tested PCA and UMAP plots --------- Co-authored-by: Soorya Pradeep <[email protected]> * delete duplicate file * lint * fix import paths * rename translation tests * rename translation metrics * Sample positive and negative samples with a time offset for the triplet contrastive task (#154) * wip: sample positive and negative samples from another time point * configure time interval in triplet data module * vectorized anchor filtering * conditional augmentation for anchor anchor is augmented if the positive is another time point * example training script for the CTC dataset this is optimized to run on MPS * add example CTC prediction config for MPS * add fig for mitosis * add script to save image patches * add save patches as npy * save figure at 300dpi * Linear probing (#160) * refactor linear probing with lightning * test convenience function * always convert to long before onehot * use onehot only during training * supply trainer through argument to avoid wrapping * only log per epoch * example script for linear probing * add comment about loss curve * fix sample filtering order for select tracks * add script to visualize integrated gradients * plot integrated gradients over time * Use sklearn's logistic regression for linear probing (#169) * use binary 
logistic regression to initialize the linear layer * plot integrated gradients from a binary classifier * add cmap to 'visual' requirements * move model assembling to lca * rename init argument * disable feature scaling * update test and evaluation scripts to use new API * add docstrings to LCA * Tweak attribution visualization (#170) * add maplotlib style sheet for figure making * add cell division attribution * add matplotlib style sheet * move attribution computation to lca * tweak contrast limits and text * add captum to optional dependencies * move attribution function to a method of the classifier * add script to show organelle dynamics * add occlusion attribution * more generic save path * add uninfected cell * tweak subplot spacing * UMAP line plot to assess temporal smoothness in features space (#176) * add maplotlib style sheet for figure making * add cell division attribution * add matplotlib style sheet * move attribution computation to lca * tweak contrast limits and text * add captum to optional dependencies * move attribution function to a method of the classifier * add script to show organelle dynamics * add occlusion attribution * more generic save path * add uninfected cell * tweak subplot spacing * lower case titles * reduce UMAP components to 2 and add indices * add script to make the bridge gaps figure * fixed import error * formatted with black * reduce to single arrow on plot * remove reduntant script * Fixes on correlation of PCA and UMAP components to computed_feature script (#159) * reduce initial patch size * add radial profiling * add function descriptions * add umap correlation * add def comments * change umap for all data * add script for 1 chan * add p-value analysis * add PCA analysis * remove duplicate script * Refactor and format code * Format code * Removed umap correlation * note for future refactor --------- Co-authored-by: Ziwen Liu <[email protected]> * updated eval module & cosine sim figures (#168) * updated files * format 
fixed for tests * updated scripts * umap dist code * bug fixes and linting * logistic regression script * add infection figure script * Add script for generating infection figure and perform prediction on the June dataset * Format code * Black format evaluation module and fix import in figure_cell_infection script * Refactor scatterplot colors and markers * Calculate model accuracy * Add script for appendix video * formatted code * updated displacement funcs for full embeddings * script for displacement computation * fix style * fix docstring format --------- Co-authored-by: Shalin Mehta <[email protected]> Co-authored-by: Soorya Pradeep <[email protected]> Co-authored-by: Ziwen Liu <[email protected]> * Fixup representation (#180) * fix docstrings and type hint for the ContrastiveEncoder * refactor the representation evaluation module into submodules * move shared image logging into utils * fix line end * fix import paths in example notebooks * Unified CLI entry point (#182) * remove obsolete metrics script for translation * move cellpose annotation script * consolidate CLI documentation * remove old CLI help * move translation CLI to its own module * move contrastive CLI to its own module * remove old CLI module * remove global entry script * share trainer class between tasks * move cli from init to main * inherit base CLI class for tasks * improve type hint and docstring * restore global CLI entry point * special case subclass mode for preprocessing * remove separate entry points * add CLI description message * make the setup function private * fix subclass mode detection * remove unused arguments from custom subcommands * use generic path in example * fix docstring style * update virtual staining example configs * update CTC SSL example configs * update infection SSL example configs * Remove outdated comment * updating the dlmbl notebooks * updating dependendencies to allow viscy>0.2 in examples * updating phase contrast demo notebook. 
* updating references to main * Store UMAP embeddings in SSL predictions (#184) * extract function for computing umap * specific return type for predict step * write umap in prediction * raise log level for umap computation * fix key conversion * Add representation section to readme (#186) * draft readme * direct link dynaCLR schematic * add DynaCLR schemetic figure * add static schematic and link to video --------- Co-authored-by: Ziwen Liu <[email protected]> Co-authored-by: Ziwen Liu <[email protected]> * fix link syntax in readme --------- Co-authored-by: Shalin Mehta <[email protected]> Co-authored-by: Alishba Imran <[email protected]> Co-authored-by: Soorya Pradeep <[email protected]> Co-authored-by: Alishba Imran <[email protected]> Co-authored-by: Soorya19Pradeep <[email protected]> Co-authored-by: Eduardo Hirata-Miyasaki <[email protected]>

mattersoflight and others added 4 commits August 28, 2024 10:59

Merging code related to figures (#146)

37b07a1

* notes on standard report * Add code for generating figures --------- Co-authored-by: Alishba Imran <[email protected]>

Remove obsolete scripts for contrastive phenotyping (#150)

6e7d61f

* remove obsolete training and prediction scripts * lint contrastive scripts

ziw-liu added the enhancement (New feature or request), breaking (Breaking changes), and bug (Something isn't working) labels Aug 31, 2024

ziw-liu added this to the v0.3.0 milestone Aug 31, 2024

mattersoflight requested review from mattersoflight and edyoshikun August 31, 2024 14:11

alishbaimran and others added 8 commits September 8, 2024 14:19

created and updated classify_feb_embeddings.py

4bfbf8b

delete duplicate file

9639961

Merge branch 'main' into representation

2f85eec

lint

083897c

fix import paths

4521afc

rename translation tests

19c4559

rename translation metrics

63d9f5a

ziw-liu changed the title ~~Single-cell representation learning (dev)~~ Single-cell representation learning Sep 10, 2024

ziw-liu added the representation (Representation learning (SSL)) label Sep 11, 2024

ziw-liu linked an issue Sep 11, 2024 that may be closed by this pull request

Time sampling for positive pair #123

Closed

3 tasks

ziw-liu commented Sep 11, 2024

viscy/representation/evaluation.py (outdated)

ziw-liu mentioned this pull request Sep 16, 2024

Number of UMAP components #165

Closed

Soorya19Pradeep and others added 6 commits September 17, 2024 13:28

add fig for mitosis

e2175b4

add script to save image patches

2a6cd20

add save patches as npy

767b12c

save figure at 300dpi

2759584

Fixup representation (#180)

2960633

* fix docstrings and type hint for the ContrastiveEncoder * refactor the representation evaluation module into submodules * move shared image logging into utils * fix line end * fix import paths in example notebooks

ziw-liu marked this pull request as ready for review October 16, 2024 18:01

ziw-liu mentioned this pull request Oct 16, 2024

Update hosted virtual staining config files for 0.3.0 #183

Open

ziw-liu commented Oct 16, 2024

viscy/representation/evalutation/dimensionality_reduction.py (outdated)

Remove outdated comment

25b10e1

This was linked to issues Oct 16, 2024

Number of UMAP components #165

Closed

CLI for multiple training tasks #127

Closed

edyoshikun added 3 commits October 17, 2024 10:52

updating the dlmbl notebooks

9c86140

updating dependendencies to allow viscy>0.2 in examples

43dee17

updating phase contrast demo notebook.

3b9063c

updating references to main

ef82427

edyoshikun approved these changes Oct 17, 2024

ziw-liu and others added 4 commits October 17, 2024 13:27

Store UMAP embeddings in SSL predictions (#184)

beb1c49

* extract function for computing umap * specific return type for predict step * write umap in prediction * raise log level for umap computation * fix key conversion

Add representation section to readme (#186)

8b0c6e7

* draft readme * direct link dynaCLR schematic * add DynaCLR schemetic figure * add static schematic and link to video --------- Co-authored-by: Ziwen Liu <[email protected]> Co-authored-by: Ziwen Liu <[email protected]>

Merge branch 'main' into representation

131f996

fix link syntax in readme

15b9386

ziw-liu merged commit ee834ce into main Oct 17, 2024
4 checks passed

ziw-liu deleted the representation branch October 17, 2024 23:06

ziw-liu mentioned this pull request Oct 18, 2024

WIP: refactor contrastive learning code with virtual staining code #109

Closed

8 tasks
