WIP: refactor contrastive learning code with virtual staining code #109

mattersoflight · 2024-07-17T23:54:37Z

This issue tracks our progress toward integration of the contrastive learning code with virtual staining code.

Our preprocessing code is currently in good shape and consists of:

Fluorescence deskewing/deconvolution: shrimPy
Phase deconvolution: recOrder
Registration with fluorescence: shrimPy
Virtual staining of nuclei and membrane: VisCy
Segmentation: VisCy (Adding segmentation utility functions #108)
Tracking: ultrack

We are still improving the tracking to capture cell division and cells near the boundary of the FOV: @tayllatheodoro

Training
Training works well via the PyTorch Lightning CLI and configs. The data loader also works well with the HCS data format.

Pending improvements:
Architecture:

Data loader and loss functions:

Implement NT-Xent loss (implicit negative samples) #136
Time sampling for positive pair #123
Data loader that pools multiple datasets.
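
For reference while #136 is open, here is a minimal, dependency-free sketch of the NT-Xent loss. It is not the VisCy implementation: the function name `nt_xent_loss`, the list-of-lists input, and the default temperature are assumptions made for illustration, and a real version would operate on PyTorch tensors. Each anchor's positive is its paired view, and every other embedding in the batch serves as an implicit negative.

```python
import math


def _unit(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """NT-Xent over N positive pairs (view_a[i], view_b[i]).

    Every other embedding in the batch acts as an implicit negative.
    Hypothetical helper for illustration; assumes non-zero embeddings.
    """
    n = len(view_a)
    z = [_unit(v) for v in view_a] + [_unit(v) for v in view_b]
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive for anchor i
        # Temperature-scaled cosine similarity to every other sample.
        logits = {
            k: sum(a * b for a, b in zip(z[i], z[k])) / temperature
            for k in range(2 * n)
            if k != i
        }
        # Numerically stable log-sum-exp over the positive and all negatives.
        m = max(logits.values())
        log_denominator = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        total += log_denominator - logits[j]
    return total / (2 * n)


# Matching views should score a lower loss than swapped views.
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Because negatives come from the rest of the batch, no explicit negative sampling is needed in the data loader.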

Prediction and evaluation
Prediction and evaluation work well via the PyTorch Lightning CLI and configs.

Analyze embeddings with PCA and UMAP to generate a standard report: produce a report of useful visualizations to assess the dimensionality and features learned by embeddings #140. @Soorya19Pradeep
Refine the napari visualization tool for annotation of cell states in latent space: Visualize tracked cells and their features czbiohub-sf/napari-iohub#13. @ziw-liu

In this round, we should make any changes in the code path that can affect the architecture.


ziw-liu · 2024-07-18T17:28:25Z

~~Currently (Slurm) Cellpose segmentation is implemented in shrimPy, and will be moved to the new bioimage analysis repo.~~

@Soorya19Pradeep I'm outdated! It's now in #108

ziw-liu · 2024-07-18T17:30:21Z

For the napari UI I think we should first try interacting with the plugin through standardized data files so we don't have to maintain our own interface.

ziw-liu · 2024-07-18T23:04:12Z

The napari-clusters-plotter plugin does not implement readers, so it relies on what's available in the napari layer list (features are stored as an attribute of the labels layer).

I now think a workable way is to implement a custom reader in napari-iohub for the images and tracks so the visualization is easier (handling mixed dimensions, scales, etc.). The ultrack plugin does load the extra columns in its output CSVs as layer features, so they can be used by the cluster plotter.

As for clustering, I think dimensionality reduction should be done beforehand on all the cells, instead of on the limited number of cells in each FOV.

mattersoflight · 2024-07-19T02:53:37Z

> The ultrack plugin does load the extra columns in its output CSVs as layer features so it can be used by the cluster plotter.

That's interesting. Can this work?

1. Write projected embeddings to the same table as the output of ultrack.
2. Load the tracks using the ultrack plugin.
3. Use napari-clusters-plotter to visualize labels that match projected embeddings.

If yes, we don't need to create a new widget.
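
A rough sketch of the first step, using only the standard library. The column names (`track_id`, `t`, `UMAP1`, `UMAP2`) and the helper names are assumptions for illustration; the actual ultrack output columns should be checked before relying on this.

```python
import csv
import io


def merge_embeddings_into_tracks(tracks_csv, embeddings_csv, keys=("track_id", "t")):
    """Return tracks rows with projected-embedding columns appended.

    Rows are matched on `keys` as strings ("0" and "0.0" would not match);
    tracks without an embedding get empty cells. Hypothetical helper.
    """
    emb_reader = csv.DictReader(embeddings_csv)
    extra_cols = [c for c in emb_reader.fieldnames if c not in keys]
    lookup = {tuple(row[k] for k in keys): row for row in emb_reader}

    track_reader = csv.DictReader(tracks_csv)
    fieldnames = list(track_reader.fieldnames) + extra_cols
    merged = []
    for row in track_reader:
        match = lookup.get(tuple(row[k] for k in keys), {})
        for col in extra_cols:
            row[col] = match.get(col, "")
        merged.append(row)
    return fieldnames, merged


def write_csv(fieldnames, rows, handle):
    """Write the merged table so a tracks reader can load it as one file."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# In-memory stand-ins for the tracks table and the projected embeddings.
tracks = io.StringIO("track_id,t,y,x\n1,0,10,12\n1,1,11,13\n2,0,40,42\n")
embeddings = io.StringIO("track_id,t,UMAP1,UMAP2\n1,0,0.5,-1.2\n1,1,0.6,-1.0\n")
fieldnames, rows = merge_embeddings_into_tracks(tracks, embeddings)
merged_csv = io.StringIO()
write_csv(fieldnames, rows, merged_csv)
```

If the extra columns are picked up as layer features as described above, the merged table would be the only file exchanged between the embedding step and the viewer.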

@ziw-liu please go ahead and decide on a useful and low-maintenance solution.

mattersoflight · 2024-07-23T23:53:24Z

@ziw-liu @alishbaimran Given our offline discussion, here is the prioritization of features:

update data format and data module for efficiency + allow selection of channels to encode,
define positive pairs based on temporal closeness, and
pool different datasets.

You could split the refactor into three PRs, each of which implements one of the above and is tested with the corresponding training run.

We will train contrastive phenotyping models via the Python scripts and CLIs that wrap these scripts. We don't have to prioritize integration with Lightning CLI yet.
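
To make the second item concrete, here is one possible way to pick a positive sample by temporal closeness within a track. The function name, the `max_interval` parameter, and the fallback behavior are assumptions for discussion, not a description of what #123 implements.

```python
import random


def sample_temporal_positive(track_timepoints, anchor_t, max_interval, rng=random):
    """Pick a positive timepoint from the same track, near the anchor in time.

    Returns a timepoint t with 0 < |t - anchor_t| <= max_interval, or the
    anchor itself when the track has no other frame in that window (the
    positive then has to come from augmentation alone).
    """
    candidates = [
        t for t in track_timepoints
        if t != anchor_t and abs(t - anchor_t) <= max_interval
    ]
    if not candidates:
        return anchor_t
    return rng.choice(candidates)


# A seeded generator keeps sampling reproducible across training runs.
rng = random.Random(0)
positive_t = sample_temporal_positive(range(10), anchor_t=5, max_interval=2, rng=rng)
```

Sampling from the same track rather than the same FOV keeps the positive pair tied to one cell, which is why this depends on the tracking output.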

mattersoflight · 2024-07-24T16:10:57Z

@ziw-liu and @alishbaimran I think we can bypass the patchification step by chunking the Zarr store in C*Y*X-sized chunks. Data chunked like this can be loaded fast enough on VAST, since we only have to fetch the images within specific Z and T ranges. This will also simplify the processing pipeline and avoid the need to track one more data format.

I got this idea while exploring the data we are preparing for release with the paper on mantis.
Take a look at:
/hpc/projects/comp.micro/mantis/mantis_paper_data_release/figure_1.zarr
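
The arithmetic behind the chunking idea can be sketched as follows. This assumes a (T, C, Z, Y, X) axis order and reads "C*Y*X sized chunks" as a chunk shape of (1, C, 1, Y, X); both are my interpretation, and the function names and example array shape are hypothetical.

```python
def cyx_chunk_shape(array_shape):
    """Chunk shape for a (T, C, Z, Y, X) array: one chunk holds all of C, Y, X
    for a single timepoint and a single Z slice."""
    t, c, z, y, x = array_shape
    return (1, c, 1, y, x)


def chunks_for_window(t_range, z_range):
    """List the (t, z) chunk indices needed to read a window of frames and slices.

    With (1, C, 1, Y, X) chunks, chunk indices along T and Z equal the frame
    and slice indices, so the count is len(T range) * len(Z range).
    Ranges are half-open (start, stop) tuples.
    """
    return [(t, z) for t in range(*t_range) for z in range(*z_range)]


# Example: 2 timepoints and 15 Z slices touch 30 chunks, each a full C*Y*X plane stack.
chunk_shape = cyx_chunk_shape((50, 2, 30, 2048, 2048))
needed = chunks_for_window((10, 12), (5, 20))
```

Every fetched chunk contains only data inside the requested T and Z window, so no bytes are read and then discarded along those axes; cropping in Y and X happens after loading.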

mattersoflight assigned edyoshikun, alishbaimran, ziw-liu and Soorya19Pradeep Jul 18, 2024

ziw-liu linked a pull request Jul 23, 2024 that will close this issue

Single-cell phenotyping with contrastive learning #113

Merged

ziw-liu closed this as completed in #113 Aug 2, 2024

ziw-liu reopened this Aug 2, 2024

mattersoflight assigned tayllatheodoro Aug 17, 2024
