minor adjustment to documentation and README
agisga authored Jun 29, 2024
1 parent 88ae103 commit dd3db9f
Showing 3 changed files with 32 additions and 20 deletions.
21 changes: 15 additions & 6 deletions README.md
@@ -1,8 +1,17 @@
# Domain Identification (DomId)
# Domain Identification (DomId): A suite of deep unsupervised clustering algorithms

![GH Actions CI ](https://github.com/DIDSR/DomId/actions/workflows/ci.yml/badge.svg)

Deep unsupervised clustering algorithms for domain identification.
DomId is a Python package offering a PyTorch-based suite of unsupervised deep clustering algorithms. The primary goal is to identify subgroups that have not been previously annotated within image datasets.

The implemented models include Variational Deep Embedding (VaDE) [Jiang et al., 2017], Conditionally Decoded Variational Deep Embedding (CDVaDE) [Sidulova et al., 2023], Deep Embedded Clustering (DEC) [Xie et al., 2016], and Structural Deep Clustering Network (SDCN) [Bo et al., 2020].

These clustering algorithms include a feature extractor component, which can be either an Autoencoder (AE) or a Variational Autoencoder (VAE). The package provides multiple AE and VAE architectures to choose from and includes instructions for extending the package with custom neural network architectures or clustering algorithms.

Ready-to-use experiment tutorials in Jupyter notebooks are available for both the MNIST dataset and a digital pathology dataset.

By adopting a highly modular design, the codebase prioritizes straightforward extensibility, so that new models, datasets or tasks can be added with ease.
The software design of DomId follows the design principles of [DomainLab](https://github.com/marrlab/DomainLab), which is a modular Python package for training domain invariant neural networks and has been used to develop DomId.

## Installation

@@ -18,10 +27,10 @@ git clone https://github.com/agisga/DomId.git
poetry install
```

*Note*: DomId will be published to PyPI in the near future, and the installation will be as easy as `pip install domid`.

## Usage

The following examples demonstrate how to use the DomId Python package directly from your command line. You can also leverage its API within your Python scripts or notebooks for greater flexibility. For in-depth tutorials and case studies, please refer to the `notebooks` directory.

### VaDE model

The deep unsupervised clustering model VaDE has been proposed in [1].
@@ -187,10 +196,10 @@ poetry run python main_out.py --te_d 0 1 2 --tr_d 3 4 5 6 7 8 9 --task=mnistcolo

[2] Kingma, Welling. "Auto-encoding variational bayes." ICLR 2013. (<https://arxiv.org/abs/1312.6114>)

[3] Xie, Girshick, Farhadi. "Unsupervised Deep Embedding for Clustering Analysis" (2016) (<http://arxiv.org/abs/1511.06335>)
[3] Xie, Girshick, Farhadi. "Unsupervised Deep Embedding for Clustering Analysis." ICML 2016. (<http://arxiv.org/abs/1511.06335>)

[4] Sidulova, Sun, Gossmann. "Deep Unsupervised Clustering for Conditional Identification of Subgroups Within a Digital Pathology Image Set." MICCAI, 2023. (<https://link.springer.com/chapter/10.1007/978-3-031-43993-3_64>)

[5] Bo, Deyu, et al. "Structural deep clustering network." Proceedings of the web conference 2020. 2020. (<https://doi.org/10.1145/3366423.3380214>)

[6] Sidulova, Kahaki, Hagemann, Gossmann. "Contextual unsupervised deep clustering in digital pathology." 2024 (in review)
[6] Sidulova, Kahaki, Hagemann, Gossmann. "Contextual unsupervised deep clustering in digital pathology." CHIL 2024.
8 changes: 5 additions & 3 deletions docs/about.md
@@ -51,14 +51,16 @@ However, original SDCN model faces significant scalability challenges that hinde
particularly when dealing with whole-slide digital pathology images (WSI), which are typically of gigapixel size or larger.
This limitation arises from SDCN's need to construct a graph over the entire dataset and to process all data in a single
batch during training. To overcome this issue, we propose a batching strategy for the SDCN training process and introduce
a novel batching approach tailored specifically for WSI data.
a novel batching approach tailored specifically for WSI data. [6]

# M2YD Model Summary

The M2YD model is implemented as an experimental method, which combines an unsupervised VAE-based clustering neural network with simultaneous training of a neural network for a supervised classification task.
At the current stage, the method/model is purely experimental (with limited validation), and thus not recommended for practical use, unless you know exactly what you are doing.

# AE+K-means Model

A two-stage approach where K-means clustering (a conventional clustering algorithm) is applied to the embedding space of a trained AE. This primarily serves as a baseline for performance comparisons.
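
As a rough illustration of this two-stage baseline, the sketch below runs a plain K-means on toy embeddings. It is not DomId's implementation: `encode` is a hypothetical stand-in for the encoder of a trained AE (here just a fixed projection), and `kmeans`, the toy `images`, and all parameter values are assumptions made for this example.

```python
import random


def encode(x):
    """Hypothetical stand-in for a trained AE encoder (NOT DomId's API).

    A real encoder would be a neural network; a fixed projection from
    4 input features to a 2-dimensional embedding keeps this example
    self-contained.
    """
    return (x[0] + x[1], x[2] + x[3])


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's K-means on a list of equal-length float tuples."""
    rng = random.Random(seed)
    # Initialize centroids from k data points sampled without replacement.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Toy "images": two well-separated groups of 4-feature inputs (made-up data).
images = [
    (0.0, 0.1, 0.1, 0.1), (0.1, 0.2, 0.0, 0.1), (0.1, 0.1, 0.2, 0.2),
    (5.0, 5.1, 5.1, 5.1), (5.1, 5.2, 5.0, 5.1), (5.1, 5.1, 5.2, 5.2),
]

# Stage 1: map the inputs into the embedding space.
embeddings = [encode(x) for x in images]

# Stage 2: cluster the embeddings with a conventional algorithm.
labels, centroids = kmeans(embeddings, k=2)
print(labels)
```

In practice the embeddings would come from the encoder of the trained AE, and the resulting cluster labels can be compared against those produced by the deep clustering models.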


# References
@@ -67,11 +69,11 @@ At the current stage, the method/model is purely experimental (with limited vali

[2] Kingma, Welling. "Auto-encoding variational bayes." ICLR 2013. (<https://arxiv.org/abs/1312.6114>)

[3] Xie, Girshick, Farhadi. "Unsupervised Deep Embedding for Clustering Analysis" (2016) (<http://arxiv.org/abs/1511.06335>)
[3] Xie, Girshick, Farhadi. "Unsupervised Deep Embedding for Clustering Analysis." ICML 2016. (<http://arxiv.org/abs/1511.06335>)

[4] Sidulova, Sun, Gossmann. "Deep Unsupervised Clustering for Conditional Identification of Subgroups Within a Digital Pathology Image Set." MICCAI, 2023. (<https://link.springer.com/chapter/10.1007/978-3-031-43993-3_64>)

[5] Bo, Deyu, et al. "Structural deep clustering network." Proceedings of the web conference 2020. 2020. (<https://doi.org/10.1145/3366423.3380214>)

[6] Sidulova, Kahaki, Hagemann, Gossmann. "Contextual unsupervised deep clustering in digital pathology." 2024 (in review)
[6] Sidulova, Kahaki, Hagemann, Gossmann. "Contextual unsupervised deep clustering in digital pathology." CHIL 2024.

23 changes: 12 additions & 11 deletions docs/index.rst
@@ -14,28 +14,29 @@ Welcome to DomId's documentation!
About DomId
===============

The goal of this Python package is to provide a PyTorch-based platform for deep unsupervised clustering and domain identification.
DomId is a Python package offering a PyTorch-based suite of unsupervised deep clustering algorithms. The primary goal is to identify subgroups that have not been previously annotated within image datasets.

Currently implemented models include the Variational Deep Embedding (VaDE) model, Conditionally Decoded Variational Deep Embedding (CDVaDE), and Deep Embedded Clustering (DEC). Other deep clustering models will be added in the future.
For additional information see the sections below.
For basic usage examples see: :doc:`readme_link`.
The implemented models include Variational Deep Embedding (VaDE) [Jiang et al., 2017], Conditionally Decoded Variational Deep Embedding (CDVaDE) [Sidulova et al., 2023], Deep Embedded Clustering (DEC) [Xie et al., 2016], and Structural Deep Clustering Network (SDCN) [Bo et al., 2020].

These clustering algorithms include a feature extractor component, which can be either an Autoencoder (AE) or a Variational Autoencoder (VAE). The package provides multiple AE and VAE architectures to choose from and includes instructions for extending the package with custom neural network architectures or clustering algorithms.

Experiment tutorials in Jupyter notebooks are available for both the MNIST dataset and a digital pathology dataset.

By adopting a highly modular design, the codebase prioritizes straightforward extensibility, so that new models, datasets or tasks can be added with ease.
The software design of DomId follows the design principles of `DomainLab <https://github.com/marrlab/DomainLab>`_, which is a modular Python package for training domain invariant neural networks and has been used to develop DomId.

.. toctree::
   :maxdepth: 1
   :caption: More information about the models
   :caption: Introduction and Quick Start guide:

   about_link
   readme_link


DomainLab
==============
DomainLab is a submodule that has been used to develop DomId. It aims to learn domain-invariant features from data of multiple domains, so that the learned features generalize to new, unseen domains.

.. toctree::
   :maxdepth: 1
   :caption: DomainLab
   :caption: More information about the models:

   about_link


Loading a Dataset and Defining a Task