diff --git a/README.md b/README.md
index 4ab2f03..797a1ac 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,17 @@
-# Domain Identification (DomId)
+# Domain Identification (DomId): A suite of deep unsupervised clustering algorithms
 
 ![GH Actions CI ](https://github.com/DIDSR/DomId/actions/workflows/ci.yml/badge.svg)
 
-Deep unsupervised clustering algorithms for domain identification.
+DomId is a Python package offering a PyTorch-based suite of unsupervised deep clustering algorithms. Its primary goal is to identify previously unannotated subgroups within image datasets.
+
+The implemented models include the Variational Deep Embedding (VaDE) model [Jiang et al., 2017], Conditionally Decoded Variational Deep Embedding (CDVaDE) [Sidulova et al., 2023], Deep Embedding Clustering (DEC) [Xie et al., 2016], and the Structural Deep Clustering Network (SDCN) [Bo et al., 2020].
+
+These clustering algorithms include a feature extractor component, which can be either an Autoencoder (AE) or a Variational Autoencoder (VAE). The package provides multiple AE and VAE architectures to choose from and includes instructions for extending the package with custom neural network architectures or clustering algorithms.
+
+Ready-to-use experiment tutorials in Jupyter notebooks are available for both the MNIST dataset and a digital pathology dataset.
+
+The codebase adopts a highly modular design that prioritizes straightforward extensibility, so that new models, datasets, or tasks can be added with ease.
+The software design of DomId follows the design principles of [DomainLab](https://github.com/marrlab/DomainLab), a modular Python package for training domain-invariant neural networks that has been used to develop DomId.
 
 ## Installation
 
@@ -18,10 +27,10 @@ git clone https://github.com/agisga/DomId.git
 poetry install
 ```
 
-*Note*: DomId will be published to PyPI in the near future, and the installation will be as easy as `pip install domid`.
-
 ## Usage
 
+The following examples demonstrate how to use the DomId Python package directly from the command line. You can also use its API within your own Python scripts or notebooks for greater flexibility. For in-depth tutorials and case studies, please refer to the `notebooks` directory.
+
 ### VaDE model
 
 The deep unsupervised clustering model VaDE has been proposed in [1].
@@ -187,10 +196,10 @@ poetry run python main_out.py --te_d 0 1 2 --tr_d 3 4 5 6 7 8 9 --task=mnistcolo
 
 [2] Kingma, Welling. "Auto-encoding variational bayes." ICLR 2013. ()
 
-[3] Xie, Girshick, Farhadi. "Unsupervised Deep Embedding for Clustering Analysis" (2016) ()
+[3] Xie, Girshick, Farhadi. "Unsupervised Deep Embedding for Clustering Analysis." ICML 2016. ()
 
 [4] Sidulova, Sun, Gossmann. "Deep Unsupervised Clustering for Conditional Identification of Subgroups Within a Digital Pathology Image Set." MICCAI, 2023. ()
 
 [5] Bo, Deyu, et al. "Structural deep clustering network." Proceedings of the web conference 2020. 2020. ()
 
-[6] Sidulova, Kahaki, Hagemann, Gossmann. "Contextual unsupervised deep clustering in digital pathology." 2024 (in review)
+[6] Sidulova, Kahaki, Hagemann, Gossmann. "Contextual unsupervised deep clustering in digital pathology." CHIL 2024.
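The README's Usage section above names VaDE [1] without summarizing how it clusters. As a purely illustrative aid, the following PyTorch sketch shows the generative assumption behind VaDE, a learnable Gaussian-mixture prior over the latent code followed by a decoder. This is not DomId's implementation; all names, layer sizes, and dimensions are assumptions.

```python
import torch
import torch.nn as nn

n_clusters, z_dim = 10, 50   # illustrative: 10 clusters, 50-dimensional latent space

# Learnable mixture-of-Gaussians prior over the latent code z.
pi_logits = nn.Parameter(torch.zeros(n_clusters))            # cluster weights
mu_c = nn.Parameter(torch.randn(n_clusters, z_dim) * 0.05)   # per-cluster means
log_var_c = nn.Parameter(torch.zeros(n_clusters, z_dim))     # per-cluster log-variances

# Toy decoder mapping a latent code to a 28x28 image (MNIST-sized).
decoder = nn.Sequential(
    nn.Linear(z_dim, 256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Sigmoid(),
)

def sample_images(n: int) -> torch.Tensor:
    """Generative process: pick a cluster c, sample z ~ N(mu_c, sigma_c^2), decode to an image."""
    c = torch.distributions.Categorical(logits=pi_logits).sample((n,))
    z = mu_c[c] + torch.randn(n, z_dim) * (0.5 * log_var_c[c]).exp()
    return decoder(z).view(n, 1, 28, 28)
```

In the actual model [1], an encoder network and these prior parameters are trained jointly by maximizing a variational lower bound, so that each mixture component comes to represent one cluster of images.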
diff --git a/docs/about.md b/docs/about.md
index cd14488..ea685d9 100644
--- a/docs/about.md
+++ b/docs/about.md
@@ -51,14 +51,16 @@ However, original SDCN model faces significant scalability challenges that hinde
 particularly when dealing with whole-slide digital pathology images (WSI), which are typically of gigapixel size or larger.
 This limitation arises from SDCN need for constructing a graph on the entire dataset and the imperative to process all data in a single batch during training.
 To overcome this issue, we propose batching strategy to the SDCN training process and introduce
-a novel batching approach tailored specifically for WSI data.
+a novel batching approach tailored specifically for WSI data. [6]
 
 # M2YD Model Summary
 
 The M2YD model is implemented as an experimental method, which combines an unsupervised VAE-based clustering neural network with simultaneous training of a neural network for a supervised classification task.
 At the current stage, the method/model is purely experimental (with limited validation), and thus not recommended for practical use, unless you know exactly what you are doing.
 
+# AE+K-means Model
+A two-stage approach where K-means clustering (a conventional clustering algorithm) is applied to the embedding space of a trained AE. It primarily serves as a baseline for performance comparisons.
 
 # References
 
@@ -67,11 +69,11 @@
 [2] Kingma, Welling. "Auto-encoding variational bayes." ICLR 2013. ()
 
-[3] Xie, Girshick, Farhadi. "Unsupervised Deep Embedding for Clustering Analysis" (2016) ()
+[3] Xie, Girshick, Farhadi. "Unsupervised Deep Embedding for Clustering Analysis." ICML 2016. ()
 
 [4] Sidulova, Sun, Gossmann. "Deep Unsupervised Clustering for Conditional Identification of Subgroups Within a Digital Pathology Image Set." MICCAI, 2023. ()
 
 [5] Bo, Deyu, et al. "Structural deep clustering network." Proceedings of the web conference 2020. 2020. ()
 
-[6] Sidulova, Kahaki, Hagemann, Gossmann. "Contextual unsupervised deep clustering in digital pathology." 2024 (in review)
+[6] Sidulova, Kahaki, Hagemann, Gossmann. "Contextual unsupervised deep clustering in digital pathology." CHIL 2024.
diff --git a/docs/index.rst b/docs/index.rst
index 88d2f74..c31add7 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -14,28 +14,29 @@ Welcome to DomId's documentation!
 About DomId
 ===============
-The goal of this Python package is to provide a PyTorch-based platform for deep unsupervised clustering and domain identification.
+DomId is a Python package offering a PyTorch-based suite of unsupervised deep clustering algorithms. Its primary goal is to identify previously unannotated subgroups within image datasets.
 
-Currently implemented models include the Variational Deep Embedding (VaDE) model, Conditionally Decoded Variational Deep Embedding (CDVaDE), Deep Embedding Clustering (DEC). Other deep clustering models will be added in the future.
-For additioonal information see the sections below.
-For basic usage examples see: :doc:`readme_link`.
+The implemented models include the Variational Deep Embedding (VaDE) model [Jiang et al., 2017], Conditionally Decoded Variational Deep Embedding (CDVaDE) [Sidulova et al., 2023], Deep Embedding Clustering (DEC) [Xie et al., 2016], and the Structural Deep Clustering Network (SDCN) [Bo et al., 2020].
+These clustering algorithms include a feature extractor component, which can be either an Autoencoder (AE) or a Variational Autoencoder (VAE).
+The package provides multiple AE and VAE architectures to choose from and includes instructions for extending the package with custom neural network architectures or clustering algorithms.
+
+Experiment tutorials in Jupyter notebooks are available for both the MNIST dataset and a digital pathology dataset.
+
+The codebase adopts a highly modular design that prioritizes straightforward extensibility, so that new models, datasets, or tasks can be added with ease.
+The software design of DomId follows the design principles of `DomainLab <https://github.com/marrlab/DomainLab>`_, a modular Python package for training domain-invariant neural networks that has been used to develop DomId.
 
 .. toctree::
    :maxdepth: 1
-   :caption: More information about the models
+   :caption: Introduction and Quick Start guide:
 
-   about_link
+   readme_link
 
-DomainLab
-==============
-DomainLab is a submodule that has been used to develop DomID, and it aims at learning domain invariant features by utilizing data from multiple domains so the learned feature can generalize to new unseen domains.
-
 .. toctree::
    :maxdepth: 1
-   :caption: DomainLab
+   :caption: More information about the models:
 
+   about_link
 
 Loading a Datasets and Defining a Task
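The AE+K-means baseline added to docs/about.md above is a two-stage procedure: train an autoencoder, then cluster its embeddings with conventional K-means. The sketch below illustrates that idea in PyTorch and scikit-learn; it is not DomId's code, and the architecture, hyperparameters, and the assumption that `data_loader` yields `(image, label)` batches are illustrative only.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class AE(nn.Module):
    """A small fully-connected autoencoder (illustrative architecture)."""
    def __init__(self, in_dim: int = 28 * 28, z_dim: int = 20):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, z_dim))
        self.decoder = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def ae_kmeans(ae: AE, data_loader, n_clusters: int, epochs: int = 20, lr: float = 1e-3):
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    # Stage 1: train the autoencoder with a plain reconstruction loss.
    for _ in range(epochs):
        for x, _ in data_loader:
            x = x.view(x.size(0), -1)
            recon, _ = ae(x)
            loss = nn.functional.mse_loss(recon, x)
            opt.zero_grad()
            loss.backward()
            opt.step()
    # Stage 2: embed every sample and run conventional K-means on the embeddings.
    with torch.no_grad():
        z = torch.cat([ae(x.view(x.size(0), -1))[1] for x, _ in data_loader])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(z.numpy())
```

Unlike the deep clustering models discussed above, this baseline does not refine the embedding with a clustering objective, which is exactly why it is useful as a point of reference in performance comparisons.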