Commit: Merge branch 'main' of github.com:brainets/hoi
chrisferreyra13 committed Jul 9, 2024
2 parents 862c512 + 80ff722, commit b9febcf
Showing 60 changed files with 3,028 additions and 1,044 deletions.
45 changes: 22 additions & 23 deletions .github/workflows/pypi-publish.yml
@@ -1,31 +1,30 @@
 # This workflow will upload a Python Package using Twine when a release is created
 # For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries

-name: Upload Python Package
+name: Upload Python Package to PyPI when a Release is Created

 on:
   release:
     types: [created]

 jobs:
-  deploy:
+  pypi-publish:
+    name: Publish release to PyPI
     runs-on: ubuntu-latest
+    environment:
+      name: pypi
+      url: https://pypi.org/p/hoi
+    permissions:
+      id-token: write
     steps:
-      - uses: actions/checkout@v2.3.4
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.x'
-      - name: Install dependencies
-        run: |
-          python -m pip install --upgrade pip
-          pip install setuptools wheel twine
-      - name: Build and publish
-        env:
-          TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
-          TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
-        run: |
-          make clean_dist
-          make build_dist
-          make check_dist
-          make upload_dist
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.x"
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install setuptools wheel
+      - name: Build package
+        run: |
+          python setup.py sdist bdist_wheel  # Could also be python -m build
+      - name: Publish package distributions to PyPI
+        uses: pypa/gh-action-pypi-publish@release/v1
2 changes: 1 addition & 1 deletion .github/workflows/test_doc.yml
@@ -73,7 +73,7 @@ jobs:
           touch _build/html/.nojekyll
       - name: Deploy Github Pages 🚀
-        uses: JamesIves/github-pages-deploy-action@v4.4.1
+        uses: JamesIves/github-pages-deploy-action@v4.6.3
         with:
           branch: gh-pages
           folder: docs/_build/html/
1 change: 1 addition & 0 deletions .gitignore
@@ -157,3 +157,4 @@ yarn.lock
 *.dir
 *.zip
 *ipynb
+develop/
22 changes: 22 additions & 0 deletions Makefile
@@ -0,0 +1,22 @@
+
+# clean dist
+clean_dist:
+	@rm -rf build/
+	@rm -rf build/
+	@rm -rf frites.egg-info/
+	@rm -rf dist/
+	@echo "Dist cleaned"
+
+# build dist
+build_dist: clean_dist
+	python setup.py sdist
+	python setup.py bdist_wheel
+	@echo "Dist built"
+
+# check distribution
+check_dist:
+	twine check dist/*
+
+# upload distribution
+upload_dist:
+	twine upload --verbose dist/*
Binary file added docs/_static/jax_cgpu_entropy.png
Binary file added docs/_static/jax_cgpu_oinfo.png
5 changes: 3 additions & 2 deletions docs/_templates/layout.html
@@ -23,10 +23,11 @@
 <footer>
   <div class="foot">
     <img src="https://cibul.s3.amazonaws.com/e9619f705931403093351210dc70d8ee.base.image.jpg" alt="INT" height="80">
-    <img src="https://www.engagement.fr/wp-content/uploads/2013/02/Aix-Marseille-Universit%C3%A9.png" alt="Aix-Marseille university" height="80">
+    <img src="https://www.univ-amu.fr/system/files/2021-05/AMU%20logo.png" alt="Aix-Marseille university" height="80">
     <img src="https://developers.google.com/open-source/gsoc/resources/downloads/GSoC-Vertical.png" alt="Gsoc" height="80">
+    <img src="https://enlight-eu.org/images/logos/Logo_Gent.png" alt="Ghent" height="80">
     <br>
-    <p>&copy; Copyright {{ copyright }}.</p>
+    <!-- <p>&copy; Copyright {{ copyright }}.</p> -->
   </div>
 </footer>
 {% endblock %}
6 changes: 3 additions & 3 deletions docs/api/api_core.rst
@@ -12,12 +12,12 @@ Measures of Entropy
    :toctree: generated/

    get_entropy
-   entropy_gcmi
+   entropy_gc
+   entropy_gauss
    entropy_bin
    entropy_knn
    entropy_kernel
    copnorm_nd
-   prepare_for_entropy
+   prepare_for_it

 Measures of Mutual Information
 ++++++++++++++++++++++++++++++++
2 changes: 2 additions & 0 deletions docs/api/api_metrics.rst
@@ -1,3 +1,5 @@
+.. _metrics:
+
 ``hoi.metrics``
 ---------------
2 changes: 1 addition & 1 deletion docs/api/api_sim.rst
@@ -8,4 +8,4 @@ Simulate HOI.
 .. autosummary::
    :toctree: generated/

-   simulate_hois_gauss
+   simulate_hoi_gauss
5 changes: 3 additions & 2 deletions docs/conf.py
@@ -16,8 +16,8 @@
 sys.path.insert(0, os.path.abspath(".."))

 project = "HOI"
-copyright = "BraiNets"
-author = "BraiNets"
+# copyright = "BraiNets"
+# author = "BraiNets"
 release = hoi.__version__

@@ -96,6 +96,7 @@
         "../examples/tutorials",
         "../examples/it",
         "../examples/metrics",
+        "../examples/statistics",
         "../examples/miscellaneous",
     ]
 ),
2 changes: 2 additions & 0 deletions docs/contributor_guide.rst
@@ -1,3 +1,5 @@
+.. _contribute:
+
 Developer Documentation
 =======================
13 changes: 11 additions & 2 deletions docs/glossary.rst
@@ -1,3 +1,5 @@
+.. _glossary:
+
 Glossary
 ========

@@ -25,7 +27,14 @@ Glossary
      Partial Information Decomposition (PID) :cite:`williams2010nonnegative` is a framework for quantifying the unique, shared, and synergistic information that multiple variables provide about a target variable. It aims to decompose the mutual information between a set of predictor variables and a target variable into non-negative components, representing the unique information contributed by each predictor variable, the redundant information shared among predictor variables, and the synergistic information that can only be obtained by considering multiple predictor variables together. PID provides a more nuanced understanding of the relationships between variables in complex systems, beyond traditional pairwise measures of association.

   Network behavior
-      Higher Order Interactions between a set of variables.
+      Higher Order Interactions between a set of variables. Metrics of intrinsic
+      information :cite:`luppi2024information`, i.e. information carried by a group of variables about their
+      future, belong to this category. `Undirected` metrics :cite:`rosas2024characterising`, such as the
+      O-information, also fall into this category.

   Network encoding
-      Higher Order Interactions between a set of variables about a target variable.
+      Higher Order Interactions between a set of variables modulated by a target variable.
+      Measures of extrinsic information :cite:`luppi2024information`, i.e. information carried by a group of
+      variables about an external target, are part of this group.
+      `Directed` metrics :cite:`rosas2024characterising`, such as the
+      Redundancy-Synergy Index (RSI), are also part of this group.
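To make the `Network behavior` entry concrete: for jointly gaussian variables, the O-information can be computed in closed form from the covariance matrix. The sketch below is a from-scratch illustration, not hoi's implementation; positive values flag redundancy-dominated interactions, negative values synergy-dominated ones.

```python
import numpy as np

def gauss_entropy(c):
    """Entropy (in nats) of a gaussian with covariance matrix c."""
    c = np.atleast_2d(c)
    k = c.shape[0]
    return 0.5 * (k * np.log(2 * np.pi * np.e) + np.linalg.slogdet(c)[1])

def o_information(x):
    """O-information of gaussian data x, shape (n_variables, n_samples)."""
    n = x.shape[0]
    c = np.cov(x)
    # Omega(X) = (n - 2) * H(X) + sum_i [H(X_i) - H(X_{-i})]
    o = (n - 2) * gauss_entropy(c)
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        o += gauss_entropy(c[i, i]) - gauss_entropy(c[np.ix_(rest, rest)])
    return o

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 100000))  # independent variables
print(o_information(x))               # close to 0: no higher-order structure
```

Three near-copies of the same signal would instead yield a clearly positive (redundant) value.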
125 changes: 125 additions & 0 deletions docs/jax.rst
@@ -4,3 +4,128 @@ Jax: linear algebra backend

One of the main issues in the study of the higher-order structure of complex systems is the computational cost of investigating, one by one, all the multiplets of any order. When using information-theoretic tools, each metric relies on a complex set of operations that has to be performed for every multiplet of variables in the data set. The number of possible multiplets of :math:`k` nodes in a data set of :math:`n` variables grows as :math:`\binom{n}{k}`. This means that, in a data set of :math:`100` variables, the number of multiplets of three nodes is :math:`\simeq 10^5`, of four nodes :math:`\simeq 4 \times 10^6`, of five nodes :math:`\simeq 7 \times 10^7`, etc. This leads to huge computational costs that can pose real problems to the study of higher-order interactions in different research fields.

To deal with this problem, this toolbox uses the recently developed Python library `Jax <https://github.com/google/jax>`_, which uses XLA to compile and run NumPy programs on CPU, GPU and TPU.
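The combinatorial growth quoted above can be checked directly with the standard library (a quick sanity check, not part of the hoi docs):

```python
from math import comb

# number of multiplets of size k in a data set of n = 100 variables
n = 100
counts = {k: comb(n, k) for k in (3, 4, 5)}
print(counts)  # {3: 161700, 4: 3921225, 5: 75287520} -> ~1e5, ~4e6, ~7.5e7
```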

CPU vs. GPU: Performance comparison
++++++++++++++++++++++++++++++++++++

Computing entropy on large multi-dimensional arrays
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this first part, we compare the time taken to compute entropy on arrays of increasing size. To run this comparison, we recommend using `Colab <https://colab.research.google.com/>`_: go to *Modify > Notebook settings* and select a GPU environment.

In the first cell, install hoi and import some modules:

.. code-block:: python

    !pip install hoi

    import numpy as np
    import jax
    import jax.numpy as jnp
    from time import time

    from hoi.metrics import Oinfo
    from hoi.core import get_entropy

    import matplotlib.pyplot as plt
    plt.style.use("ggplot")

In a new cell, paste the following code. It computes the Gaussian Copula entropy of arrays of growing size, both on CPU and on GPU:

.. code-block:: python

    def compute_timings(n=15):
        n_samples = np.linspace(10, 10e2, n).astype(int)
        n_features = np.linspace(1, 10, n).astype(int)
        n_variables = np.linspace(1, 10e2, n).astype(int)

        # vectorize the entropy function over the first axis
        entropy = jax.vmap(get_entropy(method="gc"), in_axes=(0,))

        # dry run (trigger compilation)
        entropy(np.random.rand(2, 2, 10))

        timings = []
        data_size = []
        for n_s, n_f, n_v in zip(n_samples, n_features, n_variables):
            # generate random data
            x = np.random.rand(n_v, n_f, n_s)
            x = jnp.asarray(x)

            # compute entropy
            start = time()
            entropy(x)
            timings.append(time() - start)
            data_size.append(n_s * n_f * n_v)

        return data_size, timings

    with jax.default_device(jax.devices("gpu")[0]):
        data_size, timings_gpu = compute_timings()

    with jax.default_device(jax.devices("cpu")[0]):
        data_size, timings_cpu = compute_timings()

Finally, plot the timing comparison:

.. code-block:: python

    plt.plot(data_size, timings_cpu, label="CPU")
    plt.plot(data_size, timings_gpu, label="GPU")
    plt.xlabel("Data size")
    plt.ylabel("Time (s)")
    plt.title("CPU vs. GPU for computing entropy", fontweight="bold")
    plt.legend()

.. image:: _static/jax_cgpu_entropy.png

On CPU, the computing time increases linearly as the array gets larger. On GPU, however, it does not grow as fast.
Computing Higher-Order Interactions on large multiplets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In the next example, we compute Higher-Order Interactions on a network of 10 nodes with an increasing order (i.e. multiplets up to size 3, 4, ..., 10), both on CPU and GPU.

.. code-block:: python

    def compute_timings():
        # create a dynamic network with 1000 samples, 10 nodes and
        # 100 time points
        x = np.random.rand(1000, 10, 100)

        # define the model
        model = Oinfo(x, verbose=False)

        # compute hoi for increasing orders
        order = np.arange(3, 11)
        timings = []
        for o in order:
            start = time()
            model.fit(minsize=3, maxsize=o)
            timings.append(time() - start)

        return order, timings

    with jax.default_device(jax.devices("gpu")[0]):
        order, timings_gpu = compute_timings()

    with jax.default_device(jax.devices("cpu")[0]):
        order, timings_cpu = compute_timings()

Let's plot the results:

.. code-block:: python

    plt.plot(order, timings_cpu, label="CPU")
    plt.plot(order, timings_gpu, label="GPU")
    plt.xlabel("Multiplet order")
    plt.ylabel("Time (s)")
    plt.title("CPU vs. GPU for computing the O-information", fontweight="bold")
    plt.legend()

.. image:: _static/jax_cgpu_oinfo.png

On this toy example, computing the O-information takes ~13 seconds per order on CPU, against ~3 seconds on GPU: GPU computations are roughly 4 times faster.
68 changes: 66 additions & 2 deletions docs/quickstart.rst
@@ -1,7 +1,71 @@
Quickstart
==========

HOI is a Python package to estimate :term:`Higher Order Interactions`. A network is composed of nodes (e.g. users in a social network, brain areas in neuroscience, musicians in an orchestra, etc.) that interact together. Traditionally, we measure pairwise interactions. HOI goes beyond pairwise interactions by quantifying the interactions between 3, 4, ..., N nodes of the system. As we are using measures from :term:`Information Theory`, we can further describe the type of interactions, i.e. whether nodes of the network tend to have redundant or synergistic interactions (see the definitions of :term:`Redundancy` and :term:`Synergy`).

* **Installation:** to install HOI with its dependencies, see :ref:`installation`. If you are a developer or if you want to contribute to HOI, check out the :ref:`contribute` guide.
* **Theoretical background:** for a detailed introduction to information theory and HOI, see :ref:`theory`. You can also have a look at our :ref:`glossary` for the definitions of the terms used here.
* **API and examples:** the list of functions and classes can be found in the section :ref:`hoi_modules`. For practical examples on how to use those functions, see :doc:`auto_examples/index`. For faster computations, HOI is built on top of Jax; check out the page :doc:`jax` for the performance claims.

Installation
++++++++++++

To install or update HOI, run the following command in your terminal:

.. code-block:: bash

    pip install -U hoi
Simulate data
+++++++++++++

We provide functions to simulate data and toy examples. In a notebook or in a Python script, you can run the following lines to simulate synergistic interactions between three variables:

.. code-block:: python

    from hoi.simulation import simulate_hoi_gauss

    data = simulate_hoi_gauss(n_samples=1000, triplet_character='synergy')
Compute Higher-Order Interactions
+++++++++++++++++++++++++++++++++

We provide a list of HOI metrics (see :ref:`metrics`). Here, we use the O-information (:class:`hoi.metrics.Oinfo`):

.. code-block:: python

    # import the O-information
    from hoi.metrics import Oinfo

    # define the model
    model = Oinfo(data)

    # compute hoi for multiplets with a minimum and maximum size of 3,
    # using the Gaussian Copula entropy
    hoi = model.fit(minsize=3, maxsize=3, method="gc")

Inspect the results
+++++++++++++++++++

To inspect your results, we provide a plotting function called :func:`hoi.plot.plot_landscape` to see how the information spreads across orders, together with :func:`hoi.utils.get_nbest_mult` to get a table of the multiplets with the strongest synergy or redundancy:

.. code-block:: python

    from hoi.plot import plot_landscape
    from hoi.utils import get_nbest_mult

    # plot the landscape
    plot_landscape(hoi, model=model)

    # print the summary table
    print(get_nbest_mult(hoi, model=model))
Practical recommendations
+++++++++++++++++++++++++

Robust estimation of HOI strongly relies on the accuracy of measuring entropy/mutual information on/between (potentially highly) multivariate data. In the :doc:`auto_examples/index` section you can find benchmarks of our entropy estimators. Here we recommend:

* **Measuring entropy and mutual information:** we recommend the Gaussian Copula method (`method="gc"`). Although this measure is not accurate for capturing relationships beyond the gaussian assumption (see :ref:`sphx_glr_auto_examples_it_plot_entropies.py`), it performs relatively well on multivariate data (see :ref:`sphx_glr_auto_examples_it_plot_entropies_mvar.py`).
* **Measuring Higher-Order Interactions for network behavior and network encoding:** for network behavior and encoding, we recommend respectively the O-information :class:`hoi.metrics.Oinfo` and the :class:`hoi.metrics.GradientOinfo`. Although both metrics suffer from the same limitations, like spreading to higher orders, this can be mitigated using a bootstrap approach (see :ref:`sphx_glr_auto_examples_statistics_plot_bootstrapping.py`). Otherwise, both metrics are usually accurate in retrieving the type of interactions between variables, especially when combined with the Gaussian Copula.
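As a side note on what `method="gc"` does conceptually: the Gaussian Copula approach first maps each variable's ranks onto a standard normal, then applies parametric gaussian estimators. A minimal sketch of that normalization step, for illustration only (this is not hoi's implementation):

```python
import numpy as np
from scipy.stats import norm, rankdata

def copnorm(x):
    """Map each row's ranks onto a standard normal (gaussian copula transform)."""
    ranks = rankdata(x, axis=-1)
    return norm.ppf(ranks / (x.shape[-1] + 1))

x = np.random.lognormal(size=(5, 1000))  # strongly non-gaussian marginals
xn = copnorm(x)                          # same shape, gaussian marginals
```

The transform is monotone per variable, so rank relationships (and hence the copula dependence structure) are preserved while marginals become gaussian.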
