Github: https://github.com/mh105/somata
State-space Oscillator Modeling And Time-series Analysis (SOMATA) is a Python library for state-space neural signal processing algorithms developed in the Purdon Lab. Basic state-space models are introduced as class objects for flexible manipulations. Classical exact and approximate inference algorithms are implemented and interfaced as class methods. Advanced neural oscillator modeling techniques are brought together to work synergistically.
- Requirements
- Install
- Basic state-space models
- Advanced neural oscillator methods
- Authors
- Citation
- License
somata
is built on numpy
arrays for computations. joblib
is used for multithreading.
Additional dependencies include scipy
, matplotlib
, cmdstanpy
, and spectrum
.
The source localization module also requires pytorch
and MNE-python
.
-
Full package requirements for each release will be updated in the
requirements-*.txt
files. Thepyproject.toml
file is specified to dynamically retrieve the metadata of dependencies forsetuptools
duringpip install
to verify that runtime dependent packages of compatible versions have been installed. Whenpip install
is used, missing dependencies will be fetched from Python Package Index (PyPI) and installed. -
For development or installing
somata
into a new conda environment,requirements-*.txt
files can also be passed toconda create
via the--file
directive to create a new conda environment with all and only the required packages installed in the new conda environment. Whenconda create
orconda install
is used, missing dependencies will be fetched from conda channels and installed.
$ pip install somata
or
$ conda install somata -c pytorch -c conda-forge
While pip install
usually works, an alternative way to install somata
is through the conda-forge channel, which utilizes continuous integration (CI) across OS platforms.
This means that conda-forge packages are more compatible with each other compared to PyPI packages installed via pip
by default.
If pip install somata
fails to resolve some dependencies, the conda-forge somata feedstock can be used to install somata
.
If you have not installed CmdStan
on your OS before, it is easier to install from the conda-forge
channel if you need to use the Phase Amplitude Coupling Estimation module.
On Windows, it is also easier to use conda install
due to a need to build spectrum
during pip install
that requires Microsoft Visual C++ 14.0 or greater.
If the torch
dependency is not resolved correctly for your OS (such as installed the cpu-only
version when GPU processing is needed), first install pytorch
manually in a conda environment that you want to install somata
in, and then rerun either of the above two lines to install somata
.
Please be aware of a common mixup that PyTorch is distributed as torch
on PyPI but as pytorch
on conda-forge.
If using conda
to install, Windows OS needs to download pytorch
from the pytorch
channel, as win-64
is not built on the conda-forge
channel.
-
Recommended conda distribution: Miniforge3
Apple silicon Mac: choose Miniforge3 native to the ARM64 architecture instead of Intel x86.
-
You may also directly install
somata
in an existing conda environment by skipping this step.$ cd <repo root directory with pyproject.toml>
$ mamba create -n somata -c pytorch -c conda-forge --file requirements/requirements-core.txt --file requirements/requirements-dev.txt
$ mamba activate somata
-
$ cd <repo root directory with pyproject.toml>
$ pip install -e . --config-settings editable_mode=compat
When somata
is installed into an existing conda environment, unmet dependencies are automatically searched, downloaded, and installed from the same repository of packages requested to provide the somata
build distribution and its dependencies (from PyPI with pip install somata
or from pytorch and conda-forge channels with conda install somata -c pytorch -c conda-forge
).
The need for specifying dependencies arises in multiple ways for Python.
There are non-negligible complexities due to the existence of different build backends as well as different Python dependency management and packaging tools such as pdm
, poetry
, pip
, conda
, etc.
One modern standard is to use a declarative config pyproject.toml
file for package building and dependency management, which is becoming popular across build backends and frontends.
In this project, we use pyproject.toml
with the setuptools
backend for building somata
, and we use conda
to manage development environments.
We have deliberately forgone the uses of setup.py
and setup.cfg
for package building and the use of environment.yml
for conda environment creation.
Instead, we use a set of requirements-*.txt
files. These simple one-liner entries can be passed with --file
directives into both setuptools
and conda
, allowing single sourcing the core dependency list.
They are also minimalistic in style, so one can easily re-write them into a desired dependency list, such as a PEP 621 compliant dependencies =
key under the [project]
table in pyproject.toml
that can be accepted across tools and backends.
somata
, much like a neuron body supported by dendrites, is built on a set of basic state-space models introduced as class objects.
The motivations are to:
- develop a standardized format to store model parameters of state-space equations
- override Python dunder methods so
__repr__
and__str__
return something useful - define arithmetic-like operations such as
A + B
andA * B
- emulate
numpy.array()
operations including.append()
- implement inference algorithms like Kalman filtering and parameter update (m-step) equations as callable class methods
At present, and in the near future, somata
will be focused on time-invariant Gaussian linear dynamical systems.
This limit on models we consider simplifies basic models to avoid nested classes such as transition_model
and
observation_model
, at the cost of restricting somata
to classical algorithms with only some extensions to
Bayesian inference and learning. This is a deliberate choice to allow easier, faster, and cleaner applications of
somata
in neural data analysis, instead of to provide a full-fledged statistical inference package.
somata.StateSpaceModel(components=None, F=None, Q=None, mu0=None, S0=None, G=None, R=None, y=None, Fs=None)
StateSpaceModel
is the parent class of basic state-space models. The corresponding Gaussian linear dynamical system is:
Most of the constructor input arguments correspond to these model parameters, which are stored as instance attributes.
There are two additional arguments: Fs
and components
.
Fs
is the sampling frequency of observed data y
.
components
is a list of independent components underlying the hidden states components
would be a list of two AR1 models. Note that each element of the components
list should be an instance of one of basic model
class objects. To break the recursion, often the components
attribute of a component is set to None
, i.e.,
components[0].components = None
.
StateSpaceModel.__repr__()
The double-under method __repr__()
is overwritten to provide some unique identification info:
>>> s1 = StateSpaceModel()
>>> s1
Ssm(0)<f4c0>
where the number inside parenthesis indicates the number of components (the ncomp
attribute) in the model, and the four-digits in <>
are the last four digits of the memory address of the object instance.
StateSpaceModel.__str__()
The double-under method __str__()
is overwritten so print()
returns useful info:
>>> print(s1)
<Ssm object at 0x102a8f4c0>
nstate = 0 ncomp = 0
nchannel = 0 ntime = 0
nmodel = 1
components = None
F .shape = None Q .shape = None
mu0.shape = None S0 .shape = None
G .shape = None R .shape = None
y .shape = None Fs = None
- Model stacking in
StateSpaceModel
In many applications, there are several possible parameter values for a given state-space model structure. Instead of duplicating
the same values in multiple instances, somata uses stacking to store multiple model values in the same object instance. Stackable
model parameters are F, Q, mu0, S0, G, R
. For example:
>>> s1 = StateSpaceModel(F=1, Q=2)
>>> s2 = StateSpaceModel(F=2, Q=2)
>>> print(s1)
<Ssm object at 0x11fd7bfa0>
nstate = 1 ncomp = 0
nchannel = 0 ntime = 0
nmodel = 1
components = None
F .shape = (1, 1) Q .shape = (1, 1)
mu0.shape = None S0 .shape = None
G .shape = None R .shape = None
y .shape = None Fs = None
>>> print(s2)
<Ssm object at 0x102acc130>
nstate = 1 ncomp = 0
nchannel = 0 ntime = 0
nmodel = 1
components = None
F .shape = (1, 1) Q .shape = (1, 1)
mu0.shape = None S0 .shape = None
G .shape = None R .shape = None
y .shape = None Fs = None
>>> s3 = s1+s2
>>> print(s3)
<Ssm object at 0x102acc280>
nstate = 1 ncomp = 0
nchannel = 0 ntime = 0
nmodel = 2
components = None
F .shape = (1, 1, 2) Q .shape = (1, 1)
mu0.shape = None S0 .shape = None
G .shape = None R .shape = None
y .shape = None Fs = None
Invoking the arithmetic operator +
stacks the two instances s1
and s2
into a new instance, where the third dimension of the
F
attribute is now 2
, with the two values from s1
and s2
. The nmodel
attribute is also updated to 2
.
>>> s3.F
array([[[1., 2.]]])
Notice how the third dimension of the Q
attribute is still None
. This is because the +
operator has a built-in duplication check
such that the identical model parameters will not be stacked. This behavior of __add__
and __radd__
generalizes to all model parameters, and it is convenient when bootstrapping or testing different parameter values during neural data analysis. Manual stacking of a particular
model parameter is also possible with .stack_attr()
.
- Model expanding in
StateSpaceModel
Similar to stacking, there is a related concept called expanding. Expanding a model is useful when we want to permutate multiple model parameters each with several possible values. For example:
>>> s1 = StateSpaceModel(F=1, Q=3, R=5)
>>> s2 = StateSpaceModel(F=2, Q=4, R=5)
>>> print(s1+s2)
<Ssm object at 0x1059626b0>
nstate = 1 ncomp = 0
nchannel = 1 ntime = 0
nmodel = 2
components = None
F .shape = (1, 1, 2) Q .shape = (1, 1, 2)
mu0.shape = None S0 .shape = None
G .shape = None R .shape = (1, 1)
y .shape = None Fs = None
>>> s3 = s1*s2
>>> print(s3)
<Ssm object at 0x1059626b0>
nstate = 1 ncomp = 0
nchannel = 1 ntime = 0
nmodel = 4
components = None
F .shape = (1, 1, 4) Q .shape = (1, 1, 4)
mu0.shape = None S0 .shape = None
G .shape = None R .shape = (1, 1)
y .shape = None Fs = None
>>> s3.F
array([[[1., 1., 2., 2.]]])
>>> s3.Q
array([[[3., 4., 3., 4.]]])
Multiplying two StateSpaceModel
instances with more than one differing model parameters results in expanding these parameters into all possible combinations while keeping other identical attributes intact.
- Arrays of
StateSpaceModel
Building on stacking and expanding, we can easily form an array of StateSpaceModel
instances using .stack_to_array()
:
>>> s_array = s3.stack_to_array()
>>> s_array
array([Ssm(0)<4460>, Ssm(0)<4430>, Ssm(0)<4520>, Ssm(0)<4580>],
dtype=object)
Note that a StateSpaceModel
array is duck-typing with a Python list
, which means one can also form a valid array with [s1, s2]
.
StateSpaceModel.__len__()
Invoking len()
returns the number of stacked models:
>>> len(s2)
1
>>> len(s3)
4
StateSpaceModel.append()
Another useful class method on StateSpaceModel
is .append()
. As one would expect, appending a model to another results in
combining them in block-diagonal form in the state equation. Compatibility checks happen in the background to make sure no conflict
exists on the respective observation equations and observed data, if any.
>>> s1 = StateSpaceModel(F=1, Q=3, R=5)
>>> s2 = StateSpaceModel(F=2, Q=4, R=5)
>>> s1.append(s2)
>>> print(s1)
<Ssm object at 0x1057cb4c0>
nstate = 2 ncomp = 0
nchannel = 1 ntime = 0
nmodel = 1
components = None
F .shape = (2, 2) Q .shape = (2, 2)
mu0.shape = None S0 .shape = None
G .shape = None R .shape = (1, 1)
y .shape = None Fs = None
>>> s1.F
array([[1., 0.],
[0., 2.]])
>>> s1.Q
array([[3., 0.],
[0., 4.]])
Notice that the nstate
attribute is now updated to 2
, which is different from the +
operator that updates the nmodel
attribute to 2
.
- Inference and learning with
StateSpaceModel
Two different implementations of Kalman filtering and fixed-interval smoothing are callable class methods:
.kalman_filt_smooth(y=None, R_weights=None, return_dict=False, EM=False, skip_interp=True, seterr=None)
.dejong_filt_smooth(y=None, R_weights=None, return_dict=False, EM=False, skip_interp=True, seterr=None)
With an array of StateSpaceModel
, one can easily run Kalman filtering and smoothing on all array elements with multithreading using the static method .par_kalman()
:
.par_kalman(ssm_array, y=None, method='kalman', R_weights=None, skip_interp=True, return_dict=False)
M-step updates are organized using m_estimate()
that will recurse into each element of the components
list and use
the appropriate m-step update methods associated with different types of state-space models.
Below we explain three kinds of basic state-space models currently supported in somata.
somata.OscillatorModel(a=None, freq=None, w=None, sigma2=None, add_dc=False,
components='Osc', F=None, Q=None, mu0=None, S0=None, G=None, R=None, y=None, Fs=None)
OscillatorModel
is a child class of StateSpaceModel
, which means it inherits all the class methods explained above. It has a particular form of the state-space model:
To create a simple oscillator model with rotation frequency
>>> o1 = OscillatorModel(a=0.9, freq=15, Fs=100)
>>> o1
Osc(1)<81f0>
>>> print(o1)
<Osc object at 0x1058081f0>
nstate = 2 ncomp = 1
nchannel = 0 ntime = 0
nmodel = 1
components = [Osc(0)<4b50>]
F .shape = (2, 2) Q .shape = (2, 2)
mu0.shape = (2, 1) S0 .shape = (2, 2)
G .shape = (1, 2) R .shape = None
y .shape = None Fs = 100.0 Hz
damping a = [0.9]
freq Hz = [15.]
sigma2 = [3.]
obs noise R = None
dc index = None
Notice the components
attribute auto-populates with a spaceholder OscillatorModel
instance, which is different from the o1
instance
as can be recognized by different memory addresses. State noise variance sigma2
argument to the constructor method.
somata.AutoRegModel(coeff=None, sigma2=None,
components='Arn', F=None, Q=None, mu0=None, S0=None, G=None, R=None, y=None, Fs=None)
AutoRegModel
is a child class of StateSpaceModel
, which means it inherits all the class methods explained above. It has a particular form of the state-space model. For example, an auto-regressive model of order 3 can be expressed as:
To create an AR3 model with parameters
>>> a1 = AutoRegModel(coeff=[0.5,0.3,0.1], sigma2=1)
>>> a1
Arn=3<24d0>
>>> print(a1)
<Arn object at 0x1035524d0>
nstate = 3 ncomp = 1
nchannel = 0 ntime = 0
nmodel = 1
components = [Arn=3<2680>]
F .shape = (3, 3) Q .shape = (3, 3)
mu0.shape = (3, 1) S0 .shape = (3, 3)
G .shape = (1, 3) R .shape = None
y .shape = None Fs = None
AR order = [3]
AR coeff = ([0.5 0.3 0.1])
sigma2 = [1.]
Note that __repr__()
is slightly different for AutoRegModel
, since the key information is not how many components but rather the AR order. We display the order of the auto-regressive model with an =
sign as shown above instead of showing the number of components in
()
as for OscillatorModel
and StateSpaceModel
.
somata.GeneralSSModel(components='Gen', F=None, Q=None, mu0=None, S0=None, G=None, R=None, y=None, Fs=None)
GeneralSSModel
is a child class of StateSpaceModel
, which means it inherits all the class methods explained above. The same general Gaussian linear dynamic system as before is followed:
GeneralSSModel
is added to somata so that one can perform the most general Gaussian updates for a state-space model without special structures as specified in OscillatorModel
and AutoRegModel
. In other words, with non-sparse structures in the model parameters
F, Q, S0, G, R
. To create a simple general state-space model:
>>> g1 = GeneralSSModel(F=[[1,2],[3,4]])
>>> g1
Gen(1)<2440>
>>> print(g1)
<Gen object at 0x103552440>
nstate = 2 ncomp = 1
nchannel = 0 ntime = 0
nmodel = 1
components = [Gen(0)<2710>]
F .shape = (2, 2) Q .shape = None
mu0.shape = None S0 .shape = None
G .shape = None R .shape = None
y .shape = None Fs = None
We provide a Jupyter notebook including three exercises to help new users to get started with somata
. After going through the exercises, one will become familiar with how the somata
basic models are structured and intended to be used for state-space modeling. Running Kalman filtering, fixed-interval smoothing, and EM algorithm on somata
basic models are also demonstrated in the notebook. The iOsc+ and dOsc oscillator search methods described in the next section are showcased in the notebook as well, which serve as examples of algorithms that can be easily built in somata
.
- Oscillator Model Learning
- Phase Amplitude Coupling Estimation
- Oscillator Search Algorithms
- Switching State-Space Inference
- Multi-channel Oscillator Component Analysis
- State-Space Event Related Potential
- Dynamic Source Localization
For fitting data with oscillator models, it boils down to three steps:
- Initialize an oscillator model object
- Perform state estimation, i.e., E-step
- Update model parameters, i.e., M-step
Given some univariate time series data
, we can fit an oscillator to the data using the expectation-maximization (EM) algorithm.
from somata.basic_models import OscillatorModel as Osc
o1 = Osc(freq=1, Fs=100, y=data) # create an oscillator object instance
_ = [o1.m_estimate(**o1.kalman_filt_smooth(EM=True))for x in range(50)] # 50 EM steps
Quantifying phase amplitude coupling, as described by Soulat et al. 2022, consists of three steps:
- Fitting an oscillator model to compute instantaneous phase and amplitude (either to full-length data or windowed epochs).
- Fitting a constrained regression to the estimated phase (
$\phi_t$ ) and amplitude ($A_t$ ) vectors:
- Note that with
$A_0 = \beta_0$ ,$K^{\text{mod}} = \sqrt{\beta_1^2 + \beta_2^2} / \beta_0$ , and$\phi^{\text{mod}} = \tan^{-1}\left(\beta_2/\beta_1\right)$ , this regression equation is equivalent to:
-
(If there are multiple windows) Smoothing using an AR(p) model with observation noise. Model parameters are first learned through an instance of the EM algorithm initialized by numerical optimization of modified Yule-Walker equations. Kalman smoothing is then applied to
$\boldsymbol{\beta}$ estimates across windows.
Step 1 can be accomplished using Oscillator Model Learning or Oscillator Search Algorithms. We accomplish step 2 through Markov Chain Monte Carlo sampling using CmdStanPy; see pac_model.py. Functions to facilitate step 3 are also implemented in this module and can be called.
When using this module, please cite the following paper:
Soulat, H., Stephen, E. P., Beck, A. M., & Purdon, P. L. (2022). State space methods for phase amplitude coupling analysis. Scientific Reports, 12(1), 15940.
There are two similarly flavored univariate oscillator search methods in this module: iterative oscillator search (iOsc+) and decomposition oscillator search (dOsc) algorithms.
- iOsc+: for a well-commented example script, see IterOsc_example.py.
N.B.: We recommend downsampling to 120 Hz or less, depending on the oscillations present in your data. Highly oversampled data will make it more difficult to identify oscillatory components, increase the computational time, and could also introduce high frequency noise.
One major goal of this method was to produce an algorithm that requires minimal user intervention, if any. This algorithm is designed to fit well automatically in most situations, but there will still be some data sets where it does not fit well without intervention. We recommend starting with the algorithm as is, but in the case of poor fitting, we suggest the following modifications:
-
If the model does not choose the correct number of oscillations, we recommend looking at all fitted models and selecting the best fitting model based on other selection criteria or using your best judgement. You can also choose a subset of well-fitted oscillations and run
kalman_filt_smooth()
to estimate oscillations using those fitted parameters. -
This algorithm assumes stationary parameters, and therefore a stationary signal. Although the Kalman smoothing allows the model to work with some time-varying signal, the success of the method depends on the strength and duration of the signal components. The weaker and more brief the time-varying component is, the more poorly the model will capture it, if at all. We recommend decreasing the length of your window until you have a more stationary signal.
When using the original iOsc algorithm, please cite the following paper:
Beck, A. M., He, M., Gutierrez, R. G., & Purdon, P. L. (2022). An iterative search algorithm to identify oscillatory dynamics in neurophysiological time series. bioRxiv, 2022-10.
- dOsc: for a well-commented example script, see DecOsc_example.py.
When using this module, please cite the following paper:
He, M., Das, P., Hotan, G., & Purdon, P. L. (2023). Switching state-space modeling of neural signal dynamics. PLOS Computational Biology, 19(8), e1011395.
Mingjian He, Proloy Das, Ran Liu, Amanda Beck, Patrick Purdon
Use different citation styles at: https://doi.org/10.5281/zenodo.7242130
SOMATA is licensed under the BSD 3-Clause Clear license.
Copyright © 2024. All rights reserved.