\acresetall
\chapter[SIMA Analysis Software]{SIMA: Python software for analysis of dynamic fluorescence imaging data\footnote{This work has been previously published \citep{Kaifosh2014} and is joint work with the coauthors. The open source SIMA project is the joint work of many contributors.}}
\label{ch:sima}
When we began this project, \emph{in vivo} Ca\super{2+} imaging was a relatively new technique in neuroscience, and, more importantly, there were no publicly available tools to aid in the processing and analysis of the large amounts of data generated by these experiments.
To help fill this void and to advance our own research, we developed a collection of Python tools for the analysis steps common to all Ca\super{2+} imaging experiments, namely: motion correction, ROI segmentation, registration of ROIs across sessions, and signal extraction.
We released all of the code for both the SIMA package and the ROI Buddy GUI under a free license, and continue to maintain the collaborative project on GitHub: \url{https://github.com/losonczylab/}.
We designed SIMA to work with many different data formats, to be easily extensible with new features, and to be as simple to use as possible.
In the sections below I review the main features of SIMA, following the overview in our paper \citep{Kaifosh2014}, with additional emphasis on the areas to which I personally contributed the most: ROI registration and signal extraction.
Extensive documentation for the SIMA package and ROI Buddy GUI
is available at \url{http://www.losonczylab.org/sima/}.
Software and source code can be downloaded from the Python Package Index:
\url{https://pypi.python.org/pypi/sima}.
The source code repository is maintained on GitHub:
\url{https://github.com/losonczylab/sima}.
\section{Abstract}
Fluorescence imaging is a powerful method for monitoring dynamic signals in the nervous system.
However, analysis of dynamic fluorescence imaging data remains burdensome, in part due to the shortage
of available software tools.
To address this need, we have developed SIMA, an open source Python package
that facilitates common analysis tasks related to fluorescence imaging.
Functionality of this package includes correction of motion artifacts occurring during
\textit{in vivo} imaging with laser-scanning microscopy,
segmentation of imaged fields into regions of interest (ROIs),
and extraction of signals from the segmented ROIs.
We have also developed a graphical user interface (GUI) for manual editing of the
automatically segmented ROIs
and automated registration of ROIs across multiple imaging datasets.
This software has been designed with flexibility in mind
to allow for future extension with different analysis methods
and potential integration with other packages.
Software, documentation, and source code for the SIMA package and ROI Buddy GUI
are freely available at \url{http://www.losonczylab.org/sima/}.
\section{Introduction}
Two-photon fluorescence imaging of neuronal populations has proven to be a powerful method for studying dynamic signals in neural circuits.
For example, imaging of genetically-encoded fluorescent Ca\super{2+} indicators \citep{Looger2012} has been widely applied to simultaneously monitor the activity in large populations of spatially, morphologically, or genetically identified neurons.
These methods can be implemented \textit{in vivo} in awake rodents \citep{Dombeck2007, Komiyama2010,Lovett-Barron2014},
providing the potential to study the molecular, anatomical, and functional properties of neurons responsible for behavior \citep{Kerr2008, OConnor2010}.
Relative to the electrophysiological approaches traditionally used to study neuronal activity \textit{in vivo},
two-photon imaging provides the advantages of recording activity in entire local populations without spike-sorting ambiguities or bias towards highly active neurons,
imaging activity in subcellular compartments such as axons or dendrites,
and tracking the same neurons across experiments spanning multiple days.
Additionally, fluorescence imaging can be used to measure other signals, such as membrane potentials and neurotransmitter release \citep{Looger2012}.
To facilitate the analysis of data from dynamic fluorescence imaging experiments,
we have developed two software tools:
the Sequential IMaging Analysis (SIMA) Python package,
and the ROI Buddy graphical user interface (GUI).
The SIMA package can be used for motion correction, automated segmentation,
and signal extraction from fluorescence imaging datasets.
The ROI Buddy GUI allows for editing and annotating ROIs
within a given imaging session, as well as registering ROIs across imaging sessions acquired at different times.
The output data resulting from analysis with SIMA can either be directly
analyzed using the NumPy/SciPy tools for scientific computing
\citep{Oliphant2007, Jones2001}, or can be exported to common formats
allowing for subsequent analysis with other software.
The SIMA package and ROI Buddy GUI can be run on Linux, Windows, and MacOS
operating systems, have been made freely available under an open source
license, and require only other freely available open source software.
I provide here a brief overview of the SIMA package and ROI Buddy GUI.
Section \ref{sec:sima:functionality} explains the capabilities of these software
tools and how they can be used.
Section \ref{sec:sima:details} explains details of the algorithms that have
been implemented to provide this functionality.
% Finally,
% Section \ref{sec:sima:discussion} compares this software with other
% available resources and discusses potential future developments.
\section{Functionality}
\label{sec:sima:functionality}
The SIMA package and ROI Buddy GUI provide a variety of functionality outlined in \autoref{fig:sima:workflow}.
To give an overview of this functionality, we provide sample code for typical use
in the case in which the raw imaging data is contained in two NumPy arrays
named \verb|channel_A| and \verb|channel_B|
(other possibilities for input data formats are discussed in \autoref{sec:sima:inputs}),
and in which the output data is to be stored at the location \verb|'/save/path.sima'|.
Throughout our code examples, we assume that the SIMA package has been imported
with the \verb|import sima| Python command.
\begin{figure}[ht]
\centering
\includegraphics[width=0.7\textwidth]{sima/sima-fig1.pdf}
\caption[Workflow supported by SIMA]{
Workflow supported by SIMA.
(1) An ImagingDataset object is first created either directly from the raw data
or from the output of the motion correction algorithm.
(2) ROIs are generated by automatic segmentation.
(3) The ROI Buddy GUI can be used to edit the automatically generated ROIs and to
automatically register ROIs across multiple datasets.
(4) Dynamic fluorescence signals are extracted from the imaging data and ROIs.}
\label{fig:sima:workflow}
\end{figure}
With just a few lines of code, the user can correct motion artifacts in the data,
and then segment the resulting \verb|ImagingDataset| object to identify
ROIs:
\begin{verbatim}
dataset = sima.motion.hmm([[channel_A, channel_B]], '/save/path.sima')
dataset.segment()
\end{verbatim}
If the data lack motion artifacts (e.g.\ fluorescence imaging in \textit{ex vivo} brain slices),
the motion correction step can be replaced with direct initialization of an
\verb|ImagingDataset| object.
The full set of commands in this case is as follows:
\begin{verbatim}
dataset = sima.ImagingDataset([[channel_A, channel_B]], '/save/path.sima')
dataset.segment()
\end{verbatim}
In either case, the result of these commands is an \verb|ImagingDataset| object
containing the raw or motion-corrected imaging data and the automatically
generated ROIs.
This object is permanently stored in the location \verb|/save/path.sima|
so that it can be reloaded at a later time.
Following automated segmentation, the generated ROIs can be manually edited with the ROI Buddy graphical user interface (GUI).
This GUI can be used to delete erroneous ROIs, add missing ROIs, merge ROIs that have been incorrectly split, and adjust the shapes and positions of existing ROIs.
The ROI Buddy GUI can also be used to register ROIs across multiple datasets acquired at different times,
allowing for assessment of long-term changes in neural activity.
Once the ROIs have been edited and registered,
the \verb|ImagingDataset| object can be loaded in Python again,
and then dynamic fluorescence signals can be extracted from the ROIs as follows:
\begin{verbatim}
dataset = sima.ImagingDataset.load('/save/path.sima')
dataset.extract()
\end{verbatim}
The extracted signals are permanently saved with the \verb|ImagingDataset| object and can be accessed at any time with the command \verb|dataset.signals()|.
For further analysis with external software, the signals can be exported using the command \verb|dataset.export_signals('/export/path.csv')|.
The remainder of this section contains more detailed discussion of each of the
stages of this workflow.
This discussion complements the API documentation that is available online at the project's website: \url{http://www.losonczylab.org/sima}.
\subsection{Object classes and input formats}
\label{sec:sima:inputs}
The SIMA package follows an object-oriented design.
The central object class around which the package is structured is the \verb|ImagingDataset|.
Objects of this class can be created either by direct initialization
or as the output of the motion correction function call.
Direct initialization of an \verb|ImagingDataset| object requires two mandatory arguments:
(1) the raw imaging data formatted according to the requirements discussed below,
and (2) the path where the \verb|ImagingDataset| object is to be saved.
Names for the channels may be specified as an optional argument.
Once created, \verb|ImagingDataset| objects are automatically saved to the
designated location and can be loaded at a later time with
a call to the \verb|ImagingDataset.load| method.
A single \verb|ImagingDataset| object can contain imaging data from multiple
simultaneously recorded optical channels, as well as from multiple \textit{cycles}
(i.e. continuous imaging epochs/trials) acquired at the same imaging location
during the same imaging session.
To allow for this flexibility, the raw imaging data used to initialize the
\verb|ImagingDataset| object must be packaged into a list of lists,
whose first index runs over the cycles and whose second index runs over the
channels.
For example, if the raw data is stored in an object called \verb|data|,
then the element \verb|data[i][j]| corresponds to the \verb|j|th channel of the
\verb|i|th cycle.
The formatting requirements for each such element of the aforementioned list of lists
are designed to allow for flexible use of SIMA with
a variety of data formats.
The sole requirement is that each element be specified as a Python iterable
object satisfying the following properties:
(1) the iterable may not be its own iterator, i.e. it should be able to spawn
multiple iterators that can be iterated over independently;
(2) each iterator spawned from the iterable must yield image frames in the form
of two-dimensional NumPy arrays;
and (3) the iterable must survive Python's pickling and unpickling methods
for saving and loading objects.
A simple example of an object that satisfies these requirements is a three-dimensional NumPy array,
with the first index corresponding to the frame, the second to the row, and the third to the column.
Therefore, data in any format can be analyzed with SIMA following conversion
to a NumPy array.
We also implemented the \verb|sima.iterables.MultiPageTIFF| object class
for creating SIMA-compatible iterables from multi-page TIFF files,
and the \verb|sima.iterables.HDF5| object class for creating iterables from HDF5 files.
For example, a two-channel dataset can be initialized from TIFF files as follows:
\begin{verbatim}
iterables = [[sima.iterables.MultiPageTIFF('channel1.tif'),
sima.iterables.MultiPageTIFF('channel2.tif')]]
dataset = sima.ImagingDataset(iterables, '/save/path.sima',
channel_names=['GCaMP', 'tdTomato'])
\end{verbatim}
Compared to converting data from TIFF or HDF5 files to NumPy arrays,
use of these custom iterables is advantageous because there is no need to
duplicate the data for separate storage in a second format.
Furthermore, less data need be held in memory at any one time because the
\verb|MultiPageTIFF| or \verb|HDF5| iterables allow for imaging data
to be loaded one frame at a time on an as-needed basis.
Importantly, the SIMA package has been designed to allow for flexible extension
with additional custom iterable classes analogous to the \verb|MultiPageTIFF| class.
Such extensions can be developed to allow SIMA to use data from any required
input format.
Therefore, users wishing to use SIMA with other data formats have two options:
(1) to convert the data to a format already supported such as a TIFF stack or NumPy array,
or (2) to extend SIMA by creating a new iterable type to support the desired
data format.
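As an illustration of the second option, the following minimal sketch (a hypothetical class, not part of the SIMA package) wraps a directory of single-frame \verb|.npy| files in an object satisfying the three requirements above:
\begin{verbatim}
import glob
import numpy as np

class NpyDirectory(object):
    # Iterable over 2D frames stored as one .npy file per frame.
    # It is not its own iterator, each spawned iterator yields 2D
    # NumPy arrays, and it pickles cleanly because only the list
    # of filenames is stored.
    def __init__(self, path):
        self._filenames = sorted(glob.glob(path + '/*.npy'))

    def __iter__(self):
        # A fresh generator is returned on each call, so multiple
        # independent iterations are possible.
        return (np.load(f) for f in self._filenames)
\end{verbatim}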
\subsection{Motion correction}
During \textit{in vivo} laser-scanning microscopy, the animal's movements cause
time-dependent displacements of the imaged brain region relative to the microscope and
thus introduce substantial artifacts into the imaging data.
These artifacts are especially problematic when attempting to extract transient fluorescence
signals from very small structures, such as dendritic branches and synaptic boutons \citep[e.g.][]{Kaifosh2013}.
Since individual pixels are acquired at different times during laser scanning microscopy,
motion artifacts can occur within a single frame and cannot be corrected by simple frame
alignment methods.
To allow for correction of these within-frame motion artifacts,
the SIMA package includes line-by-line motion correction software (\autoref{fig:sima:motion})
that we developed \citep{Kaifosh2013}
by extending upon the hidden Markov model (HMM) approach used previously \citep{Dombeck2007}.
\begin{figure}[hb!]
\centering
\includegraphics[width=0.7\textwidth]{sima/motion-correction.pdf}
\caption[Line-by-line correction of within-frame motion artifacts]{
Line-by-line correction of within-frame motion artifacts.
\textbf{(A)} Schematic diagram showing a single imaging frame before (left) and after (right) line-by-line motion correction.
A separate displacement is calculated for each sequentially acquired line from the laser scanning process. As a result, some pixel locations may be accounted for multiple times (darker blue), while others may not be imaged in a given frame (white gap).
\textbf{(B)} Overlay of the different regions imaged by different frames due to motion. The light gray region indicates the maximum frame size that can be selected for the motion correction output, such that all pixel locations that were ever imaged are within the frame. The dark gray region indicates the default and minimum frame size that can be selected for the motion correction output, such that all pixel locations within the frame are within the field of view at all times.
}
\label{fig:sima:motion}
\end{figure}
A call to the hidden Markov model motion correction function \verb|sima.motion.hmm| returns a
motion-corrected \verb|ImagingDataset| object.
This function takes the same arguments used to directly initialize an
\verb|ImagingDataset| object, as well as additional arguments for specifying
parameters for the motion correction procedure.
One optional argument allows for specification of the number of states retained
at each step of the Viterbi algorithm.
Retaining a larger number of states may in some cases result in more accurate
displacement estimates, though at the expense of longer run-times.
The maximum allowable displacement in the horizontal and vertical directions
can also be specified.
Use of this restriction can improve the quality of the estimated displacements
by ruling out unreasonably large estimates.
Optionally, a subset of the channels can be selected for use in estimating
the displacements, which will then be used to correct artifacts in all channels.
This option is useful in cases where there is a sparse or highly dynamic channel with
signals of interest, and an additional static channel providing a stable
reference for motion correction.
Once the motion artifacts are corrected, the frames of the resulting \verb|ImagingDataset| show static
imaged structures, but a field of view that moves from frame to frame (\autoref{fig:sima:motion}B).
Typically, a frame size larger than that of the original images is required to display
the full spatial extent that was imaged during the session.
Relatedly, the area imaged during all frames is smaller than that of the original
images.
To determine the spatial extent of the corrected image series that will be
retained for further analysis, the \verb|hmm| function takes an additional
optional argument,
the \verb|trim_criterion|, which specifies the fraction of frames for which
a location must be within the field of view in order to be retained for further
analysis.
By default, the edges of the corrected images are conservatively trimmed to retain
only the rectangular region that remains within the field of view during all imaging frames.
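Putting these options together, a typical call might look as follows; the keyword names \verb|num_states_retained|, \verb|max_displacement|, and \verb|trim_criterion| reflect the SIMA 1.x API, but the exact signature should be checked against the online documentation:
\begin{verbatim}
dataset = sima.motion.hmm(
    [[channel_A, channel_B]], '/save/path.sima',
    num_states_retained=50,     # states kept per Viterbi time-step
    max_displacement=[20, 30],  # max vertical/horizontal shift (pixels)
    trim_criterion=0.95)        # keep locations imaged in 95% of frames
\end{verbatim}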
\subsection{Segmentation and ROIs}
\label{sec:sima:ROIs}
The SIMA package allows for automated segmentation of the field of view with a call to the \verb|ImagingDataset.segment|
method.
The \verb|segment| method takes arguments that allow for specification of the approach
to be used and an optional label for the resulting set of ROIs, which are saved
with the \verb|ImagingDataset|.
Arguments specific to the particular method can also be passed into this
method call.
The SIMA package currently contains two implemented segmentation methods,
\verb|'normcut'| and \verb|'ca1pc'|,
both of which are based on the normalized cuts approach \citep{Shi2000}.
A call to the \verb|segment| method returns an \verb|ROIList| object,
which contains the segmented \verb|ROI| objects.
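For example, a CA1 pyramidal-cell dataset might be segmented as follows; the method-specific keywords shown here, \verb|cut_min_size| and \verb|cut_max_size|, bound the typical cell size in pixels, and the full set of options for each method is listed in the online API documentation:
\begin{verbatim}
rois = dataset.segment('ca1pc', label='automatic',
                       cut_min_size=50, cut_max_size=150)
\end{verbatim}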
Alternatively, \verb|ROI| objects can be initialized independently in one of four ways:
(1) with a mask, typically a NumPy array, indicating the weight of each pixel (see \autoref{sec:sima:details:extraction}),
(2) with a list of polygons, each consisting of a list of vertices,
(3) using ROI Buddy (see \autoref{sec:sima:ROIbuddy}), or
(4) by importing a set of ROIs created in ImageJ \citep{Schneider2012}.
Masks can either be binary, to select a subset of pixels, or real-valued, as in the case of
weights resulting from principal or independent component analysis.
Polygons are treated equivalently to binary masks.
ROIs typically consist of a single polygon; however, multiple polygons are useful
for marking structures that leave and re-enter the imaging plane.
Additionally, \verb|ROI| objects have the following
optional attributes: \verb|id|, \verb|label|, and \verb|tags|.
The \verb|label| attribute is a descriptor for the \verb|ROI| used for
referencing the region within one imaging session.
The \verb|id| of an \verb|ROI| object is an identifier used to track the region over
multiple imaging sessions, such that two \verb|ROI| objects from different experiments
that have the same \verb|id| are understood to correspond to the same neuron/dendrite/bouton.
The \verb|id| values are automatically set during ROI registration with the ROI Buddy GUI.
The \verb|tags| attribute is a set of strings associated with the \verb|ROI|, used for
sorting or marking the ROIs based on morphological, genetic, or other criteria.
These \verb|tags| can also be modified from within the ROI Buddy GUI or during analysis of fluorescence signals
to aid in the selection and sorting of ROIs during subsequent analysis.
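As a sketch of direct initialization from a mask (assuming the \verb|ROI| class is importable as \verb|sima.ROI.ROI| and accepts \verb|mask|, \verb|label|, and \verb|tags| keywords, per the online documentation):
\begin{verbatim}
import numpy as np
from sima.ROI import ROI

mask = np.zeros((128, 256))
mask[30:40, 50:60] = 1.0  # binary mask selecting a 10 x 10 pixel region
roi = ROI(mask=mask, label='cell_01', tags={'pyramidal'})
\end{verbatim}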
\subsection{Manual ROI Editing}
\label{sec:sima:ROIbuddy}
The ROI Buddy GUI can be used to view and edit the automated segmentation results
or to manually draw new ROIs.
When the user loads an \verb|ImagingDataset| object, the time-averaged images are displayed as a static background over which \verb|ROI|
objects are drawn.
The underlying static image can be toggled between each of the imaged channels,
and optionally a contrast-enhanced ``processed'' image can be displayed.
Each \verb|ROI| object, consisting of one or more polygons, is displayed with a unique color over this background.
If multiple \verb|ROIList| objects are associated with an \verb|ImagingDataset|
(automatically generated and manually edited sets, for example),
the active set is selectable via a drop-down menu.
The user can also toggle between simultaneously loaded \verb|ImagingDataset| objects,
which is useful for rapidly switching between multiple imaging sessions of the same field of view
in order to verify the ROIs during editing.
% \begin{figure}
% \includegraphics[width=\textwidth]{/lab-admin/Presentations/figures/segmentation/ROI_Buddy_Figure.pdf}
% \caption[The ROI Buddy graphical user interface]{\label{fig:gui}
% The ROI Buddy graphical user interface.
% \textbf{(A)} Image viewing panel with ROI editing tools. During typical use this
% panel is expanded to occupy the majority of the screen.
% \textbf{(B)} Panel for toggling between ``Edit'' and ``Align'' modes,
% loading imaging datasets, and registering ROIs across datasets.
% \textbf{(C)} Panel for selecting, creating, saving, and deleting sets of ROIs associated
% with the active imaging dataset. In ``Align'' mode, ROIs from all loaded
% datasets can be viewed simultaneously.
% \textbf{(D)} List of ROIs in the currently selected set, and
% tools for tagging, merging, unmerging, and re-coloring ROIs.
% \textbf{(E)} Contrast adjustment for the underlying base image.
% \textbf{(F)} Panel for selection of the underlying base image.
% }
% \end{figure}
Once the \verb|ImagingDataset| and \verb|ROI| objects are loaded in the GUI,
the user can edit, delete, and add new ROIs as polygons while in the GUI's ``Edit'' mode.
All ROIs are directly editable, allowing for the user to adjust individual vertices or translate the entire ROI.
In addition, separate polygons can be merged either into a single
multiple-polygon \verb|ROI| or, if the polygons overlap, into a single-polygon \verb|ROI|.
The interface also allows the user to directly set the \verb|label| and \verb|tags|
properties of each \verb|ROI| described in \autoref{sec:sima:ROIs}.
\subsection{ROI Registration}
To track the same structures over multiple imaging sessions of the same field of view (\autoref{fig:sima:registration}),
the ROI Buddy GUI also supports the registration of ROIs from different \verb|ImagingDataset| objects.
In the GUI's ``Align'' mode, affine transformations are estimated to align the time-averaged images
of the currently active \verb|ImagingDataset| with each of the other loaded sets.
These transformations are then applied to the respective \verb|ROI| objects to transform them all into the space of the active \verb|ImagingDataset| (\autoref{fig:sima:registration}C).
This allows ROIs to be imported from one set onto the active \verb|ImagingDataset|, or for
all of the ROIs to be viewed simultaneously over the time-averaged image of a single \verb|ImagingDataset|.
The ROIs are then automatically identified across imaging datasets based on their degree of overlap
following transformation. The \verb|id| attributes of co-registered \verb|ROI| objects are set to be equal,
thus allowing for tracking of the same regions over multiple imaging sessions.
When displayed in the GUI,
co-registered \verb|ROI| objects are colored identically for easy visual
inspection of the registration results (\autoref{fig:sima:registration}D).
Groups of co-registered ROIs can be manually modified by removing and adding \verb|ROI| objects
to correct any errors in the automated registration.
The \verb|tags| can also be propagated across co-registered ROIs from different \verb|ImagingDataset| objects.
\begin{figure}[]
\centering
\includegraphics[width=0.9\textwidth]{sima/roi_registration-paper2.pdf}
\caption[Registration of ROIs across imaging sessions acquired on two different days]{Registration of ROIs across imaging sessions acquired on two different days.
\textbf{(A)} ROIs (red) and time-averaged image for the first imaging session.
\textbf{(B)} ROIs (green) and time-averaged image for the second imaging session, with ROIs for the first imaging session (red) shown for comparison.
\textbf{(C)} Same as \textbf{(B)} but with an affine transformation applied to align the time-averaged image and ROIs from day 2 to those of day 1.
\textbf{(D)} Same as \textbf{(C)} but with the ROIs colored by their automatically determined shared identities across both imaging sessions.}
\label{fig:sima:registration}
\end{figure}
\subsection{Signal extraction}
Signal extraction is accomplished by the \verb|ImagingDataset.extract| method.
This \verb|extract| method can take several optional arguments.
The \verb|ROIList| to be used can be specified in cases where multiple
\verb|ROIList| objects (e.g.\ one that was automatically generated and another
that was manually edited) are associated with the \verb|ImagingDataset|.
If multiple optical channels are present, the channel to be used for extraction can be
specified.
If the ROIs are either polygons or binary masks,
the \verb|extract| method can optionally exclude pixels that overlap between ROIs in
order to reduce artifactual correlations between adjacent ROIs.
The output of the \verb|extract| method is a Python dictionary, which is
also automatically saved as part of the \verb|ImagingDataset| object.
This dictionary contains
(1) the raw extracted signals,
(2) a time-averaged image of the extracted channel,
(3) a list of the overlapping pixels,
(4) a record of which \verb|ROIList| and channel were used for extraction, and
(5) a timestamp.
Additionally, a verification image is saved as a PDF file showing the extracted ROIs
and overlapping pixels overlaid on the time-averaged image.
Once the signals are extracted, they can be accessed at any time with a call to the
\verb|ImagingDataset.signals| method.
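A typical extraction call might look as follows; the attribute \verb|ImagingDataset.ROIs| and the keyword names (e.g.\ \verb|signal_channel|, \verb|remove_overlap|) follow the SIMA documentation but may differ between versions:
\begin{verbatim}
dataset = sima.ImagingDataset.load('/save/path.sima')
rois = dataset.ROIs['manual']          # a previously saved ROIList
extracted = dataset.extract(rois, signal_channel='GCaMP',
                            remove_overlap=True)
raw_signals = extracted['raw']         # raw extracted traces
\end{verbatim}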
\subsection{Exporting data}
The SIMA package is intended to provide support for early stages of data analysis,
such that subsequent analysis of the extracted signals can be performed with separate software.
In cases where all analysis is performed using Python, no exporting is necessary,
since the SIMA objects can be used in conjunction with other Python code.
In other cases, data from SIMA objects can be easily exported into standard formats,
including TIFF images and CSV text files.
Such exporting of data can be performed at various stages of data processing
with the SIMA package.
For example, those wishing to use SIMA solely for motion correction can export the
motion-corrected time series with a call to the \verb|ImagingDataset.export_frames| method.
This method takes as its argument the filenames with which the exported data
will be saved,
formatted as a list of lists of strings organized similarly to the input data
(see \autoref{sec:sima:inputs}).
Additional optional arguments can be used to specify the output file format,
whether to scale the intensity values to the full range allowed by the output file format,
and whether to fill in unobserved rows (\autoref{fig:sima:motion}A) of motion corrected images with values
from adjacent frames.
Time-averaged images can similarly be exported with the \verb|ImagingDataset.export_averages|
method.
If SIMA is also used for signal extraction, then the extracted signals can be
exported to a CSV file with the \verb|ImagingDataset.export_signals| method.
The resulting CSV file contains the \verb|id|, \verb|label|, and \verb|tags| for each
ROI, and the extracted signal from each ROI at each frame time.
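For example, a motion-corrected dataset with two cycles and one channel could be exported to TIFF files as follows; the \verb|fmt|, \verb|fill_gaps|, and \verb|scale_values| keywords correspond to the options described above, though the exact names should be confirmed in the documentation:
\begin{verbatim}
dataset.export_frames([['cycle1_corrected.tif'],
                       ['cycle2_corrected.tif']],
                      fmt='TIFF16',       # 16-bit TIFF output
                      fill_gaps=True,     # fill unobserved rows
                      scale_values=True)  # use full intensity range
\end{verbatim}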
\section{Software details}
\label{sec:sima:details}
% \subsection{Motion correction}
% We have previously described the HMM formulation and
% parameter estimation procedures that we have implemented for correction of within-plane
% motion during laser scanning microscopy \citep{Kaifosh2013}.
% Here we provide some additional details about the software implementation.
% \subsubsection*{Viterbi-based algorithm}\label{sec:sima:viterbi}
% The Viterbi algorithm computes the maximum \textit{a posteriori} sequence of states for a HMM.
% For a general HMM with $S$ hidden states and $T$ timesteps, the Viterbi algorithm has time complexity $O(S^2T)$.
% When used for motion correction, the hidden states are the possible displacements,
% with one state per pair of $x$ and $y$ integer displacements in pixel units.
% By restricting state transitions to those between nearest neighbors in two dimensions,
% we reduce the complexity of the algorithm implemented in SIMA to $O(ST)$.
% This restriction is justified by the same assumption
% -- that negligible motion occurs during the time required to image a row --
% by which we justify applying the same displacement to all pixels in the same row.
% Some of the datasets from our laboratory exhibit substantial displacements
% in two dimensions, resulting in the number of states $S$ being rather large;
% however, at any one time-step, the probability is typically concentrated in a much smaller number of states.
% Our software exploits this concentration of probability by retaining only the $N\ll S$ most probable states at each time-step.
% This approximation of the Viterbi algorithm reduces the computational complexity to $O(NT)$.
% Further increases in speed have been achieved by storing precomputed results for a number
% of transformations applied to the reference image and the image being aligned.
% These transformations include scaling by the estimated gain factor (see \citep{Kaifosh2013}) to convert intensity values to estimated photon counts,
% and computation of the logarithm and gamma functions applied to these scaled values.
% Repeated computations have also been avoided by using lookup tables for the indices
% of overlapping pixels between the image and the reference frame,
% for the possible transitions between hidden states,
% and for the probabilities of the transitions.
% \subsection{Segmentation}\label{sec:sima:details:segment}
% Although SIMA is designed to be extended to allow for multiple approaches to segmentation,
% the initial release includes only two segmentation methods,
% both using the normalized cuts approach \citep{Shi2000}.
% Specifically, we have implemented a basic normalized cuts segmentation (\verb|'normcut'|),
% as well as a variant designed for segmentation of pyramidal cell nuclei in hippocampal area CA1 (\verb|'ca1pc'|).
% Here, we describe first how we use the normalized cuts approach to partition the
% field of view, and then how, in the case of the \verb|'ca1pc'| variant, these regions
% are post-processed to create ROIs.
% \subsubsection{Normalized cut formation}
% The normalized cut segmentation algorithm \citep{Shi2000} partitions the imaged field
% of view through an iterative process.
% At each stage, a subset of the image pixels is split into two new subsets in such
% a way as to minimize a penalty that depends on a set of connection weights between the pixels.
% The resulting normalized cuts are uniquely determined by two factors:
% (1) the connection weights between pixels, and (2)
% the termination criterion for the iterative splitting procedure.
% For the standard normalized cuts procedure implemented in SIMA,
% the weight $w_{ij}$ between each pair of pixels $i$ and $j$ is calculated as follows:
% \begin{equation}\label{eq:weights}
% w_{ij} = e^{k_cc_{ij}} \cdot
% \begin{cases}
% e^{-\frac{||\mathbf x_i- \mathbf x_j||^2}{\sigma_{\mathbf x}^2}}
% & \text{if }||\mathbf x_i - \mathbf x_j|| < r\\
% 0 & \text{otherwise}
% \end{cases},
% \end{equation}
% where $c_{ij}$ is an estimate of the correlation between the pixels' intensity signals,
% $||\mathbf x_i - \mathbf x_j||$ is the Euclidean distance between the
% positions $\mathbf x_i, \mathbf x_j$ of the pixels,
% and $\sigma_{\mathbf x}^2$ specifies the decay of weights with distance
% up to a maximum distance $r$.
% We set the parameter $k_c=9$ based on empirical observations of segmentation accuracy.
% For the \verb|'ca1pc'| variant, we use a different set of weights $w_{ij}^{\text{CA1PC}}$,
% which are calculated by multiplying the weights $w_{ij}$ from Equation \eqref{eq:weights}
% by a factor depending on the maximum pixel intensity along a line connecting the two pixels.
% Specifically, the modified weights are defined as
% \begin{equation}
% w_{ij}^{\text{CA1PC}} =
% w_{ij}\cdot
% \exp\left(-k_I\max_{s\in[0,1]} I_{\text{avg}}^*\left((1-s)\mathbf x_i + s\mathbf x_j\right)\right),
% \end{equation}
% where $I_{\text{avg}}^*(\mathbf x)$ is the intensity at location $\mathbf x$
% of the time-averaged image, processed with Contrast Limited Adaptive Histogram Equalization (CLAHE) and an unsharp mask
% in order to correct intensity inhomogeneities and enhance the contrast
% (Figure \ref{fig:segmentation}B).
% Based on empirical observations of segmentation accuracy,
% we set $k_I = 3 / (\max I_{\text{avg}}^* - \min I_{\text{avg}}^*)$,
% with the maximum and minimum taken over the entire image.
% The effect of this modification is to increase the weights between two pixels
% within the same low-intensity pyramidal cell nucleus relative to the weights between other
% pixels.
% \begin{figure}
% \includegraphics[width=\textwidth]{/lab-admin/Presentations/figures/sima/processing_steps.pdf}
% \cprotect\caption[Segmentation steps for identifying pyramidal cell nuclei with the \verb|'ca1pc'| variant of the normalized cuts segmentation approach]{\label{fig:segmentation}
% Segmentation steps for identifying pyramidal cell nuclei with the \verb|'ca1pc'| variant of the normalized cuts segmentation approach.
% \textbf{(A)} The time-averaged image of the time-series to be segmented.
% \textbf{(B)} Application of CLAHE and unsharp mask image processing to \textbf{(A)}.
% \textbf{(C)} Disjoint regions identified by iterative partitioning with the normalized cuts algorithm.
% \textbf{(D)} Local Otsu thresholding of each region in \textbf{(C)}.
% \textbf{(E)} Cleanup of the Otsu thresholded regions in \textbf{(D)} with opening and closing binary morphology operations.
% \textbf{(F)} Resulting ROIs after rejection of regions in \textbf{(E)} that failed
% to satisfy minimum size and circularity requirements.
% }
% \end{figure}
% The termination criterion for the iterative partitioning of the image
% depends on the number of pixels in the region and
% the normalized cut penalty for the next potential partitioning.
% Specifically, partitions containing fewer than a minimum number of pixels (\verb|cut_min_size|)
% do not undergo further partitioning,
% whereas partitions with greater than a maximum of pixels (\verb|cut_max_size|) always
% undergo further partitioning.
% For partitions with an intermediate number of pixels, further partitioning
% occurs only if the penalty associated with the partitioning would be below
% a given threshold.
% For populations of uniformly sized neurons, such as those in the pyramidal
% layer of CA1, suitable termination is achieved when the values for \verb|cut_max_size| and \verb|cut_min_size| are chosen
% as upper and lower bounds on the typical cell size.
% An example set of partitions obtained with the \verb|'ca1pc'| variant
% is shown in Figure \ref{fig:segmentation}C.
% \subsubsection{Post-processing of partitions}
% In contrast to the basic \verb|'normcut'| segmentation method,
% which simply returns the partitions as the ROIs,
% the \verb|'ca1pc'| variant applies a series of post-processing
% steps to these partitions to isolate the darker pixels corresponding
% to the putative CA1 pyramidal cell nuclei.
% First a threshold is calculated for each partition based on Otsu's method
% \citep{Otsu1979} for
% cluster-based thresholding, allowing for the rough separation of light and dark pixels
% (Figure \ref{fig:segmentation}D).
% Following this step a series of morphological operations,
% consisting of a binary opening followed by a binary closing,
% are applied to each identified region to regularize the ROI shapes by
% filling in gaps and smoothing the borders of each region (Figure \ref{fig:segmentation}E).
% Finally a minimum size and circularity criterion is applied to each region to
% reject small and irregularly shaped regions (Figure \ref{fig:segmentation}F).
% We evaluated this \verb|'ca1pc'| segmentation algorithm on two-photon fluorescence imaging data
% from GCaMP6f-expressing pyramidal cells in hippocampal area CA1
% (see \citep{Lovett-Barron2014} for methodological details).
% Each of the 37 datasets consisted of 4575 frames of size 128x256 pixels acquired at 7.6 Hz
% with a 40x Nikon immersion objective at optical zoom 2X.
% We ran the segmentation algorithm with the following parameters:
% \verb|num_pcs| = \verb|50|,
% \verb|max_dist| = \verb|(3, 6)|,
% \verb|spatial_decay| = \verb|(3, 6)|,
% \verb|cut_max_pen| = \verb|0.10|,
% \verb|cut_min_size| = \verb|50|,
% \verb|cut_max_size| = \verb|150|,
% \verb|x_diameter| = \verb|14|,
% \verb|y_diameter| = \verb|7|,
% \verb|min_roi_size| = \verb|20|,
% \verb|circularity_threhold| = \verb|.5|,
% and \verb|min_cut_size| = \verb|40|.
% We compared the automatically segmented ROIs with manually curated segmentation.
% With a minimum Jaccard index of 0.25 as the criterion for a match between ROIs,
% the automatic segmentation had a false negative rate of 12$\pm$2\% and a false positive rate of
% 20$\pm$5\% (mean $\pm$ standard deviation).
\subsection{ROI Registration}
To estimate affine transformations between pairs of time-averaged images, we used the function
\verb|getAffineTransform| from OpenCV.
Once ROIs are transformed into the same reference space, the ROI Buddy GUI can automatically estimate
the degree of similarity between each pair of ROIs from different \verb|ImagingDataset| objects by calculating the Jaccard index,
defined as the area of the intersection divided by the area of the union.
ROIs are then clustered with the unweighted pair group method with arithmetic mean (UPGMA) hierarchical
clustering algorithm \citep{Sokal1958}, with distances between ROIs given by the reciprocal of the Jaccard index for that pair.
ROI pairs from the same \verb|ImagingDataset| are assigned infinite distance to prevent co-clustering of ROIs from the same imaging session.
The termination criterion for clustering is set such that pairs of ROIs in a cluster have a minimum Jaccard index of 0.25.
The objects of each cluster are then assigned a common \verb|id| attribute, allowing for identification
of the same region over multiple imaging sessions.
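The following sketch (a simplified illustration of the clustering logic, not SIMA's internal code) implements this procedure with SciPy, representing each transformed ROI as a boolean mask:
\begin{verbatim}
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

LARGE = 1e9  # effectively infinite distance

def jaccard(a, b):
    # Jaccard index of two boolean masks
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / float(union) if union else 0.0

def register_rois(masks, sessions, min_jaccard=0.25):
    # masks: boolean arrays already transformed into a common space
    # sessions: the imaging session index of each mask
    n = len(masks)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if sessions[i] == sessions[j]:
                d = LARGE  # never co-cluster ROIs from one session
            else:
                jac = jaccard(masks[i], masks[j])
                d = 1.0 / jac if jac > 0 else LARGE
            dist[i, j] = dist[j, i] = d
    # UPGMA = average-linkage hierarchical clustering; cut the tree
    # so clustered pairs have a Jaccard index of at least min_jaccard
    tree = linkage(squareform(dist), method='average')
    return fcluster(tree, t=1.0 / min_jaccard, criterion='distance')
\end{verbatim}
Each returned cluster label plays the role of a shared \verb|id| attribute.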
\subsection{Signal extraction}
\label{sec:sima:details:extraction}
In discussing the extraction procedures,
we use the notation $w_{ip}$ to denote the weighting of the $p$th pixel by the $i$th ROI.
For polygon or binary mask ROIs,
created with SIMA's automated segmentation or the ROI Buddy GUI, or imported from ImageJ,
$w_{ip}$ is defined as $\frac{1}{N_i}$ for pixels $p$ within the ROI and 0 elsewhere, where $N_i$ is the
number of pixels in the $i$th ROI.
The simplest case for extraction occurs when the same pixel locations are imaged in every frame.
In this case, we calculate the signal by a simple weighting of the normalized fluorescence intensities
from each pixel.
Specifically, the signal of the $i$th ROI at time $t$ is calculated as
\begin{equation}\label{eq:extraction-basic}
s_{it} = \sum_p w_{ip}\cdot \frac{f_{pt}}{f_p},
\end{equation}
with $f_{pt}$ denoting the intensity of the $p$th pixel in the frame at time $t$,
and $f_p$ denoting the average intensity across all frames at pixel location $p$.
When extracting signals following correction of within-frame motion artifacts,
the situation is complicated by the fact that not all pixel locations are observed
in each frame.
To derive a method for extracting these signals, we first note that the simple extraction method (\autoref{eq:extraction-basic})
reduces to the least-squares error estimate for a simple linear model in which the
pixel intensities are related to the underlying ROI signals as follows:
\begin{equation*}
\frac{f_{pt}-f_p}{f_p} = \sum_i a_{pi} (s_{it} - s_i^*),
\end{equation*}
with the coefficients $a_{pi}$ defined as the entries of the pseudoinverse of the matrix with
entries given by the weights $w_{ip}$,
and with $s_i^*$ set as $\sum_p w_{ip}$.
Given this model, when only a subset $P_t$ of the pixels is imaged in the frame taken at time $t$,
the least squares estimate of the signal is given by
\begin{equation*}
s_{it} = \sum_{p} w_{ipt}\cdot \frac{f_{pt}-f_p}{f_p} + \sum_p w_{ip}.
\end{equation*}
Here, the time-dependent coefficients $w_{ipt}$ are defined as the entries of the pseudo-inverse
of the matrix with entries $a_{pi}$ for all pixels $p$ in $P_t$.
A few special cases are worth mentioning.
For non-overlapping ROIs, this formula reduces to
\begin{equation*}
s_{it} = \frac{\sum_p w_{ip}^2}{\sum_{p\in P_t} w_{ip}^2}\cdot \sum_{p\in P_t} w_{ip}\frac{f_{pt}-f_p}{f_p} + \sum_p w_{ip}.
\end{equation*}
In cases of binary mask or polygon ROIs, the above formula simplifies to
\begin{equation*}
s_{it} = \frac{1}{N_{it}}\cdot\sum_{p\in P_{it}}\frac{f_{pt}}{f_p},
\end{equation*}
where $P_{it}$ is the set of pixels in the $i$th ROI that were imaged at time $t$,
and $N_{it}$ the number of pixels in this set.
In cases in which no pixels of a given ROI are imaged in a given frame,
a not-a-number (\verb|numpy.NaN|) value is recorded in place of that ROI's signal at that time.
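As a worked example of the simplest case, \autoref{eq:extraction-basic} can be written directly in NumPy (an illustrative sketch assuming fully observed frames, not SIMA's internal implementation):
\begin{verbatim}
import numpy as np

def extract_signals(frames, weights):
    # frames:  array of shape (T, Y, X), the imaging time series
    # weights: array of shape (n_rois, Y, X), the ROI weights w_ip
    # returns: array of shape (n_rois, T), the signals s_it
    T = frames.shape[0]
    f = frames.reshape(T, -1)                  # (T, pixels)
    w = weights.reshape(weights.shape[0], -1)  # (n_rois, pixels)
    f_mean = f.mean(axis=0)                    # time-average f_p
    f_mean[f_mean == 0] = 1.0                  # guard against f_p = 0
    return w.dot((f / f_mean).T)               # s_it = sum_p w_ip f_pt / f_p
\end{verbatim}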
% \subsubsection{Signal demixing}
% In addition to the raw signal extraction, SIMA also contains the option to separate out
% different fluorophores that are mixed across multiple optical channels.
% In this case, we perform signal extraction not on the raw optical channels,
% but rather on linear combinations of the optical channels.
% To estimate the linear combinations required to isolate the fluorescence from
% individual fluorophores, we perform independent component analysis on the pixel
% intensities of the time-averaged images from each optical channel.
% \subsection{Requirements and Dependencies}
% The SIMA package and ROI Buddy GUI depend only upon freely available open source software.
% In particular, the NumPy and SciPy packages \citep{Oliphant2007,Jones2001} for numerical
% and scientific computing are relied upon heavily throughout.
% The extraction functionality uses Matplotlib \citep{Hunter2007} to generate
% verification images.
% The Shapely Python package is used for geometric calculations relating to polygon ROIs.
% Automated segmentation relies upon Scikit-image \citep{Walt2014} and the Open Source
% Computer Vision Library (OpenCV), the latter which is also used for ROI registration.
% The ROI Buddy user interface uses guiqwt (http://code.google.com/p/guiqwt/).
% HDF5 files are manipulated with the h5py interface (http://www.h5py.org/).
% These packages are available with a standard scientific Python installation.
% Since the libtiff C library and its Python bindings enable more memory-efficient
% handling of multi-page TIFF files,
% their installation is strongly recommended if SIMA is to
% be used with large TIFF files containing many frames.
% \section{Discussion and Future Developments}\label{sec:sima:discussion}
% As a freely available open source software package, SIMA provides a variety of
% tools to facilitate common steps of dynamic fluorescence imaging analysis,
% including correction of motion artifacts,
% segmentation of the field of view into ROIs,
% and extraction of the fluorescence time-series for each ROI.
% Data can be imported or exported at various stages of processing with SIMA,
% so that the package can be used for all stages of analysis,
% or for any combination of the motion correction, segmentation, and signal extraction.
% The SIMA package can thus be used flexibly in conjunction with other analysis software.
% We have thoroughly documented the SIMA package to facilitate use and
% future collaborative development of this open source project
% (project hosted on GitHub at https://github.com/losonczylab/sima).
% Some of the functionality contained in the SIMA package complements other existing
% fluorescence imaging acquisition and
% analysis tools, such as Micro-Manager \citep{Edelstein2010} and ACQ4 \citep{Campagnola2014}.
% The TurboReg plug-in for ImageJ \citep{thevenaz1998pyramid}
% is capable of correcting motion artifacts that produce mis-aligned frames,
% but does not allow for correction of within-frame motion artifacts that occur during
% laser scanning microscopy.
% The normalized cuts approach to segmentation \citep{Shi2000} is a novel technique for the
% segmentation of dynamic fluorescence imaging data and is complementary to existing approaches,
% such as spatio-temporal independent complement analysis \citep{Mukamel2009},
% convolutional sparse block coding \citep{pachitariu2013extracting},
% and other methods implemented in ImageJ or CalTracer (\url{http://www.columbia.edu/cu/biology/faculty/yuste/methods.html}).
% In addition to providing this additional approach to segmentation,
% we have also created a graphical user interface, ROI Buddy,
% for manual editing of automatically generated ROIs,
% and for automated registration of ROIs across multiple datasets.
% ImageJ also provides the ability to draw ROIs and extract signals from image timeseries, but lacks the ability to handle missing data.
% Overall, a major advantage of SIMA is the integration of these various processing stages into a
% single tool-kit, allowing for seamless execution of the early stages of analysis
% of time series laser-scanning microscopy data.
% We plan to extend the SIMA package, hopefully in collaboration with the
% neuroinformatics community, so that future versions have greater functionality.
% A major need is to extend SIMA with additional methods for automated segmentation.
% Since the optimal segmentation approach is dependent on the neural structures recorded,
% the imaging conditions, and the goals of the analysis, we have structured the SIMA module such that
% additional approaches can be easily implemented and applied to \verb|ImagingDataset| objects.
% Integration of other existing segmentation approaches into the SIMA package is an area of active development.
% A second avenue for future development is to generalize the applicability of the
% SIMA package to imaging data acquired by methods other than two-dimensional laser
% scanning microscopy.
% In particular, we are interested in extending SIMA to work with newer technologies
% allowing for three-dimensional imaging within a volume of neural tissue.
% Such technologies include temporal focusing \citep{schrodel2013brain},
% light sheet imaging \citep{verveer2007high}, light field imaging \citep{Levoy:2006:LFM:1141911.1141976},
% and resonance scanning in combination with a piezoelectric crystal.
% The extension of our software to these technologies should support their broader application.
% \section*{Supplemental Data}
% Extensive documentation for the SIMA package and ROI Buddy GUI
% is available at \url{http://www.losonczylab.org/sima/}.
% Software and source code can be downloaded from the Python Package Index:
% \url{https://pypi.python.org/pypi/sima}.
% The source code repository is maintained on GitHub:
% \url{https://github.com/losonczylab/sima}.