phenom: A hierarchical non-parametric microbial phenotype model
This software has been run on macOS and Linux, specifically:
- Ubuntu 16.04
- macOS 10.14.1
  - Note: special steps may be needed to install pystan on macOS. See the pystan wiki for more details.

It requires:
- python (2.7 or 3.6)
- major dependencies:
  - pystan
  - numpy
  - patsy
  - matplotlib
  - GPy (for running examples)
Download the phenom repository. It is recommended to use a python virtual environment for installation. Run the install (this should take under five minutes):
python setup.py install
A complete example of phenom usage is provided in the notebooks folder as example.ipynb. Details of the library are outlined below.
The core interface to phenom models is through the phenotype object. A phenotype requires two major components:
- a dataset
- a design

The following sections outline these components in detail.
A dataset consists of raw growth data and associated metadata for each well in the dataset. phenom expects both of these to be provided as pandas dataframes when creating a new dataset:
# load data and meta as pandas dataframes here
from phenom.dataset import DataSet
ds = DataSet(data, meta)
The shapes of the raw growth data and the metadata must match. A data file will have NxK data points, with N timepoints and K individual growth curves. The metadata will then have shape KxM, with M metadata elements.
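The shape requirement can be checked before a dataset is constructed. The sketch below builds toy frames with pandas and confirms that the K columns of the data line up with the K rows of the metadata. The toy values and the helper check_shapes are hypothetical and not part of phenom.

```python
import numpy as np
import pandas as pd

N, K = 5, 4  # N timepoints, K growth curves

# data: one row per timepoint, one column per growth curve (N x K)
data = pd.DataFrame(np.random.rand(N, K))

# meta: one row per growth curve, one column per metadata element (K x M)
meta = pd.DataFrame({
    "strain": ["parent", "mutant", "parent", "mutant"],
    "condition": ["standard", "standard", "stress", "stress"],
})


def check_shapes(data, meta):
    """Hypothetical helper: True when data has one column per metadata row."""
    return data.shape[1] == meta.shape[0]


shapes_ok = check_shapes(data, meta)
```

Here M is 2 (strain and condition), so the metadata has shape 4x2 and matches the 5x4 data.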
A useful data storage pattern is to save the data and metadata as csv files in the same directory:
path/
to/
folder/
data.csv
meta.csv
phenom then provides a convenience function to load these files together as a dataset (note that data.csv and meta.csv are the assumed filenames):
ds = DataSet.fromDirectory("path/to/folder")
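One way to produce this folder layout is with the to_csv method of pandas. The sketch below writes toy frames into a temporary folder under the assumed filenames. Whether phenom expects an index column in either file is not stated here, so the index handling shown is an assumption.

```python
import os
import tempfile

import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.rand(5, 4))  # 5 timepoints x 4 growth curves
meta = pd.DataFrame({
    "strain": ["parent", "mutant", "parent", "mutant"],
    "condition": ["standard", "standard", "stress", "stress"],
})

folder = tempfile.mkdtemp()  # stands in for path/to/folder

# assumption: the index is written as the first column of each file;
# adjust this to match what phenom reads
data.to_csv(os.path.join(folder, "data.csv"))
meta.to_csv(os.path.join(folder, "meta.csv"))

written = sorted(os.listdir(folder))
```

The resulting folder contains only data.csv and meta.csv, matching the layout that DataSet.fromDirectory assumes.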
You can see examples of the data.csv and meta.csv files by running the processing step for raw growth data:
make data
and viewing the generated files in the data directory.
phenom models require a design specifying the relationship between metadata and the latent functions to be estimated. Currently, design construction is supported through the use of patsy formulas to convert metadata into a design matrix.
For example, consider data with the following metadata:
strain | condition |
---|---|
parent | standard |
mutant | standard |
parent | stress |
mutant | stress |
A design can be constructed as:
from phenom.design import Formula
treatment = Formula(meta, 'C(strain) + C(condition) + C(strain):C(condition)')
C(strain) and C(condition) specify categorical variables, and C(strain):C(condition) specifies an interaction between strain and condition effects. For more details on formula syntax, see the patsy docs.
The output of this design (treatment.frame) is:
mean | strain=mutant | condition=stress |
---|---|---|
1 | 0 | 0 |
1 | 1 | 0 |
1 | 0 | 1 |
1 | 1 | 1 |
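Because design construction goes through patsy formulas, the underlying coding can be inspected with patsy directly. The sketch below calls patsy's dmatrix rather than phenom's Formula, so it shows patsy's behavior only: patsy labels its columns differently from the table above and reports the interaction term as a column of its own. Pinning parent as the baseline with Treatment(reference='parent') is an assumption made here, since patsy would otherwise pick the alphabetically first level.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

meta = pd.DataFrame({
    "strain": ["parent", "mutant", "parent", "mutant"],
    "condition": ["standard", "standard", "stress", "stress"],
})

# Treatment(reference='parent') makes 'parent' the baseline strain (assumption);
# 'standard' is already the default baseline because it sorts first
formula = (
    "C(strain, Treatment(reference='parent'))"
    " + C(condition)"
    " + C(strain, Treatment(reference='parent')):C(condition)"
)
dm = dmatrix(formula, meta)

columns = dm.design_info.column_names  # patsy's own column labels
matrix = np.asarray(dm)                # one row per metadata row
```

The first three columns of matrix are the intercept, the mutant indicator, and the stress indicator; the fourth is their product, which is nonzero only for the mutant under stress.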
phenom supports compositional operations for combining designs. These composition operations are:
- addition (d1 + d2): corresponds to concatenating the columns of two designs
- multiplication (d1 * d2): corresponds to the Kronecker product of design columns. Useful for repeating a design at multiple hierarchical levels (see below)
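The two operations can be illustrated on plain numpy arrays. This is a minimal sketch and not phenom's implementation: the function names are hypothetical, and it assumes the Kronecker product is taken row by row, so that the result keeps one row per observation.

```python
import numpy as np


def add_designs(d1, d2):
    """Concatenate the columns of two designs with the same number of rows."""
    return np.hstack([d1, d2])


def multiply_designs(d1, d2):
    """Row-wise Kronecker product (assumed): each column of d1 times each column of d2."""
    n = d1.shape[0]
    return np.einsum("ij,ik->ijk", d1, d2).reshape(n, -1)


d1 = np.array([[1, 0], [1, 1]])
d2 = np.array([[1, 0], [0, 1]])

added = add_designs(d1, d2)            # 2 rows, 2 + 2 columns
multiplied = multiply_designs(d1, d2)  # 2 rows, 2 * 2 columns
```

Addition grows the number of columns by summing them, while multiplication grows it by their product, which is what lets one design be repeated at each level of another.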
To model batch effects, consider metadata of the form:
strain | condition | batch |
---|---|---|
parent | standard | 1 |
mutant | standard | 1 |
parent | stress | 1 |
mutant | stress | 1 |
parent | standard | 2 |
mutant | standard | 2 |
parent | stress | 2 |
mutant | stress | 2 |
We combine the treatment design described above with a design corresponding to batch effects to make our complete design:
# this is the design to be modeled both at the global and batch level
treatment = Formula(meta, 'C(strain) + C(condition)')
# base phenotype common to all observations, design is a column of 1's
base = Formula(meta, '1')
# batch effects
# the '+ 0' in the formula is necessary to prevent patsy from creating an undesired intercept column
batch = Formula(meta, 'C(batch) + 0')
# hierarchy is a combination of global phenotype and batch effects
hierarchy = base + batch
# the full design replicates treatment design across hierarchy
design = treatment * hierarchy
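To get a feel for the size of the resulting design, the same composition can be mimicked with patsy and numpy. This is a sketch under stated assumptions, not phenom's code: it assumes multiplication is a row-wise Kronecker product, and it writes the base column of 1's directly instead of going through a formula.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

meta = pd.DataFrame({
    "strain": ["parent", "mutant"] * 4,
    "condition": ["standard", "standard", "stress", "stress"] * 2,
    "batch": [1] * 4 + [2] * 4,
})

# treatment design: intercept, strain effect, condition effect (3 columns)
treatment = np.asarray(dmatrix("C(strain) + C(condition)", meta))
# base phenotype: a single column of 1's
base = np.ones((len(meta), 1))
# batch effects: one indicator column per batch, no intercept
batch = np.asarray(dmatrix("C(batch) + 0", meta))

# addition: concatenate columns (1 global column + 2 batch columns)
hierarchy = np.hstack([base, batch])
# multiplication (assumed row-wise Kronecker): repeat treatment at each level
design = np.einsum("ij,ik->ijk", treatment, hierarchy).reshape(len(meta), -1)
```

With 3 treatment columns and 3 hierarchy columns, the design has 9 columns: each treatment effect appears once at the global level and once per batch.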
This project is covered under the Apache 2.0 License.