Command Line Interface for Extraction and Modeling
Besides the Jupyter notebooks, MoSeq also offers a series of Command Line Interface (CLI) tools. You can access a short description of each tool by running `moseq2-extract --help`, `moseq2-pca --help`, `moseq2-model --help`, and `moseq2-viz --help` in the Terminal. You can use `--help` to access the available options for a specific command, e.g. `moseq2-extract extract --help`.
If you are using Conda and the environment name is `moseq2-app`, run `conda activate moseq2-app` to activate the environment. If you are using the Docker container, please make sure your Docker is up and running, and you should see `moseq` next to your terminal prompt.
You can verify that all MoSeq modules are installed by running:

```bash
moseq2-extract --version
# moseq2-extract, version 1.1.2
moseq2-pca --version
# moseq2-pca, version 1.1.3
moseq2-model --version
# moseq2-model, version 1.1.2
moseq2-viz --version
# moseq2-viz, version 1.2.0
```
We currently support `.dat`, `.tar.gz`, `.avi`, and `.mkv`. You can read more about these depth data extensions here.
Each MoSeq project is contained within a base directory. When using the CLI, you can run `ls` to list the contents of your current working directory and `cd` to change directories. To better organize the output, you may want to specify `<base_dir>` as your input directory and output directory in the CLI commands if your working directory is not `<base_dir>`. At this stage, the base directory should contain a separate subfolder for each depth recording session, as shown below (you can read more about it here):
```
.
└── <base_dir>/
    ├── session_1/
    │   ├── depth.dat
    │   ├── depth_ts.txt
    │   └── metadata.json
    ...
    ├── session_n/
    │   ├── depth.dat
    │   ├── depth_ts.txt
    │   └── metadata.json
```
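If you want to sanity-check this layout programmatically before extraction, a minimal Python sketch (illustrative only, not part of the MoSeq CLI) could look like this:

```python
import os
import tempfile

# Files expected in each session subfolder (the Kinect v1-style layout above)
REQUIRED = {"depth.dat", "depth_ts.txt", "metadata.json"}

def find_incomplete_sessions(base_dir):
    """Return names of session subfolders missing any required file."""
    incomplete = []
    for name in sorted(os.listdir(base_dir)):
        session = os.path.join(base_dir, name)
        if os.path.isdir(session) and not REQUIRED.issubset(os.listdir(session)):
            incomplete.append(name)
    return incomplete

# Demo on a throwaway base directory containing one complete session
with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, "session_1"))
    for fname in REQUIRED:
        open(os.path.join(base, "session_1", fname), "w").close()
    print(find_incomplete_sessions(base))  # []
```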
Note: if your data was acquired using an Azure Kinect, you will not have `depth_ts.txt` or `metadata.json` in your session subfolders; MoSeq will automatically generate the necessary files. The directory structure would be the following:
```
.
└── <base_dir>/
    ├── session_1/
    │   └── session_1.mkv
    ...
    ├── session_n/
    │   └── session_n.mkv
```
`config.yaml` is the configuration file that holds all configurable parameters for all steps in the MoSeq pipeline, such as extraction parameters and PCA parameters. The parameters in `config.yaml` are used to set relevant parameters in CLI commands.
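Since `config.yaml` is plain YAML, it can be inspected or edited programmatically as well as by hand. A small sketch (the `crop_size` value here is a hypothetical stand-in; the real file generated by `generate-config` contains many more fields):

```python
import yaml  # PyYAML

# Minimal hypothetical stand-in for a config.yaml
config_text = """
flip_classifier: /path/to/flip_classifier.pkl
crop_size: [80, 80]
"""

config = yaml.safe_load(config_text)
print(config["flip_classifier"])  # /path/to/flip_classifier.pkl

# Change a parameter and serialize it back to YAML
config["crop_size"] = [96, 96]
print(yaml.safe_dump(config))
```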
To generate the initial default `config.yaml` file in the current working directory, run the following command:

```bash
moseq2-extract generate-config # generates a config.yaml file in the current directory
```
Alternatively, to generate the `config.yaml` in a specified location, pass the `-o` flag:

```bash
moseq2-extract generate-config -o <specific path>/config.yaml # generates a config.yaml file in the provided subfolder (if it exists)
```
MoSeq2 uses a Random Forest flip classifier to guarantee that the mouse's nose is always pointed to the right after cropping and rotationally aligning the depth videos.
The flip classifiers we provide are trained for experiments run with C57BL/6 mice using Kinect v2 depth cameras. We provide three kinds of pre-trained flip classifiers: large mice with fibers, adult male C57BL/6 mice, and mice with Inscopix cables.
To download a pre-trained flip classifier, run the following command. If the `config.yaml` is in your current working directory, the download command will add the flip classifier path to the `config.yaml` file.

```bash
# The command below will prompt for input to indicate which one of 3 flip classifiers to download
moseq2-extract download-flip-file
```
You can also pass the path to your config file to the command:

```bash
# The command below will prompt for input to indicate which one of 3 flip classifiers to download
moseq2-extract download-flip-file <specific path>/config.yaml
```
If your dataset does not work well with our pre-trained flip classifiers, we provide a flip-classifier training notebook. After using this notebook, add the absolute path of your custom classifier to the `flip_classifier` field in the `config.yaml` file.
You can use the Interactive Arena Detection Tool in the MoSeq2 Extract Modeling Notebook to interactively find extraction parameters such as depth range, dilation iterations, and mouse height, and to preview the detected arena and sample extractions. Before running the cell for the interactive tool, run the Setup/Restore cell to set up progress variables.
Running the Interactive Arena Detection Tool is optional. If it is not run, the default parameters in the `config.yaml` file will be used in the extraction step. You can find the file structure after running this tool here.
Instructions:
- Run the following cell to initialize the arena detection widget. The cell renders a control panel to configure parameters for detecting the arena.
- By default, the widget selects the first session in your dataset, sorted alphanumerically.
- Adjust the depth range for detecting the floor of the arena.
- Adjust the dilation iterations to include more of the wall of the arena.
- Click the `Compute arena mask` button to compute and display the mask for the detected floor given the parameters. The displayed mask won't recompute and refresh when you change the parameters unless you click the button.
- Check the "Show advanced arena mask parameters" checkbox to display more advanced arena mask parameters; you can find more information about the parameters by running `moseq2-extract extract --help` in the CLI. You can find documentation for the CLI here.
- If you like the arena mask, click the `Compute extraction` button to extract a subset of the data.
- Once you are satisfied with the extraction, click the `Save parameters...` button to save this session's parameters and move on to the next session.
To extract data, pass the path to any `depth.dat` file in a session subfolder and specify the path to `config.yaml` using the `--config-file` option. The extraction step uses the parameters specified in `config.yaml` and the path to the flip classifier. If you used the interactive arena detection tool to find the parameters for each session, you should still use `--config-file ./<base_dir>/config.yaml` to specify the path to the config file.

```bash
moseq2-extract extract ./<base_dir>/session_1/depth.dat --config-file ./<base_dir>/config.yaml
```
If everything worked, you should see an extraction movie that looks like the following video (within reason).
You can run the following command to extract the sessions sequentially. If your depth file extension is not `.dat`, you can specify your file extension using the `--extensions` flag, e.g. `--extensions .avi`. If you used the interactive arena detection tool to find the parameters for each session, use `--config-file ./<base_dir>/session_config.yaml` instead of `--config-file ./<base_dir>/config.yaml`.

```bash
moseq2-extract batch-extract <base_dir> --config-file ./<base_dir>/config.yaml
```
You can find the file structure after data extraction here.
Slurm

If you are running `batch-extract` locally, the sessions will be extracted sequentially and the process can be fairly slow. You can extract the sessions in parallel with Google Compute Engine using Slurm. If you have a GCE Slurm cluster running (or Slurm locally), you can use `--cluster-type slurm` to generate a bash script that runs an extraction job for each session. Before running the bash script, don't forget to activate the virtual environment the MoSeq packages are installed in. You can find more information about the CLI options by running `moseq2-extract batch-extract --help`.

```bash
ssh $slurm_login_node # log in to a slurm cluster
moseq2-extract batch-extract <base_dir> --config-file ./<base_dir>/config.yaml --cluster-type slurm
```

After the bash script is generated, run the following commands to run the script:

```bash
conda activate moseq2-app # activate virtual environment
bash ./<base_dir>/extract_out.sh
```
Once all of your raw data recordings have been extracted and are of good quality, you should consolidate all the extraction output files into a single folder called `aggregate_results/` to keep track of all the training data. The command below recursively searches the current working directory for fully extracted recordings, copies the files contained within their respective `proc/` directories into a new folder `aggregate_results`, and generates `moseq2-index.yaml` (more information).
To aggregate your extraction results and generate the corresponding Index File, run one of the following commands:

```bash
# assuming you are in the same working directory as the previous step
moseq2-extract aggregate-results --input-dir <base_dir> --output-dir <base_dir>

# assuming you are in the <base_dir> where the session subfolders live
moseq2-extract aggregate-results
```
The copied files in `aggregate_results/` will be named according to the variables in the recording's `metadata.json` file with the following naming scheme: `{start_time}_{session_name}_{subject_name}`.
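As a sketch of that naming scheme (the `metadata.json` key names used here, `StartTime`, `SessionName`, and `SubjectName`, are assumptions for illustration):

```python
import json

# Hypothetical metadata.json contents; the key names are assumptions
metadata = json.loads(
    '{"StartTime": "20230101T120000", "SessionName": "saline", "SubjectName": "000069"}'
)

# Reproduce the {start_time}_{session_name}_{subject_name} scheme
aggregate_name = "{}_{}_{}".format(
    metadata["StartTime"], metadata["SessionName"], metadata["SubjectName"]
)
print(aggregate_name)  # 20230101T120000_saline_000069
```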
The session information in `metadata.json` is stored in the `metadata` field in `moseq2-index.yaml` for each session, and a unique key (UUID) is given to each session. You can assign group labels to sessions for analyses comparing different cohorts or experimental conditions, and the labels will also be stored in `moseq2-index.yaml`.
`moseq2-viz add-group` is used to specify a group for each session, and your current working directory should be the directory containing `aggregate_results`. For example, if `aggregate_results` is a folder in the base directory, then your current working directory should also be the base directory.
In this command, the `-k` flag specifies the keyword field in `metadata` to look into, the `-v` flag specifies the value to look for, and the `-g` flag specifies the group name. You can find more instructions for the command by running `moseq2-viz add-group --help`.
For example, the following command will assign the group name `saline` to all the sessions whose `SessionName` field in their `metadata` matches the value `saline`:

```bash
moseq2-viz add-group -k SessionName -v saline -g saline moseq2-index.yaml
```
You can also specify multiple values to look for. For example, the following command will assign the group name `saline` to all the sessions whose `SubjectName` field in their `metadata` matches one of the specified values:

```bash
moseq2-viz add-group -k SubjectName -v 000069 -v 000077 -v 000086 -g saline moseq2-index.yaml
```
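Conceptually, `add-group` relabels every session whose metadata field matches one of the given values. A minimal sketch of that logic (the index layout below is a simplified stand-in, not the exact `moseq2-index.yaml` schema):

```python
# Simplified stand-in for the session entries in moseq2-index.yaml
index = {
    "files": [
        {"uuid": "a1", "group": "default", "metadata": {"SubjectName": "000069"}},
        {"uuid": "b2", "group": "default", "metadata": {"SubjectName": "000123"}},
    ]
}

def add_group(index, key, values, group):
    """Assign `group` to every session whose metadata[key] is in `values`."""
    for session in index["files"]:
        if session["metadata"].get(key) in values:
            session["group"] = group

add_group(index, "SubjectName", {"000069", "000077", "000086"}, "saline")
print([s["group"] for s in index["files"]])  # ['saline', 'default']
```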
You can use the interactive assign group tool in the MoSeq2 Extract Modeling Notebook to assign groups to sessions. Before running the cell for the interactive tool, run the Setup/Restore cell to set up progress variables.
The tool is intended for users to specify groups for the sessions interactively. The `group` field in `moseq2-index.yaml` is used to store group labels, so the sessions can be grouped by experimental design for downstream analysis. Group labels in `moseq2-index.yaml` can be used in analyses comparing different cohorts or experimental conditions. Initially, all sessions are labeled "default", and the Group Setter tool below is used to assign group labels to sessions. This step requires that all your sessions have a `metadata.json` file containing a session name.
Instructions:
- Run the following cell to launch the Group Setter tool.
- Click on a column name to sort the table by the values in that column.
- Click the filter button to filter the values in a column.
- Click on a session to select it. To select multiple sessions, click the sessions while holding the CTRL/COMMAND key, or click the first and last entries while holding the SHIFT key.
- Enter the group name in the text field and click `Set Group` to update the `group` column for the selected sessions.
- Click the `Update Index File` button to save the current group assignments.
Fitting a PCA to the extracted data to determine the pose trajectories. This process computes Principal Components (PCs) that explain the largest possible percentage of the variance in your data.
Once all of your recordings have been correctly extracted and aggregated into the `aggregate_results/` folder, run the following command to fit a PCA using input from `aggregate_results/`, with the PCA output written to `_pca`.

Upon completion, a new folder, `_pca/`, will have been created containing the following files:
- `pca.h5`: HDF5 file that contains the principal components.
- `pca.yaml`: YAML file that contains the configuration variables and metadata used to fit the PCA.
- `pca_components.pdf/png`: image containing a grid of 2D images representing each computed principal component.
- `pca_scree.pdf/png`: scree plot that indicates the total number of computed PCs that explain 90% of the variance in the data.
```bash
# Assuming you are in the same directory as aggregate_results
moseq2-pca train-pca -i aggregate_results/ -o _pca/ --config-file config.yaml
```
Note: You may see warning messages that say: `distributed.worker - WARNING - Memory use is high but worker has no data to store to disk. Perhaps some other process is leaking memory? Process memory: X GB -- Worker memory limit: Y GB`. This message doesn't mean there is an error, and you can ignore it as long as the process is not terminated. If the process gets killed or terminated, consider adding `--nworkers 1` to limit the number of workers to 1.
You should see `pca_components.png` and `pca_scree.png` in `_pca`, along with `pca.h5`, which stores the results of the computation. Here's a typical example of the first 50 components:

And the corresponding scree plot:

Slurm

If you are running `train-pca` locally, the process can be fairly slow, and you can train the PCA in parallel with Google Compute Engine using Slurm. If you have a GCE Slurm cluster running (or Slurm locally), you can use `--cluster-type slurm` to train the PCA with Slurm. You can find more information about the CLI options by running `moseq2-pca train-pca --help`.
```bash
ssh $slurm_login_node # log in to a slurm cluster
srun --pty --mem=30G -n 5 bash # open an interactive node
# Assuming you are in the same directory as aggregate_results and the partition you intend to use is short
moseq2-pca train-pca -i aggregate_results/ -o _pca/ --cluster-type slurm -q short --config-file config.yaml
```
Applying the computed PCs to the extracted data to output the dimensionality-reduced data points that will be used to train the AR-HMM in the following pipeline step. Run the following command to apply the PC coefficients and compute the principal component scores.

Upon completion, a `pca_scores.h5` file will be created in the same directory as the files generated by `train-pca`, for example `_pca`.

```bash
# assuming you are in the same directory as _pca and the aggregate_results data
moseq2-pca apply-pca -i aggregate_results/ -o _pca/ --config-file config.yaml
```
You can find the file structure after the PCA steps here.
Computing the distribution of block durations of the behaviors as captured by the PC scores. The Model-Free Changepoints are used for comparison with your AR-HMM model fits.

Computing the Model-Free Changepoints of your dataset requires both `pca.h5` and `pca_scores.h5`. Run the following command to compute the Model-Free Changepoints captured by your principal components.

Note: Please make sure you specify `-i aggregate_results` in the command so that the sessions going into the computation have no duplicates; otherwise the command will fail.

```bash
moseq2-pca compute-changepoints -i aggregate_results/ --config-file config.yaml
```
You can find the file structure after computing model-free changepoints here.
The changepoint distribution is typically a left-skewed distribution that has a median block duration of ~0.3 seconds.
Below is an example of an outputted changepoint distribution.

MoSeq fits an AR-HMM to the PC scores from the previous step to generate syllables. You can fit different variations of AR-HMMs to your input data using `moseq2-model`, and you can access information about the CLI flags associated with the command by running `moseq2-model learn-model --help`.
Below are examples of how to train the different model types. Each command specifies the input as `_pca/pca_scores.h5` and the output model as `model_dir/my_model.p`. `--num-iter` specifies the number of times to resample the model; the default is 100. 100 iterations are good enough to explore the model parameters, but we recommend setting `--num-iter` to 1000 to get a more accurate model once you have decided on a set of parameters.
Non-Robust VS Robust
In non-robust models, the noise in the autoregressive process is z-distributed (Gaussian), whereas in robust models, the noise is t-distributed. Non-robust models generate fewer syllables than robust models.
Single Transition VS Separate Group Transition
Single transition means all groups share one transition matrix, and separate group transition means different groups have different transition matrices. If the size of your dataset is small, we don't recommend modeling your data with separate group transitions.
Note: To attain accurate results, we recommend training at least 100 models, each with at least 1000 iterations by setting `--num-iter 1000`. After all the models are trained, you can use `moseq2-viz get-best-model` to find the model that best matches the PC changepoints.
- Non-Robust (z-distributed) Single Transition Graph AR-HMM: the noise in the autoregressive process is Gaussian, and all groups are modeled with one single transition matrix.

```bash
# assuming you are in the same directory as _pca
moseq2-model learn-model _pca/pca_scores.h5 model_dir/my_model.p --index ./moseq2-index.yaml
```

- Non-Robust (z-distributed) Separate Group Transition Graph AR-HMM

```bash
# assuming you are in the same directory as _pca
moseq2-model learn-model _pca/pca_scores.h5 model_dir/my_model.p --index ./moseq2-index.yaml --separate-trans
```

- Robust (t-distributed) Single Transition Graph AR-HMM

```bash
# assuming you are in the same directory as _pca
moseq2-model learn-model _pca/pca_scores.h5 model_dir/my_model.p --index ./moseq2-index.yaml --robust
```

- Robust (t-distributed) Separate Group Transition Graph AR-HMM

```bash
# assuming you are in the same directory as _pca
moseq2-model learn-model _pca/pca_scores.h5 model_dir/my_model.p --index ./moseq2-index.yaml --robust --separate-trans
```
The most important free parameter is `kappa`, which corresponds to the model's prior probability distribution over output syllable durations. By default, kappa is set to the total number of frames in the dataset. Increasing the value of kappa will increase the syllable durations output by the model, and vice versa. To find the best kappa value that matches the PC score changepoints, you can use `moseq2-model kappa-scan` to run models with a series of kappa values. You can find more information about the CLI options by running `moseq2-model kappa-scan --help`.
You can find the file structure after fitting the AR-HMM model here.
`kappa-scan` Using Slurm
Currently, we support running `kappa-scan` on Google Compute Engine using Slurm. If you have a GCE Slurm cluster running (or Slurm locally), you can use `--cluster-type slurm` to generate a bash script that runs a series of models with different kappa values. You can specify the minimum kappa value using the `--min-kappa` flag (e.g. `--min-kappa 10000`) and the maximum kappa value using the `--max-kappa` flag (e.g. `--max-kappa 10000000`). You can find more information on scanning kappa and best practices for running models in the analysis tips.

The bash script will be generated in the specified model directory. Before running the bash script, don't forget to activate the virtual environment the MoSeq packages are installed in.
```bash
ssh $slurm_login_node # log in to a slurm cluster
moseq2-model kappa-scan _pca/pca_scores.h5 model_dir --cluster-type slurm
```

After the bash script is generated, run the following commands to run the script:

```bash
conda activate moseq2-app # activate the virtual environment
bash ./model_dir/out.sh
```

When all the models finish running, you can run `moseq2-viz get-best-model` to find the model that best fits the PC changepoints.
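A kappa scan sweeps kappa between the minimum and maximum values, typically on a log scale. A numpy sketch of generating such a sweep (the number of models and the log spacing are assumptions for illustration, not `kappa-scan`'s internals):

```python
import numpy as np

min_kappa, max_kappa = 1e4, 1e7  # e.g. --min-kappa 10000 --max-kappa 10000000
n_models = 4  # hypothetical number of models in the scan

# Log-spaced kappa values from min_kappa to max_kappa inclusive
kappas = np.logspace(np.log10(min_kappa), np.log10(max_kappa), n_models)
print(kappas)  # 10^4, 10^5, 10^6, 10^7
```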
Note that the model results have `-5` prepended to the actual labels to account for the number of lags in the model. Thus, if you set `nlags` to `3` (the default value) and your data has 5 frames, you should see the following:

```python
import joblib

results = joblib.load('my_model.p')
print(results['labels'])
# [-5, -5, -5, 0, 0, 48, 57, 57]
```
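To align the labels with the original frames, you can simply drop the first `nlags` padded entries. A short sketch using the example output above:

```python
nlags = 3  # default number of lags

# Labels as loaded from the model results above
labels = [-5, -5, -5, 0, 0, 48, 57, 57]

# Drop the nlags padded entries so the labels line up with the frames
frame_labels = labels[nlags:]
print(frame_labels)  # [0, 0, 48, 57, 57]
```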
This command will save a plot, `model_pca_changepoints.png/pdf`, showing the comparative changepoint distribution curves for the trained model and the PCA score changepoints.

The command supports comparison with respect to two objectives: `duration` and `jsd`. `duration` finds the model where the median syllable duration best matches that of the principal components' changepoints. `jsd` finds the model where the distribution of syllable durations best matches that of the principal components' changepoints.

If there are multiple models in the input folder, the output figure will plot multiple dashed distribution curves representing the distributions of unselected models and two solid distribution curves showing the "best"/chosen model and the principal components' changepoint durations.
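For intuition, the `jsd` objective scores how close two duration distributions are via the Jensen-Shannon divergence. A self-contained numpy sketch of that measure (illustrative only, not `get-best-model`'s implementation; the histograms are made up):

```python
import numpy as np

def jensen_shannon_divergence(p, q):
    """JSD between two discrete distributions over the same bins, in nats."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # treat 0 * log(0) as 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Two hypothetical syllable-duration histograms over the same bins
model_durations = [0.10, 0.40, 0.30, 0.20]
pca_durations = [0.15, 0.35, 0.30, 0.20]
print(jensen_shannon_divergence(model_durations, pca_durations))
```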
```bash
moseq2-viz get-best-model model_dir/ _pca/changepoints.h5 model_pca_changepoints
```
- moseq2-extract [Repository][Documentation]
- moseq2-pca [Repository][Documentation]
- moseq2-model [Repository][Documentation]
- moseq2-viz [Repository][Documentation]