Analysis: pca

Jeffrey Markowitz edited this page Aug 22, 2018 · 8 revisions

Let's say you have extracted two sessions and want to first compute the PC coefficients and then the PC scores. First, navigate to the folder one level above your two extractions:

.
├── session_20180503101758
│   ├── depth.dat
│   ├── depth_ts.txt
│   ├── metadata.json
│   ├── proc
│   │   ├── bground.tiff
│   │   ├── first_frame.tiff
│   │   ├── results_00.h5
│   │   ├── results_00.mp4
│   │   ├── results_00.yaml
│   │   └── roi_00.tiff
│   ├── rgb.mp4
│   └── rgb_ts.txt
└── session_20180503112433
    ├── depth.dat
    ├── depth_ts.txt
    ├── metadata.json
    ├── proc
    │   ├── bground.tiff
    │   ├── first_frame.tiff
    │   ├── results_00.h5
    │   ├── results_00.mp4
    │   ├── results_00.yaml
    │   └── roi_00.tiff
    ├── rgb.mp4
    └── rgb_ts.txt

moseq2-pca will recursively search for results_00.h5 files, so you can organize your directories however you like, as long as you are at least one directory above the extractions. To compute the PC coefficients, run:

moseq2-pca train-pca
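The recursive file discovery can be sketched with a simple glob over a tree like the one above (session names copied from the example; the actual search logic inside moseq2-pca may differ):

```python
from pathlib import Path
import tempfile

# Build a throwaway copy of the directory layout shown above,
# then find every extraction's results_00.h5 recursively.
root = Path(tempfile.mkdtemp())
for session in ("session_20180503101758", "session_20180503112433"):
    proc = root / session / "proc"
    proc.mkdir(parents=True)
    (proc / "results_00.h5").touch()

# This is the kind of recursive search train-pca performs from the
# current directory: any nesting works, as long as you start above it.
h5_files = sorted(root.glob("**/results_00.h5"))
print(len(h5_files))  # 2
```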

Here's how things should look:

You should also see pca_components.png and pca_scree.png in _pca, along with pca.h5, which stores the results of the computation. Here's a typical example of the first 50 components:

And the corresponding scree plot:
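The scree plot shows the fraction of variance each principal component explains, in decreasing order; its elbow suggests how many PCs are worth keeping. A minimal numpy sketch of the quantity being plotted, with random data standing in for the extracted frames:

```python
import numpy as np

# Stand-in for flattened, mean-centered frame data (frames x pixels).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 80))
X -= X.mean(axis=0)

# PCA via SVD: squared singular values give each component's
# share of the total variance.
_, s, _ = np.linalg.svd(X, full_matrices=False)
explained = s**2 / np.sum(s**2)

# The scree plot is `explained` (or its cumulative sum) versus
# component index; the shares sum to 1 across all components.
print(round(float(explained.sum()), 6))  # 1.0
```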

Note that running locally can be fairly slow; we typically run our PCA and modeling on Google Compute Engine using Slurm. If you have a GCE Slurm cluster running (or Slurm locally), start the analysis from an interactive node, e.g.:

ssh $slurm_login_node # login to a slurm cluster
srun --pty --mem=30G -n 5 bash # open an interactive node
moseq2-pca train-pca --cluster-type slurm --queue short --wall-time 01:00:00 --nworkers 30 --timeout 10

This will start 30 worker processes on the short queue (so 30 jobs). It will wait 10 minutes for the jobs to start, and then run with whatever workers have kicked off. NOTE: you may run into problems computing PCA on your university's Slurm cluster. First, try increasing the amount of memory per worker (using the --memory option). Second, try logging into the Dask dashboard (Dask is the infrastructure we use to parallelize this calculation; the dashboard starts automatically if you have bokeh installed prior to running the command). If neither of these works, you may need to work with your university's IT department to make sure things are working as expected.

Now, to apply the PC coefficients and retrieve the scores, run the following in the same directory where you ran train-pca:

moseq2-pca apply-pca
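Conceptually, applying the coefficients means projecting each mean-centered frame onto the learned components; the projections are the PC scores. A hedged numpy sketch (here `components` stands in for the coefficients stored in pca.h5; the file's actual layout is not shown):

```python
import numpy as np

# Stand-in for flattened frame data (frames x pixels).
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 30))
mean = X.mean(axis=0)

# Coefficients learned by train-pca: rows of V from the SVD.
_, _, components = np.linalg.svd(X - mean, full_matrices=False)
components = components[:10]  # keep the first 10 PCs

# apply-pca step: scores are the per-frame projections onto the PCs.
scores = (X - mean) @ components.T
print(scores.shape)  # (100, 10)
```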

You should see this:

This command can also be run on Slurm using the same syntax as above.