diff --git a/demos/notebooks/demo_pipeline_cnmfE.ipynb b/demos/notebooks/demo_pipeline_cnmfE.ipynb index b84797955..948135684 100644 --- a/demos/notebooks/demo_pipeline_cnmfE.ipynb +++ b/demos/notebooks/demo_pipeline_cnmfE.ipynb @@ -13,7 +13,7 @@ "2. Apply the constrained nonnegative matrix factorization endoscopic (CNMF-E) source separation algorithm to extract an initial estimate of neuronal spatial footprint and calcium traces.\n", "3. Apply quality control metrics to evaluate the initial estimates to narrow them down to a final set of estimates.\n", "\n", - "Some tools for visualization of movies and results are also included. \n", + "Tools for visualization of movies and results are also included. \n", "\n", "> This demo follows a similar pattern to the CNMF demo in `demo_pipeline.ipynb`. It includes less explanation except where there are important differences. If you want to get a more explanation-heavy picture of the fundamentals, we suggest starting with `demo_pipeline.ipynb`." ] @@ -117,7 +117,7 @@ "metadata": {}, "source": [ "# Load and visualize raw data\n", - "We visualize using the built-in movie object, which is described in detail in `demo_pipeline.ipynb`. In addition to neural activity, you can also see blood flow in the movie." + "We visualize using the built-in movie object, which is described in detail in `demo_pipeline.ipynb`. In addition to neural activity, you can also see blood flow through blood vessels in the movie. Such background activity, standard for 1p data, brings new analysis challenges. " ] }, { @@ -142,8 +142,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Set up a cluster\n", - "To enable parallel computing we will set up a local cluster. The resulting variable `cluster` contains the pool of processors (CPUs) that will be used in later steps. If you use `dview=cluster` in later steps, then parallel processing will be used. If you use `dview=None` then no parallel processing will be used. The `num_processors_to_use` variable determines how many CPU dores you will use (when set to `None` it goes to the default of one less than the number available):" + "# Set up multi-core processing\n", + "To enable parallel computing we will enable multiple CPUs for processing. The resulting `cluster` variable contains the pool of processors (CPUs) that will be used in later steps. If you use `dview=cluster` in later steps, then parallel processing will be used. If you use `dview=None` then no parallel processing will be used. The `num_processors_to_use` variable determines how many CPU cores you will use (when set to `None` it goes to the default of one less than the number available):" ] }, { @@ -160,7 +160,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Set up a cluster of processors. If one has already been set up (the `cluster` variable is already in your namespace), then that cluster will be closed and a new one created." + "If you have already set up multiprocessing (the `cluster` variable is already in your namespace), then that cluster will be closed and a new one created." ] }, { @@ -184,10 +184,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Set up parameters\n", + "# Define parameters\n", "We first set some parameters related to the data and motion correction and create a `params` object. We'll modify this parameter object later on with settings for source extraction. 
You can also set all the parameters at once as demonstrated in the `demo_pipeline.ipynb` notebook.\n", "\n", - "Note here we are setting `pw_rigid` to `False` as our data seems to mainly contain large-scale translational motion. We can always redo this later if it turns out to be a mistake." + "Note here we are setting `pw_rigid` to `False`, as our data seems to mainly contain large-scale translational motion. We can always redo this later if it turns out to be a mistake." ] }, { @@ -205,10 +205,10 @@ "# motion correction parameters\n", "motion_correct = True # flag for performing motion correction\n", "pw_rigid = False # flag for performing piecewise-rigid motion correction (otherwise just rigid)\n", - "gSig_filt = (3, 3) # size of high pass spatial filtering, used in 1p data\n", + "gSig_filt = (3, 3) # sigma for high pass spatial filter applied before motion correction, used in 1p data\n", "max_shifts = (5, 5) # maximum allowed rigid shift\n", "strides = (48, 48) # start a new patch for pw-rigid motion correction every x pixels\n", - "overlaps = (24, 24) # overlap between patches (size of patch strides+overlaps)\n", + "overlaps = (24, 24) # overlap between patches (size of patch = strides + overlaps)\n", "max_deviation_rigid = 3 # maximum deviation allowed for patch with respect to rigid shifts\n", "border_nan = 'copy' # replicate values along the boundaries\n", "\n", @@ -233,9 +233,9 @@ "metadata": {}, "source": [ "# Motion Correction\n", - "The background signal in micro-endoscopic data is very strong and makes motion correction challenging. As a first step the algorithm performs a high pass spatial filtering with a Gaussian kernel to remove the bulk of the lower-frequency background activity and enhance spatial landmarks. The size of the kernel is given from the parameter `gSig_filt`. If this is left to the default value of `None` then no spatial filtering is performed (default option, used in 2p data for CNMF). \n", + "The background signal in micro-endoscopic data is very strong and makes motion correction challenging. As a first step the algorithm performs a high pass spatial filtering with a Gaussian kernel to remove the bulk of the lower-frequency background activity and enhance spatial landmarks. The size of the kernel is given from the parameter `gSig_filt`. If this is left to the default value of `None` then no preprocessing is performed (default option, used in 2p data for CNMF). \n", "\n", - "After spatial filtering, the NoRMCorre algorithm is used to determine the motion in each frame. The inferred motion is then applied to the *original* data, so no information is lost before source separation. The motion corrected files are saved in memory mapped format. If no motion correction is performed (i.e., `motion_correct` was set to `False`), then the file gets directly memory mapped.\n", + "After spatial filtering, the NoRMCorre algorithm is used to determine the motion in each frame. The inferred motion is then applied to the *original* data, not the preprocessed data, so no information is lost before source separation. The motion corrected files are saved in memory mapped format. If no motion correction is performed (i.e., `motion_correct` was set to `False`), then the file gets directly memory mapped.\n", "\n", "> For a more detailed exploration of Caiman's motion correction pipeline, see `demo_motion_correction.ipynb`. \n", "\n", @@ -278,7 +278,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Compare original (left) and motion corrected movie (right). 
You will notice they look similar, as there wasn't much motion to begin with. You can see from the shift plot (plotted above) that the extracted shifts were all very small." + "Compare original (left panel) and motion corrected movie (right panel). You will notice they look similar, as there wasn't much motion to begin with. You can see from the shift plot (plotted above) that the extracted shifts were all very small." ] }, { @@ -401,7 +401,7 @@ "source": [ "
\n", "

CNMF-E: The Ring Model

\n", - " Background activity is very ill-behaved with 1p recordings: it often fluctuates locally and is much larger in magnitude than the neural signals we want to extract. In other words, the large-scale background model used for CNMF is not sufficient for most 1p data. Hence, Pengcheng Zhou and others came up with a localized model of background activity for CNMFE: the background at each pixel is represented as the weighted sum of activity from a circle (or ring) of pixels a certain distance from that pixel. The distance of this ring from the reference pixel is set by the ring_size_factor parameter. This more complex pixel-wise background model explains why CNMFE is computationally more expensive than CNMF, and also why it works better to mop up large-scale localized background noise in your 1p data. \n", + " Background activity is very ill-behaved with 1p recordings: it often fluctuates locally and is much larger in magnitude than the neural signals we want to extract. In other words, the large-scale background model used for CNMF is not sufficient for most 1p data. Hence, Pengcheng Zhou and others came up with a localized model of background activity for CNMFE: the background at each pixel is represented as the weighted sum of activity from a circle (or ring) of pixels a certain distance from that pixel. The distance of this ring from the reference pixel is set by the ring_size_factor parameter. This more complex pixel-by-pixel background model explains why CNMFE is computationally more expensive than CNMF, and also why it works better to mop up large-scale localized background noise in 1p data. \n", " \n", "When you set gnb in the CNMF model (usually to 1 or 2), you are setting the number of global background components. The fact that you can get away with so few is testament to how well-behaved the background activity is in 2p recordings. When we set gnb to 0 in Caiman, this is a flag telling Caiman's back end to switch to the more complicated ring model of the background activity. \n", "\n", @@ -414,7 +414,7 @@ "metadata": {}, "source": [ "## Key parameters for CNMFE\n", - "The key parameters for CNMFE are slightly different than for CNMF, but with some overlap. As we'll see, because of the high levels of background activity, we can't initialize the same way as with CNMF. We have two new important parameters directly related to initialization that come into play." + "The key parameters for CNMFE are slightly different than for CNMF, but with some overlap. As we'll see, because of the high levels of background activity, we can't initialize the same way as with CNMF. We have two new important parameters directly related to initialization that come into play: `min_corr` and `min_pnr`. " ] }, { @@ -424,20 +424,20 @@ "`rf` (int): *patch half-width*\n", "> `rf`, which stands for 'receptive field', is the half width of patches in pixels. The patch width is `2*rf + 1`. `rf` should be *at least* 3-4 times larger than the observed neuron diameter. The larger the patch size, the less parallelization will be used by Caiman. If `rf` is set to `None`, then CNMFE will be run on the entire field of view.\n", "\n", - "`stride_cnmf (int)`: *patch overlap*\n", - "> `stride_cnmf` is the overlap between patches in pixels (the actual overlap is `stride_cnmf + 1`). This should be at least the diameter of a neuron. The larger the overlap, the greater the computational load, but the results will be more accurate when stitching together results from different patches. 
This param should probably have been called 'overlap' instead of 'stride'.\n", + "`stride` (int): *patch overlap*\n", + "> `stride` is the overlap between patches in pixels (the actual overlap is `stride + 1`). This should be at least the diameter of a neuron. The larger the overlap, the greater the computational load, but the results will be more accurate when stitching together results from different patches. This param should probably have been called 'overlap' instead of 'stride'.\n", "\n", "`gSig (int, int)`: *half-width of neurons*\n", - "> `gSig` is roughly the half-width of neurons in your movie in pixels (height, width). It is the standard deviation of the mean-centered Gaussian used to filter the movie before initialization for CNMFE. It is related to the `gSiz` parameter, which is the size (in pixels) of a bounding box created around each seed pixel during initilialization. You will usually set `gSiz` to between `2*gSig` and `4*gSig` for CNMFE. \n", + "> `gSig` is roughly the half-width of neurons in your movie in pixels (height, width). It is the standard deviation of the mean-centered Gaussian used to filter the movie before initialization for CNMFE. It is related to the `gSiz` parameter, which is the width of the entire kernel filter.\n", "\n", "`merge_thr (float)`: *merge threshold* \n", "> If the correlation between two spatially overlapping components is above `merge_thr`, they will be merged into one component. \n", "\n", "`min_corr` (float): *minimum correlation*\n", - "> Pixels from neurons tend to be correlated with their neighbors. For initialization we select for pixels above a minimum correlation `min_corr`. We discuss this more below.\n", + "> Pixels from neurons tend to be correlated with their neighbors. During initialization, Caiman filters out those pixels below `min_corr` to help select seed pixels. We discuss this more below.\n", "\n", "`min_pnr` (float): *minimum peak to noise ratio*\n", - "> Set a threshoild peak-to-noise ratio. Pixels from neurons tend to have a high signal-to-noise ratio. For initialization we select for pixels above a minimum peak-to-noise-ratio `min_pnr`. We discuss this more below." + "> Pixels from neurons tend to have a high signal-to-noise ratio. During initialization, Caiman filters out those pixels below `min_pnr` to help select seed pixels. We discuss this more below." ] }, { @@ -446,7 +446,7 @@ "source": [ "## Inspect summary images and set parameters\n", "### Correlation-pnr plot\n", - "For CNMFE, Caiman uses the correlation and peak-to-noise (PNR) ratio for initialization, which will both tend to be high in regions that contain neurons. Hence, we set a threshold for both quantitites to remove the low correlation/low pnr regions, and highlight the regions higher in both metrics, the regions most likely to contain neuronal activity. \n", + "As discussed above, for CNMFE, Caiman uses the correlation and peak-to-noise (PNR) ratio for initialization, which will both tend to be high in regions that contain neurons. Hence, we set a threshold for both quantities to remove the low correlation/low pnr regions, and highlight the regions higher in both metrics: the regions most likely to contain neuronal activity. \n", "\n",
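+ "Conceptually, the thresholding works something like the sketch below. This is only an illustrative sketch rather than Caiman's internal code; `correlation_image` and `pnr_image` stand for the two summary images computed in the next cell, and `min_corr` and `min_pnr` are the two thresholds we are trying to choose:\n", + "\n", + "```python\n", + "# emphasize pixels that score highly in *both* summary images (2d numpy arrays)\n", + "search_image = correlation_image * pnr_image\n", + "\n", + "# zero out pixels that fall below either threshold\n", + "search_image[(correlation_image < min_corr) | (pnr_image < min_pnr)] = 0\n", + "\n", + "# peaks in search_image then serve as seed pixels for initialization (see the sidebar below)\n", + "```\n", + "\n",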
"First, we calculate the correlation and pnr maps of the raw motion corrected movie after filtering with a mean-centered Gaussian with standard deviation `gSig` (for more information, see the sidebar below). These calculations can be computationally and memory demanding for large datasets, so we subsample if there are many thousands of frames:" ] @@ -486,10 +486,10 @@ "metadata": {}, "source": [ "We are looking for a couple of things in the above plot:\n", - "1) Did we filter with a `gSig` value small enough so that we aren't blending different neurons together? To see what it is like when `gSig` is too large, set `gsig_tmp` to `(6,6)` in the above cell and then inspect the resulting corr-pnr plots. \n", - "2) More importantly, we want to find the threshold correlation and pnr values so that the *lower* threshold eliminates most of the noise and blood vessels from the plots, leaving behind as many of the neural pixels as possible. For this data it will be at a correlation value lower bound between 0.8 and 0.9, and and pnr lower bound somewhere between 10 and 20 (as with CNMF, there is no perfect value: it is often an iterative search, but keep in mind it is better to have false positives later than false negatives).\n", + "1) Did we filter with a `gSig` value small enough so that we aren't blending different neurons together? To see what it is like when `gSig` is too large, set `gsig_tmp` to `(6,6)` in the above cell and then inspect the resulting correlation-pnr plots. \n", + "2) More importantly, we want to find the threshold correlation and pnr values so that the *lower* threshold eliminates most of the noise and blood vessels from the plots, leaving behind as many of the *neural* pixels as possible. For this data it will be at a correlation value lower bound between 0.8 and 0.9, and a pnr lower bound somewhere between 10 and 20. As with CNMF, there is no perfect value: it is often an iterative search, but keep in mind it is better to have false positives later than false negatives.\n", "\n", - "You can tweak the parameters in the following cell (included are some values that are reasonable), using the `change_params()` method:" + "You can tweak the parameters in the following cell, using the `change_params()` method:" ] }, { @@ -523,7 +523,7 @@ " \n", "How are correlation and peak-to-noise ratio actually calculated? First Caiman convolves the motion corrected movie with a mean-centered Gaussian (example to the right). The sigma of the Gaussian is gSig, and mean centering is turned on by setting center_psf to True. This mean centering creates a Gaussian with a positive peak in the middle of width approximately gSig/2, surrounded by a negative trench, and a ledge of zeros around the outer edges. This crucial preprocessing filter serves to highlight neuronal peaks and smooth away low-frequency background activity.\n", "\n", - "The function correlation_pnr() applies this mean-centered Gaussian to each frame of the motion corrected movie and returns the correlation image of that movie, as well as the peak-to-noise-ratio (PNR). The correlation image is the correlation of each pixel with its neighbors. The PNR is the ratio of the maximum magnitude at a pixel to the noise value at that pixel (it is a fast and rough measure of signal-to-noise). 
As mentioned above, both of these values tend to be higher in pixels that contain neurons: the CNMFE initialization procedure is to set a threshold for both quantities, take their product, and use the peaks in this product map to find seed pixels for initialization of the CNMFE source separation algorithm.\n", + "The function correlation_pnr() applies this mean-centered Gaussian to each frame of the motion corrected movie and returns the correlation image of that movie, as well as the peak-to-noise-ratio (PNR). The correlation image is the correlation of each pixel with its neighbors. The PNR is the ratio of the maximum magnitude at a pixel to the noise value at that pixel (it is a fast and rough measure of signal-to-noise). As mentioned above, both of these values tend to be higher in pixels that contain neurons. The CNMFE initialization procedure is to set a threshold for both quantities, take their product, and use the peaks in this product map to find seed pixels for initialization of the CNMFE source separation algorithm.\n", "\n", "More details on the initialization procedure used here can be found in the CNMFE paper, or by exploring the code. \n", "
" @@ -534,7 +534,7 @@ "metadata": {}, "source": [ "### Evaluate our spatial parameters\n", - "As discussed in `demo_pipeline.ipynb`, the other important paramters are those used for dividing the movie into patches for parallelization of the algorithm. Namely, we want to select `rf` and `stride` parameters so that at least 3-4 neuron diameters can fit into each patch, and at least one neuron fits in the overlap region between patches. You can visualize the patches using the `view_quilt()` function:" + "As discussed in `demo_pipeline.ipynb`, the other important parameters are those used for dividing the movie into patches for parallelization of the algorithm. Namely, we want to select `rf` and `stride` parameters so that at least 3-4 neuron diameters can fit into each patch, and at least one neuron fits in the overlap region between patches. You can visualize the patches using the `view_quilt()` function:" ] }, { @@ -553,9 +553,9 @@ "patch_ax = view_quilt(correlation_image, \n", " cnmfe_patch_stride, \n", " cnmfe_patch_overlap, \n", - " vmin=np.percentile(np.ravel(correlation_image),50), \n", - " vmax=np.percentile(np.ravel(correlation_image),99.5),\n", - " color='white',\n", + " vmin=np.percentile(np.ravel(correlation_image), 50), \n", + " vmax=np.percentile(np.ravel(correlation_image), 99.5),\n", + " color='yellow',\n", " figsize=(4,4));\n", "patch_ax.set_title(f'CNMFE Patch Width {cnmfe_patch_width}, Overlap {cnmfe_patch_overlap}');" ] @@ -564,7 +564,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "These patches and overlaps may seem a bit large, but that is ok: our main concern is that they not be too small. If you wanted to change them, you could use `change_params()` as dicussed in the CNMF notebook.\n", + "These patches and overlaps may seem a bit large, but that is ok: our main concern is that they not be too small. If you wanted to change them, you could use `change_params()`.\n", "\n", "Now that we are happy with our parameters, let's run the algorithm." ] @@ -639,15 +639,6 @@ "# Visualize results" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "cnmfe_model.estimates.plot_contours_nb?" - ] - }, { "cell_type": "code", "execution_count": null, @@ -749,7 +740,7 @@ "metadata": {}, "source": [ "## Extract $\\Delta F/F$ values\n", - "Currently in Caiman, we don't return a true dff value for 1p data because Caiman normalizes to both the baseline fluorescence and background activity, and the background activity in 1p is so ill-behaved (as discussed above in the sidebar on the ring model). This is likely to change soon, but we currently only *detrend* the data but do not normalize to baseline (which explains the warning you will see when you run the following):" + "Currently in Caiman, we don't return a true dff value for 1p data. This is because, as mentioned in `demo_pipeline.ipynb`, Caiman normalizes to both the baseline fluorescence and background activity, and the background activity in 1p is so ill-behaved (as discussed above in the sidebar on the ring model). Hence, we currently only *detrend* the data by subtracting away the baseline but do not normalize to baseline. 
This explains the warning you will see when you run the following:" ] }, { @@ -772,7 +763,7 @@ "metadata": {}, "source": [ "## Deconvolution for 1p?\n", - "While we haven't discussed deconvolution (the estimation of the spikes that generated the calcium traces in `C`), we suggest treating the spike counts returned for 1p data (in `estimates.S`) with CNMFE with some caution. Currently (as of Fall 2023) we are aware of no no simultaneous ground-truth recordings used to evaluate deconvolution models for 1p data. There is a *lot* of such data for 2p data (for instance see [the Spikefinder paper](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006157)).\n", + "While we haven't discussed deconvolution (the estimation of the spikes that generated the calcium traces in `C`), we suggest treating the spike counts returned for 1p data (in `estimates.S`) with CNMFE with some caution. Currently (as of early 2024) we are aware of no simultaneous ground-truth recordings used to evaluate deconvolution models for 1p data. There is a *lot* of such data for 2p data (for instance see [the Spikefinder paper](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006157)).\n", "\n", "Because of this, most researchers analyze the calcium traces directly for 1p recordings (the data in `estimates.C`) or a normalized version of them." ] @@ -824,7 +815,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "To view just the movie of pure neural activity:" + "### View neural movie" ] }, { @@ -847,7 +838,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Next we can view just the background model. You will see many regions that are constant such as blood vessels, but also lots of large-scale background flourescence, and some local activity which is is on spatial scales larger than `gSig`:" + "### View background model\n", + "You will see many regions that are constant such as blood vessels, but also lots of large-scale background fluorescence, in addition to some local activity which is on spatial scales larger than `gSig`:" ] }, { @@ -870,6 +862,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ + "### View model and residual\n", "We can also use the built-in `play_movie()` method to view the original movie, predicted movie, and the residual simultaneously as discussed in more detail in `demo_pipeline.ipynb`: " ] },
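+ { "cell_type": "markdown", "metadata": {}, "source": [ + "Below is a minimal sketch of such a call. The argument values are illustrative rather than definitive, and `images` is assumed to hold the motion-corrected movie loaded from the memory-mapped file earlier in the notebook:" ] }, + { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ + "# sketch: play the original movie, the reconstructed movie, and the residual side by side\n", + "cnmfe_model.estimates.play_movie(images,\n", + "                                 q_max=99.5,        # upper percentile used to scale the display\n", + "                                 magnification=2,   # enlarge the display window\n", + "                                 include_bck=True)  # include the background model in the reconstruction" ] },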