Add check for number of normalized dispersions (scverse#2231)
* Add check for number of normalized dispersions

In sc.pp.highly_variable_genes(), when flavor='cell_ranger' and
n_top_genes is set, check that enough normalized dispersions have been
calculated; if not, raise a warning and set n_top_genes to the number
of calculated dispersions.

Fixes scverse#2230

* Use .size instead of len()

* Add test for n_top_genes warning

* Add release note

* Remove blank line

Co-authored-by: Isaac Virshup <[email protected]>
lazappi and ivirshup authored Aug 12, 2022
1 parent 4623bb7 commit 1844b91
Showing 3 changed files with 20 additions and 1 deletion.
2 changes: 1 addition & 1 deletion docs/release-notes/1.9.2.md
@@ -7,7 +7,7 @@
```{rubric} Bug fixes
```
- {func}`~scanpy.pp.highly_variable_genes` `layer` argument now works in tandem with `batches` {pr}`2302` {smaller}`D Schaumont`

- {func}`~scanpy.pp.highly_variable_genes` with `flavor='cell_ranger'` now handles the case in {issue}`2230` where the number of calculated dispersions is less than `n_top_genes` {pr}`2231` {smaller}`L Zappia`

```{rubric} Performance
```
6 changes: 6 additions & 0 deletions scanpy/preprocessing/_highly_variable_genes.py
@@ -265,6 +265,12 @@ def _highly_variable_genes_single_batch(
        if n_top_genes > adata.n_vars:
            logg.info('`n_top_genes` > `adata.n_var`, returning all genes.')
            n_top_genes = adata.n_vars
        if n_top_genes > dispersion_norm.size:
            warnings.warn(
                '`n_top_genes` > number of normalized dispersions, returning all genes with normalized dispersions.',
                UserWarning,
            )
            n_top_genes = dispersion_norm.size
        disp_cut_off = dispersion_norm[n_top_genes - 1]
        gene_subset = np.nan_to_num(df['dispersions_norm'].values) >= disp_cut_off
        logg.debug(
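To see the new check in isolation, here is a minimal NumPy sketch of the cut-off selection. `select_top_by_dispersion` is a hypothetical helper, not a scanpy function. It assumes, as the `dispersion_norm[n_top_genes - 1]` indexing in the hunk implies, that `dispersion_norm` holds only the non-NaN normalized dispersions sorted in descending order. Under that assumption, asking for more genes than there are non-NaN dispersions would index past the end of the array and raise `IndexError`, which the second clamp prevents.

```python
import warnings

import numpy as np


def select_top_by_dispersion(dispersions_norm, n_top_genes):
    """Hypothetical helper (not scanpy API) mirroring the cut-off logic above.

    Returns a boolean mask marking genes whose normalized dispersion is at
    least the n_top_genes-th largest non-NaN value.
    """
    all_disp = np.asarray(dispersions_norm, dtype=float)
    # Assumed preprocessing (plays the role of `dispersion_norm` in the diff):
    # keep only the non-NaN values, sorted in descending order.
    dispersion_norm = np.sort(all_disp[~np.isnan(all_disp)])[::-1]
    # First clamp: never ask for more genes than exist.
    n_top_genes = min(n_top_genes, all_disp.size)
    # Second clamp (the check added in this commit): never ask for more
    # genes than there are non-NaN normalized dispersions.
    if n_top_genes > dispersion_norm.size:
        warnings.warn(
            "`n_top_genes` > number of normalized dispersions, "
            "returning all genes with normalized dispersions.",
            UserWarning,
        )
        n_top_genes = dispersion_norm.size
    # Without the second clamp this index could be out of bounds (IndexError).
    disp_cut_off = dispersion_norm[n_top_genes - 1]
    return np.nan_to_num(all_disp) >= disp_cut_off


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mask = select_top_by_dispersion([2.0, np.nan, 0.5, 1.5, np.nan], n_top_genes=4)

print(mask.tolist())  # [True, False, True, True, False]
print(len(caught))  # 1
```

Only three of the five values are non-NaN, so the request for four genes triggers the warning and the cut-off becomes the third-largest value, 0.5. Note that `np.nan_to_num` maps NaN dispersions to 0.0, so in this sketch a gene with a NaN dispersion is excluded only when the cut-off is positive.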
13 changes: 13 additions & 0 deletions scanpy/tests/test_highly_variable_genes.py
@@ -506,3 +506,16 @@ def test_seurat_v3_mean_var_output_with_batchkey():
    )
    np.testing.assert_allclose(true_mean, result_df['means'], rtol=2e-05, atol=2e-05)
    np.testing.assert_allclose(true_var, result_df['variances'], rtol=2e-05, atol=2e-05)


def test_cellranger_n_top_genes_warning():
    X = np.random.poisson(2, (100, 30))
    adata = sc.AnnData(X, dtype=X.dtype)
    sc.pp.normalize_total(adata)
    sc.pp.log1p(adata)

    with pytest.warns(
        UserWarning,
        match="`n_top_genes` > number of normalized dispersions, returning all genes with normalized dispersions.",
    ):
        sc.pp.highly_variable_genes(adata, n_top_genes=1000, flavor="cell_ranger")
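`pytest.warns` applies `match` with `re.search`, so the trailing `.` in the expected message of this test is a regex wildcard rather than a literal period. The test still passes because `.` also matches itself. The standard-library sketch below shows the difference and uses `re.escape` as one way to make the match literal.

```python
import re

# The expected message from the test above.
MESSAGE = (
    "`n_top_genes` > number of normalized dispersions, "
    "returning all genes with normalized dispersions."
)
altered = MESSAGE[:-1] + "!"  # same text, different final character

# Unescaped: the trailing "." is a wildcard, so the altered text also matches.
print(bool(re.search(MESSAGE, MESSAGE)), bool(re.search(MESSAGE, altered)))  # True True

# Escaped: every metacharacter is literal, so only the exact text matches.
pattern = re.escape(MESSAGE)
print(bool(re.search(pattern, MESSAGE)), bool(re.search(pattern, altered)))  # True False
```

In the test this would read `match=re.escape("...")`. This is a suggested tightening; the commit itself passes the raw string.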
