Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix bug for partial_fit when the number of batch samples is less than n_comp #3227

Closed
wants to merge 2 commits into from

Conversation

DingWB
Copy link

@DingWB DingWB commented Sep 12, 2024

For incremental PCA: sc.tl.pca(adata, n_comps=ndim, chunked=True)
sometimes, the number of samples for the last chunk is smaller than ndim, an error would be throw:

File /anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/pym3c/clustering.py:377, in run_dimension_reduction(***failed resolving arguments***)
    375 if not downsample or obs_chunk_size > downsample or adata.n_obs < downsample:
    376         logger.info(f"Running IncrementalPCA without downsampling")
--> 377         sc.tl.pca(adata, n_comps=ndim, chunked=True,
    378                           chunk_size=obs_chunk_size)
    379 else: # downsample
    380         logger.info(f"Running IncrementalPCA with downsample = {downsample}")

File /anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/scanpy/preprocessing/_pca.py:255, in pca(***failed resolving arguments***)
    253 for chunk, _, _ in adata_comp.chunked_X(chunk_size):
    254     chunk = chunk.toarray() if issparse(chunk) else chunk
--> 255     pca_.partial_fit(chunk)
    257 for chunk, start, end in adata_comp.chunked_X(chunk_size):
    258     chunk = chunk.toarray() if issparse(chunk) else chunk

File /anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/sklearn/base.py:1473, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
   1466     estimator._validate_params()
   1468 with config_context(
   1469     skip_parameter_validation=(
   1470         prefer_skip_nested_validation or global_skip_validation
   1471     )
   1472 ):
-> 1473     return fit_method(estimator, *args, **kwargs)

File /anvil/projects/x-mcb130189/Wubin/Software/miniconda3/envs/m3c/lib/python3.9/site-packages/sklearn/decomposition/_incremental_pca.py:304, in IncrementalPCA.partial_fit(self, X, y, check_input)
    298     raise ValueError(
    299         "n_components=%r invalid for n_features=%d, need "
    300         "more rows than columns for IncrementalPCA "
    301         "processing" % (self.n_components, n_features)
    302     )
    303 elif not self.n_components <= n_samples:
--> 304     raise ValueError(
    305         "n_components=%r must be less or equal to "
    306         "the batch number of samples "
    307         "%d." % (self.n_components, n_samples)
    308     )
    309 else:
    310     self.n_components_ = self.n_components

ValueError: n_components=100 must be less or equal to the batch number of samples 77

To fix this bug, I added a try and except to line 255 of _pca.py.

  • Closes #
  • Tests included or not required because:
  • Release notes not necessary because:

Copy link

codecov bot commented Sep 12, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 76.62%. Comparing base (bec794c) to head (8d1cb04).
Report is 35 commits behind head on main.

Files with missing lines Patch % Lines
src/scanpy/preprocessing/_pca.py 50.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3227      +/-   ##
==========================================
- Coverage   76.63%   76.62%   -0.02%     
==========================================
  Files         109      109              
  Lines       12533    12536       +3     
==========================================
+ Hits         9605     9606       +1     
- Misses       2928     2930       +2     
Files with missing lines Coverage Δ
src/scanpy/preprocessing/_pca.py 92.47% <50.00%> (-0.97%) ⬇️

@flying-sheep
Copy link
Member

flying-sheep commented Sep 24, 2024

Hi, can you please create an issue with a minimal reproducible example?

Alternatively please add a unit test that will trigger your newly added branch. You’ll be able to see if that worked when this comment goes away:

grafik

Lastly, please follow the pre-commit instructions:

src/scanpy/preprocessing/_pca.py:268:13: E722 Do not use bare `except`
    |
266 |             try:
267 |                 pca_.partial_fit(chunk)
268 |             except:
    |             ^^^^^^ E722
269 |                 continue
    |

Found 1 error.

@DingWB DingWB closed this Oct 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants