find_highly_variable_features [PERFORMANCE]

Hello. Thanks for developing SCENIC+, it is super dope and it is giving me nice results so far.

**What type of problem are you experiencing and which function is your problem related to**
While preparing multiome datasets to run SCENIC+, my Python process gets killed when running `find_highly_variable_features`. I am running a Python script directly on my workstation, which has plenty of memory:

![image](https://github.com/user-attachments/assets/cf5852ee-6d11-40fd-ad40-ebb3997eabc3)


**Is this problem data set related? If so, provide information on the problematic data set**
It works without issues on another dataset of mine (~26k cells, ~480k regions) but fails on this larger dataset (~45k cells, ~540k regions).
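For scale, here is a rough back-of-the-envelope estimate of what a fully dense cells x regions matrix would occupy for both datasets. This is only a sketch: it assumes the imputed accessibility matrix is held densely in memory, which I have not verified, and `dense_matrix_gb` is a hypothetical helper, not part of pycisTopic.

```python
def dense_matrix_gb(n_cells: int, n_regions: int, bytes_per_value: int = 8) -> float:
    """Size in GB (10**9 bytes) of a dense n_cells x n_regions matrix."""
    return n_cells * n_regions * bytes_per_value / 1e9


# Approximate dataset sizes taken from this report.
datasets = [
    ("smaller dataset", 26_000, 480_000),
    ("larger dataset", 45_000, 540_000),
]

for label, n_cells, n_regions in datasets:
    for dtype_name, nbytes in [("float32", 4), ("float64", 8)]:
        size = dense_matrix_gb(n_cells, n_regions, nbytes)
        print(f"{label}: {dtype_name} ~ {size:.1f} GB")
```

Under that assumption the larger dataset needs roughly twice the memory of the smaller one (about 194 GB versus about 100 GB at float64), before counting any temporary copies made during normalization or feature selection.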

**Describe alternatives you've considered**
I also tried running this step within RStudio through reticulate and in a Python Jupyter notebook, but it gets killed there too because of memory.


**Additional context**
I am running all of this in a conda environment where I installed scenicplus, pycistopic and pycistarget.
I tried running this step of the pipeline using a python script:

```
import scenicplus
import pycisTopic 
import scanpy as sc
import pandas as pd
import os
import pickle
import numpy as np

from pycisTopic.diff_features import (
  impute_accessibility,
  normalize_scores,
  find_highly_variable_features,
  find_diff_features)

import matplotlib.pyplot as plt

# Directories and working paths
projDir = "/mnt/data/SCENICplus_P1-B1-B2P1-B2P3R2/"
outDir = "/mnt/data/SCENICplus_P1-B1-B2P1-B2P3R2/output"
work_dir = "/mnt/data/SCENICplus_P1-B1-B2P1-B2P3R2/"
tmpDir = "/mnt/scratch/SCENICplus_temp/"

# Load the imputed accessibility object
with open(os.path.join(outDir, 'DARs', 'imputed_acc_obj.pkl'), 'rb') as infile:
    imputed_acc_obj = pickle.load(infile)

# Normalize the imputed data
normalized_imputed_acc_obj = normalize_scores(imputed_acc_obj, scale_factor=10**4)

# Find highly variable features (plot=True also writes the plot to a PDF)
variable_regions = find_highly_variable_features(normalized_imputed_acc_obj,
                                                 min_disp=0.05,
                                                 min_mean=0.0125,
                                                 max_mean=3,
                                                 max_disp=np.inf,
                                                 n_bins=20,
                                                 n_top_features=None,
                                                 plot=True,
                                                 save=os.path.join(outDir, 'DARs', 'HVR_plot.pdf'))

# Save the results
with open(os.path.join(outDir, "DARs", "variable_regions.pkl"), "wb") as outfile:
    pickle.dump(variable_regions, outfile)
```

Then it gets killed:
![image](https://github.com/user-attachments/assets/e4b07946-d9d8-409d-99a0-172379addb3d)


I also tried running `normalize_scores` first and saving the output as a pkl file, then running `find_highly_variable_features` in a separate step, but that doesn't work either.
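To illustrate the kind of behaviour that might avoid the memory spike, here is a minimal sketch that computes per-region mean and variance in row blocks, so only one block is converted and held as a temporary at a time. It is not pycisTopic's implementation: `chunked_mean_var` is a hypothetical helper, the regions x cells orientation is an assumption, and variance divided by mean is used only as a simple stand-in for a dispersion statistic.

```python
import numpy as np


def chunked_mean_var(mtx, chunk_size=10_000):
    """Per-row mean and population variance, computed one row block at a time."""
    n_rows = mtx.shape[0]
    means = np.empty(n_rows, dtype=np.float64)
    variances = np.empty(n_rows, dtype=np.float64)
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        # Only this block is copied to float64; the full matrix is never duplicated.
        block = np.asarray(mtx[start:stop], dtype=np.float64)
        means[start:stop] = block.mean(axis=1)
        variances[start:stop] = block.var(axis=1)
    return means, variances


# Toy stand-in for a regions x cells matrix (assumed orientation).
rng = np.random.default_rng(0)
toy = rng.random((1_000, 50), dtype=np.float32)

means, variances = chunked_mean_var(toy, chunk_size=128)
# Simple dispersion stand-in; the clamp guards against division by zero.
dispersion = variances / np.maximum(means, 1e-12)
```

With this pattern the extra memory is bounded by `chunk_size` rows rather than by the whole matrix, which is why I wonder whether a chunked option would help here.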

**Version information**
Report versions of modules relevant to this error
![image](https://github.com/user-attachments/assets/5cf20027-cfc2-447f-93a2-52f0734243bb)

Any help/insight would be greatly appreciated as I really need to finish preparing this file to then run SCENIC+. Thank you!


find_highly_variable_features [PERFORMANCE] #156
