Description
Hello. Thanks for developing SCENIC+; it is super dope and has been giving me nice results so far.
What type of problem are you experiencing and which function is your problem related to?
While preparing multiome datasets to run SCENIC+, my Python process gets killed when running find_highly_variable_features. I am running a Python script directly on my workstation, which has a good amount of memory.
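For reference, this is how I have been checking memory around the call (a minimal sketch of my own; psutil is not part of the SCENIC+ stack, I just use it for logging):

```python
import os
import psutil

def log_rss(label: str) -> None:
    # Resident set size of the current Python process, in GiB
    rss_gib = psutil.Process(os.getpid()).memory_info().rss / 1024**3
    print(f"[{label}] RSS: {rss_gib:.1f} GiB", flush=True)

log_rss("before find_highly_variable_features")
# variable_regions = find_highly_variable_features(...)  # the call from the script below
log_rss("after find_highly_variable_features")
```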
Is this problem data set related? If so, provide information on the problematic data set
It works without issues on another dataset of mine (~26k cells, ~480k regions), but it fails on this larger dataset (~45k cells, ~540k regions).
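I suspect the sheer size of the imputed matrix is the problem. A quick back-of-the-envelope estimate (my own, assuming the imputed/normalized accessibility scores end up as a dense cells × regions array):

```python
import numpy as np

n_cells, n_regions = 45_000, 540_000

# Approximate footprint of a single dense cells x regions matrix
for dtype in (np.float32, np.float64):
    gib = n_cells * n_regions * np.dtype(dtype).itemsize / 1024**3
    print(f"{np.dtype(dtype).name}: ~{gib:.0f} GiB")
# float32: ~91 GiB; float64: ~181 GiB
```

If normalization or the highly-variable-features computation keeps even one extra copy of the matrix alive, the peak would be roughly double that, which might explain the kill even on a well-provisioned workstation.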
Describe alternatives you've considered
I tried running this step within RStudio through reticulate and in a Python Jupyter notebook, but the process also gets killed because of memory.
Additional context
I am running all of this in a conda environment where I installed scenicplus, pycistopic and pycistarget.
I tried running this step of the pipeline using a Python script:
import scenicplus
import pycisTopic
import scanpy as sc
import pandas as pd
import os
import pickle
import numpy as np
from pycisTopic.diff_features import (
    impute_accessibility,
    normalize_scores,
    find_highly_variable_features,
    find_diff_features)
import matplotlib.pyplot as plt

# Directories and working paths
projDir = "/mnt/data/SCENICplus_P1-B1-B2P1-B2P3R2/"
outDir = "/mnt/data/SCENICplus_P1-B1-B2P1-B2P3R2/output"
work_dir = "/mnt/data/SCENICplus_P1-B1-B2P1-B2P3R2/"
tmpDir = "/mnt/scratch/SCENICplus_temp/"

# Load the imputed accessibility object
with open(os.path.join(outDir, 'DARs', 'imputed_acc_obj.pkl'), 'rb') as infile:
    imputed_acc_obj = pickle.load(infile)

# Normalize the imputed data
normalized_imputed_acc_obj = normalize_scores(imputed_acc_obj, scale_factor=10**4)

# Find highly variable features and save the dispersion plot to a PDF
variable_regions = find_highly_variable_features(
    normalized_imputed_acc_obj,
    min_disp=0.05,
    min_mean=0.0125,
    max_mean=3,
    max_disp=np.inf,
    n_bins=20,
    n_top_features=None,
    plot=True,
    save=os.path.join(outDir, 'DARs', 'HVR_plot.pdf'))

# Save the results
with open(os.path.join(outDir, "DARs", "variable_regions.pkl"), "wb") as outfile:
    pickle.dump(variable_regions, outfile)
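One variant I plan to try next (just a sketch, not yet verified to avoid the OOM): drop the un-normalized object before the call and disable plotting, so only one large matrix plus whatever find_highly_variable_features allocates is alive at once:

```python
import gc

# Continues from the script above: free the original imputed object
# once the normalized copy exists
del imputed_acc_obj
gc.collect()

variable_regions = find_highly_variable_features(
    normalized_imputed_acc_obj,
    min_disp=0.05,
    min_mean=0.0125,
    max_mean=3,
    max_disp=np.inf,
    n_bins=20,
    n_top_features=None,
    plot=False)  # skip the dispersion plot to avoid extra allocations
```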
I also tried running normalize_scores first and saving its output as a pickle file, then loading that pickle in a fresh process to run find_highly_variable_features, but it doesn't work either.
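Concretely, that two-step attempt looked roughly like this (the intermediate filename normalized_acc_obj.pkl is mine, just for illustration):

```python
# Process A: normalize and persist the result, then exit.
normalized_imputed_acc_obj = normalize_scores(imputed_acc_obj, scale_factor=10**4)
with open(os.path.join(outDir, 'DARs', 'normalized_acc_obj.pkl'), 'wb') as outfile:
    pickle.dump(normalized_imputed_acc_obj, outfile)

# Process B (fresh interpreter, same imports and paths as the script above):
with open(os.path.join(outDir, 'DARs', 'normalized_acc_obj.pkl'), 'rb') as infile:
    normalized_imputed_acc_obj = pickle.load(infile)
variable_regions = find_highly_variable_features(normalized_imputed_acc_obj, plot=False)
```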
Version information
Report versions of modules relevant to this error
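I will follow up with exact versions; this is the snippet I am using to collect them (the distribution names are how I installed the packages, so treat them as assumptions):

```python
from importlib.metadata import version

for pkg in ("scenicplus", "pycistopic", "pycistarget", "scanpy", "numpy"):
    print(pkg, version(pkg))
```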
Any help or insight would be greatly appreciated, as I really need to finish preparing this object before running SCENIC+. Thank you!