Integration missing files #3
Comments
I attempted to run it again with the snakemake pathway. Not surprisingly, I got the same error. Here's the output below, which might be more helpful than what I previously posted.
(scrna-workflow) unam@IP:~/scrna-workflow$ snakemake -j 10 --config datafolder=data option=minimal
all 1
Select jobs to execute...
[Sat Nov 4 11:35:23 2023]
Loading required package: optparse
Attaching package: ‘MatrixGenerics’
The following objects are masked from ‘package:matrixStats’:
Loading required package: GenomicRanges
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:stats’:
The following objects are masked from ‘package:base’:
Loading required package: S4Vectors
Attaching package: ‘S4Vectors’
The following objects are masked from ‘package:base’:
Loading required package: IRanges
Attaching package: ‘Biobase’
The following object is masked from ‘package:MatrixGenerics’:
The following objects are masked from ‘package:matrixStats’:
Warning messages:
Attaching package: ‘celldex’
The following objects are masked from ‘package:SingleR’:
Loading required package: tidyverse
Attaching package: ‘Seurat’
The following object is masked from ‘package:SummarizedExperiment’:
Warning message:
Number of nodes: 6809
Running Louvain algorithm...
Attaching package: ‘spam’
The following object is masked from ‘package:stats4’:
The following objects are masked from ‘package:base’:
Loading required package: viridisLite
Try help(fields) to get started.
Execution halted
Shutting down, this might take some time.
|
Hi, First, can you share the specifications of your computer? Second, can you run conda list in your environment?
Can you also type
I suspect some packages changed in the environment due to updates, which may cause some issues. Cheers, |
I tested it myself. Just as I suspected, this is related to an updated package. I will pinpoint the problematic one and offer you a solution. In the meantime, I can suggest our Docker container. You do not have to reinstall anything and it runs in pretty much the same way. https://hub.docker.com/r/sinanugur/cellsnake Cheers, |
I think I found the problematic package, as explained here: https://stackoverflow.com/questions/77370659/error-failed-to-collect-lazy-table-caused-by-error-in-db-collect-using In short, just run this in your environment to downgrade the problematic package:
Then run from the start. The problem was that the preprocessing never finished, so you cannot integrate. Therefore:
Now the standard part may take more time to finish since the clustering will be done. I recommend using
Sorry for the inconvenience. I will fix the Bioconda recipe if this creates a problem for everyone before an update of that package. Cheers, |
Thanks for the help. I came across that bug as well and it's all up and running now! |
@ab4cp No worries, please share if you notice any other bugs or problems. Cellsnake is based on Seurat4 and usually requires more resources if you want to integrate more than 50,000 cells; beyond 100,000 cells Seurat4 is not that reliable. Seurat5 is better; however, the resource requirement still grows rapidly. I will push a new version based on Seurat5. Unfortunately, high-performance computers are still required, and laptops/desktops make single-cell analysis difficult. |
@sinanugur is the Seurat5 version available? I am on Linux with 32vCPU 2000RAM and 5TB of free disk space. I was going to trial cellsnake to integrate about 25 datasets; the number of cells would easily be over 1M. |
On the fetal brain dataset I tried running:
cellsnake: cellsnake integrated advanced analyses_integrated/seurat/integrated.rds --resolution auto
It drops out at about 86% completion. I'm not sure if this is again related to an update in any packages.
##ERROR MESSAGE
Error in rule celltypist_celltype:
Removing output files of failed job celltypist_celltype since they might be corrupted:
My conda info/list and R sessionInfo() are below:
populated config files : /home/student.unimelb.edu.au/acboynes/miniforge3/.condarc
##CONDA LIST
packages in environment at /home/student.unimelb.edu.au/acboynes/miniforge3/envs/cellsnake:
Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
##R SESSIONINFO
Matrix products: default
locale:
attached base packages:
loaded via a namespace (and not attached):
|
Do you have an internet connection? I think celltypist requires the models to be downloaded. Type this and then rerun:
I think Seurat5 won't solve your resource problem. It requires quite a large computer (we usually run on 1TB RAM and 126 CPUs; the CPUs are not necessary, but RAM is important) and it may take days to finish the differential expression analysis. You can integrate a smaller dataset in your current setup, but it may still take hours with cellsnake. I recommend using 20 PCA dims and the rpca integration method to test on 100K cells on your computer. For example,
|
Hi @sinanugur - this was actually a security restriction on my server sorry |
Hi @sinanugur, I may have run into another bug which I couldn't find much troubleshooting information on. I'm getting a new error message with the fetal-brain dataset mentioning doubletFinder_v3. I'm also getting the same error message when running some of my own datasets.
~/Downloads/fetal-brain$ cellsnake standard data --jobs 5
Loading required package: viridisLite
Try help(fields) to get started.
Trying to restart job 9.
[Fri Nov 10 15:55:45 2023]
|
Hi @ab4cp, Here is the solution:
Just to be on the safe side, you can also check whether the Seurat version got messed up:
Cheers, |
@sinanugur thanks for the reply. I just updated both of those as you mentioned. I ran the fetal-brain dataset again and got the following error saying the Seurat object cannot be found:
Warning message:
RuleException:
|
OK, now the environment is missing Seurat. Did you run this?
Make sure the other dependencies are not updated. |
Yeh, I ran the above. Looks like the correct version of Seurat is in the environment:
packages in environment at /home/student.unimelb.edu.au/acboynes/miniforge3/envs/cellsnake:
Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
|
Hmm, this environment looks really crowded. Can you create a clean environment?
|
@sinanugur thanks for that! It's up and running smoothly. I still have security limits for celltypist (this is being sorted out). Would you be able to give me the input for celltypist so I can run that part on my local computer with python or R. Thank you |
Hi @ab4cp, glad that it worked. So are you struggling with celltypist in Cellsnake, or do you need to run it elsewhere for your own purposes? First, cellsnake also uses SingleR, which is good enough for annotation. It is quite comprehensive. https://bioconductor.org/books/release/SingleRBook/
Celltypist is Python based, so cellsnake automatically converts Seurat objects to AnnData format and so on. For example, this script converts a Seurat RDS to AnnData:
https://github.com/sinanugur/scrna-workflow/blob/main/workflow/scripts/scrna-convert-to-h5ad.R
This one reads the AnnData and uses celltypist:
https://github.com/sinanugur/scrna-workflow/blob/main/workflow/scripts/scrna-celltypist.py
Then this one reads the prediction results and plots them:
https://github.com/sinanugur/scrna-workflow/blob/main/workflow/scripts/scrna-celltypist.R
You do not need the last one though; you can change the Python code to get more plots. By default it also plots some annotation heatmaps etc. |
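For running the celltypist step on a separate machine, the flow described above (AnnData in, predicted labels out) can be sketched roughly as follows. This is a minimal illustration and not the code from scrna-celltypist.py: the function names `annotate_h5ad` and `count_labels` are hypothetical, the model name is taken from the `celltypist_model=Immune_All_Low.pkl` setting visible in the snakemake command, and it assumes the .h5ad file holds log1p-normalised expression (10,000 counts per cell), which is what CellTypist expects.

```python
from collections import Counter


def annotate_h5ad(h5ad_path, model="Immune_All_Low.pkl", majority_voting=True):
    """Annotate cells in an .h5ad file with CellTypist; returns a DataFrame of labels."""
    # Imported lazily so that count_labels() below works without these packages.
    import anndata
    import celltypist
    from celltypist import models

    # Downloads the model on first use; this step needs internet access once.
    models.download_models(model=model)

    adata = anndata.read_h5ad(h5ad_path)
    # Assumption: adata.X is log1p-normalised to 10,000 counts per cell.
    predictions = celltypist.annotate(
        adata, model=model, majority_voting=majority_voting
    )
    return predictions.predicted_labels


def count_labels(labels):
    """Count cells per predicted label, most frequent first."""
    return Counter(labels).most_common()


# Example usage (the file name is a placeholder):
# labels = annotate_h5ad("sample.h5ad")
# print(count_labels(labels["majority_voting"]))
```

Since the model download is the only step that needs network access, it could be done once on an unrestricted machine and the cached model copied across, if the server's security limits allow that.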
Hi, thanks so much for making this program it looks really promising. I had a couple of issues I was hoping to get help with.
My specifications are:
OS is ubuntu 20.04, linux 5.4
conda version 23.9.0
mamba version 1.5.3
python version 3.10.13.final.0
I downloaded cellsnake with no issues
installed the R packages and got the 'all packages are OK' message
and ran cellsnake standard data on the practice fetal-brain dataset provided in the workflow tutorial.
This is the path to the data files:
(cellsnake) unam@IP:~/Downloads/data$ ls
10X_17_028 10X_17_029
When I run 'cellsnake standard data'
(cellsnake) unam@IP:~/Downloads$ cellsnake standard data
It runs and I get some of the output files but then it terminates due to this following error. Any suggestions on how I can resolve this? Thanks!
Execution halted
[Fri Nov 3 15:52:34 2023]
Error in rule normalization_pca_rds:
jobid: 9
input: analyses/raw/percent_mt10/resolution0.8/10X_17_028.rds
output: analyses/processed/percent_mt10/resolution0.8/10X_17_028.rds
shell:
/home/unam@IP/miniforge3/envs/cellsnake/lib/python3.9/site-packages/cellsnake/scrna/workflow/scripts/scrna-normalization-pca.R --rds analyses/raw/percent_mt10/resolution0.8/10X_17_028.rds --doublet.filter --normalization.method LogNormalize --cpu 1 --scale.factor 10000 --reference BlueprintEncodeData --variable.selection.method vst --nfeature 2000 --resolution 0.8 --output.rds analyses/processed/percent_mt10/resolution0.8/10X_17_028.rds --umap --tsne
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-11-03T153327.979328.snakemake.log
Traceback (most recent call last):
File "/home/#####/#####/miniforge3/envs/cellsnake/bin/cellsnake", line 10, in <module>
sys.exit(main())
File "/home//#####/#####//miniforge3/envs/cellsnake/lib/python3.9/site-packages/cellsnake/command_line.py", line 379, in main
run_workflow(cli_arguments)
File "/home//#####/#####//miniforge3/envs/cellsnake/lib/python3.9/site-packages/cellsnake/command_line.py", line 351, in run_workflow
subprocess.check_call(str(snakemake_argument),shell=True)
File "/home//#####/#####//miniforge3/envs/cellsnake/lib/python3.9/subprocess.py", line 373, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'snakemake --retries 5 --rerun-incomplete -k -j 1 -s /home//#####/#####//miniforge3/envs/cellsnake/lib/python3.9/site-packages/cellsnake/scrna/workflow/Snakefile --config datafolder=data cellsnake_path=/home//#####/#####//miniforge3/envs/cellsnake/lib/python3.9/site-packages/cellsnake/scrna/ show_labels=True marker_plots_per_cluster_n=20 min_percentage_to_plot=5 reduction=cca highly_variable_features=2000 umap_markers_plot=True test_use=wilcox resolution=0.8 singler_ref=BlueprintEncodeData celltypist_model=Immune_All_Low.pkl mapping=org.Hs.eg.db min_features=200 percent_mt=10 max_molecules=Inf min_molecules=0 normalization_method=LogNormalize scale_factor=10000 metadata_column=condition confidence=0.05 organism=hsa percent_rp=0 variable_selection_method=vst doublet_filter=True species=human microbiome_min_features=3 metadata=None microbiome_min_cells=1 max_features=Inf logfc_threshold=0.25 tsne_markers_plot=False min_cells=3 min_hit_groups=4 taxa=genus dims=30 runid=haz29875 option=standard' returned non-zero exit status 1
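The Python traceback above only reports that the wrapped snakemake command exited with a non-zero status; the informative part is the `Error in rule ...:` line earlier in the log. As a rough sketch (the helper name `failed_rules` is made up for illustration and is not part of cellsnake or snakemake), the failing rule names can be pulled out of a log like this:

```python
import re

# Matches the header snakemake prints for a failed job, e.g.
# "Error in rule normalization_pca_rds:"
RULE_ERROR = re.compile(r"^Error in rule (\S+):\s*$")


def failed_rules(log_text):
    """Return the names of failed rules, in order of first appearance."""
    seen = []
    for line in log_text.splitlines():
        match = RULE_ERROR.match(line.strip())
        # With retries enabled the same rule can fail several times,
        # so each name is reported only once.
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample_log = """\
Execution halted
[Fri Nov 3 15:52:34 2023]
Error in rule normalization_pca_rds:
    jobid: 9
Exiting because a job execution failed. Look above for error message
"""

print(failed_rules(sample_log))
```

In practice the text would be read from the file named on the `Complete log:` line, and the R error printed just before `Execution halted` is usually the place to look next.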
I then attempted to integrate the data to see if it would run but I get the following output:
(cellsnake) unam:IP:~/Downloads$ cellsnake integrate data
{'10X_17_029', '10X_17_028'}
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job count
all 1
total 1
Select jobs to execute...
[Fri Nov 3 15:53:54 2023]
localrule all:
jobid: 0
reason: Rules with neither input nor output files are always executed.
resources: tmpdir=/tmp