Can I run the pipeline with slurm on HPC? #26
Comments
Hi Xiao, thanks for your interest in using the tool. There is a snakemake executor plugin that allows you to run Snakemake workflows over SLURM: https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html When this is installed you only have to add the corresponding executor option to the Snakemake call. Another way is to simply run it normally in SLURM's interactive mode, e.g.,
and then using the basic call. This may be a bit tedious if you have many different samples you want to have analyzed simultaneously. Actually, I am working on this in v0.3 to make it more straightforward, which may be released sometime next week.
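A minimal sketch of both routes, assuming Snakemake >= 8 and a standard SLURM setup; the job count and the interactive resource values are placeholders you would adapt to your cluster:

```bash
# Option 1: install the SLURM executor plugin and let Snakemake submit each rule as its own SLURM job.
pip install snakemake-executor-plugin-slurm
snakemake --executor slurm --jobs 50 --software-deployment-method conda

# Option 2: request an interactive allocation and run Snakemake normally inside it
# (resource values below are placeholders).
srun --cpus-per-task=16 --mem=64G --time=24:00:00 --pty bash
snakemake --cores all --software-deployment-method conda
```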
Thanks!

`#!/bin/bash snakemake --cores all --software-deployment-method conda`

Can I ask what's the best way to run ScanNeo2 for multiple samples? I ran it for another sample in a separate directory as below, and it looks like the ScanNeo2 pipeline downloads the dependencies and the reference database again.

Step 1
Step 2
Step 3

Thank you!
Hi, thank you for your feedback. So in principle you can run ScanNeo2 from the same directory (for different files) using a different config file. Just change the setting referenced here: Line 12 in 52a3818
I need to check if snakemake somehow allows locking these files such that they are only downloaded once and also not overwritten by multiple processes. However, if you create it in new folders you would have to download it again each time. But you gave me an idea. I could write a small script that accepts multiple files and distributes them across different nodes... let me look into that. Thanks
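If it helps, a minimal sketch of the per-sample config approach; the file names `config/sample_A.yaml` and `config/sample_B.yaml` are placeholders, and running the calls one after another from the same working directory avoids Snakemake's directory lock while reusing the shared reference downloads:

```bash
# Run the workflow twice from the same working directory, switching only the config file per sample.
# Shared reference downloads are reused; the per-sample output path is set inside each config file.
snakemake --cores all --software-deployment-method conda --configfile config/sample_A.yaml
snakemake --cores all --software-deployment-method conda --configfile config/sample_B.yaml
```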
OK. I see. Thanks!

`[Tue Jun 18 00:36:20 2024] Activating conda environment: .snakemake/conda/d8900894438122f24cfe02e35dcb6bfd_`
`Removing output files of failed job get_reads_hlatyping_PE since they might be corrupted:` The whole log files are attached. Could you help fix the problem? I'm trying to check what's going on, but haven't figured out why it crashed. The only thing that looks strange to me is the samtools command shown above. The issue here is that samtools should be used for BAM/SAM files, shouldn't it? Thanks!
Hi, thank you for reporting this. As it turned out, the issue was a non-existent tmp folder (which was not checked for PE reads). I have resolved this in #27. I did a test run with PE reads and I don't see that error anymore. Could you re-run this with v0.2.5? Thanks a lot.
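If useful, a minimal sketch for pulling the tagged release and repeating the run; the repository URL is assumed here, and the final call mirrors the one already used in this thread:

```bash
# Fetch the tagged release (repository URL assumed) and re-run the same basic call as before.
git clone https://github.com/ylab-hi/ScanNeo2.git
cd ScanNeo2
git checkout v0.2.5
snakemake --cores all --software-deployment-method conda
```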
Hi Richard, This issue looks solved. Thanks! But, unfortunately, I encountered another one during the BWA alignment.
`Removing output files of failed job bwa_align_dnaseq since they might be corrupted:` The full log file is attached. This time, I created a new working folder, downloaded your new pipeline and ran it. Cheers,
Oh, I forgot to say: I usually do this with BWA's -R option, as below:
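(The original command was not preserved in this thread; the following is a hypothetical reconstruction of a typical bwa mem call with the read group set via -R, with sample name, library and file paths as placeholders.)

```bash
# Set the read group directly in the bwa mem call (-R), then sort to BAM.
# Sample name, library, and file paths are placeholders.
bwa mem -t 8 \
  -R '@RG\tID:tumor\tSM:tumor\tPL:ILLUMINA\tLB:lib1' \
  reference.fa tumor_R1.fastq tumor_R2.fastq | samtools sort -o tumor.bam -
```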
Hi, Yes, you are right. In principle one could set the read groups directly in the BWA call, but it should not make any difference. Also, does the command you posted, with the RG set directly in the BWA call, work with your data? Thanks
Hi, Thanks for your support! Attached is a small set of the reads (1000 reads per file), including normal, tumour and RNA-Seq. I'm wondering if the issue is caused by the read IDs in my fastq files, though they're raw sequencing data. See below.

==> normal_R1.fastq <==
==> normal_R2.fastq <==
==> RNA_R1.fastq <==
==> RNA_R2.fastq <==
==> tumor_R1.fastq <==
==> tumor_R2.fastq <==

Cheers,
OK. I found fastq files in .tests/integration. They appear to have been downloaded from the NCBI SRA database, while my data was generated by an Illumina sequencer and hasn't been uploaded to or downloaded from the SRA database, e.g., `@SRR22881654.1 1 length=254`. Compare the read ID formats for the details.
Thanks for providing me with the data. Seems like the ... Thanks
Your pipeline moved a little bit further, beyond BWA. Here, the error below was caused by an older version of samtools (Version 1.9, without -u support) that is incompatible with the command (which uses -u). Thanks! It would be very helpful to have a thorough check and test.
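A minimal sketch of a workaround, assuming the samtools on PATH is the one being picked up; the version pin below is only an example, not the exact version the rule requires:

```bash
# Check which samtools is resolved and its version.
which samtools
samtools --version | head -n 1

# Upgrade it via bioconda if it is older than what the rule expects (version pin is an example).
conda install -c bioconda "samtools>=1.17"
```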
Hi, I think I have run through ScanNeo2/v0.2.5 for one sample, after working around a few issues with e.g. pyfaidx, regtools, bedtools, spladder, etc. I cannot remember exactly what the issues were, but most were resolved by installing the tools manually or editing problematic commands. BTW, the way I ran ScanNeo2, shown above, is very slow; I believe it ran the jobs sequentially on a single compute node, as in a single VM. I need to figure out how to run the processes of ScanNeo2 in parallel, if possible. Plus, does ScanNeo2 use the dna_normal reads? My input data includes dna_tumor, rna_tumor and normal in the data: section of config.yaml as below. During the run, I haven't seen the normal DNA reads being used. I will check your paper and my analysis further. Looking forward to your test and update for ScanNeo2. Thank you!
Hi, that's strange, as the versions from the conda environments should suffice. I will test it again on another HPC instance. Well, yes, the normal samples should also be processed. The only difference (at the moment) is that in some steps the normal samples are ignored. The major bottleneck is probably the indel calling (short). Unfortunately, GATK cannot be parallelized too much... I'm distributing this over each chromosome, but GATK does not provide an option to speed it up further (especially since the best practice requires doing the germline calling first, filtering out specific variants and then doing another round of variant calling).
So, for example, regtools should be used via the singularity container... using the software-deployment option. It's strange that you had to install it yourself.
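For reference, a minimal sketch of what that call could look like, assuming Snakemake >= 8 and a node where Singularity/Apptainer is available:

```bash
# Let Snakemake resolve the conda environments AND pull the containers used by rules such as the ScanExitron step.
snakemake --cores all --software-deployment-method conda apptainer
```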
The HPC I'm using doesn't have singularity, so I had to run it with conda as below. Does your pipeline install "scanexitron" with both singularity and conda? If it is singularity only, my run probably didn't install scanexitron properly (including regtools).
Thanks very much!
Yes, so the dependencies for scanexitron are resolved via the Docker container that is loaded via Docker/Singularity. It's the only option... ScanNeo2/workflow/rules/exitron.smk Lines 53 to 54 in 7dcdff8
The main reason for that is that regtools v0.4.2 is not available via conda (only newer versions). But yeah, you would have to install it manually along with the other packages... which I think you did. But I'm assuming you removed the container line from the rule. Unfortunately, it's the only solution so far. A colleague from the group is working on an improved version that should run without singularity. I have to ask him how far along he is with that. However, do you get it to work when you install it manually? Another solution, which isn't great, is to disable the exitron module and instead determine the exitrons yourself (using ScanExitron or any other tool) and then provide the resulting VCF file as input.
Thanks! I first had to run it without singularity, because our HPC, like yours, doesn't have singularity. But I managed to install singularity via conda, and it looks as below. It's said that installing singularity requires root permission, while executing it doesn't; in my case, conda seems to have overcome that limit. Later, I also reran the pipeline a few times as: I'm still checking whether the results are complete and as expected. Meanwhile, I'm running your pipeline for another sample and haven't seen any issue. Great! Sorry, I cannot describe exactly what I did. Sometimes I edited your code a little bit, sometimes I installed dependencies manually, though the installed version may not be identical to the one defined in your pipeline, like regtools.
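(The exact commands weren't preserved above; a hypothetical reconstruction of that approach could look like the following, with the environment name as a placeholder and assuming the conda-forge singularity package.)

```bash
# Install a user-space singularity build from conda-forge (executing containers does not need root).
conda create -n singularity -c conda-forge singularity
conda activate singularity

# Re-run the workflow with both conda and container support enabled,
# resuming any jobs left incomplete by earlier failed runs.
snakemake --cores all --software-deployment-method conda apptainer --rerun-incomplete
```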
Thanks for your insights. It's no problem as long as you get it to run. Let me know if you run into other problems. I did a re-run with the original TESLA data on a new install. So far it runs through... at least until the prioritization. I may be able to install regtools into an alternative conda environment and use that instead... need to look into that. Thanks again
Hi,
Assuming the configfile is located in config/config.yml and the
Hi, I'm new to Snakemake, and have no idea how to submit ScanNeo2 jobs to an HPC. Any instructions would be very helpful! Thanks!
Regards,
Xiao