org.qcmg.picard.SAMFileReaderFactory keeps spawning processes #330

Open
ZhaoxiangSimonCai opened this issue Mar 20, 2023 · 7 comments

@ZhaoxiangSimonCai

ZhaoxiangSimonCai commented Mar 20, 2023

Describe the bug
When I run the latest qmotif on WGS files, org.qcmg.picard.SAMFileReaderFactory seems to keep spawning processes, and the program then fails with "Too many open files". The --threads argument doesn't seem to help in this case.

To Reproduce
qmotif_cmd = "java -Xmx512G -jar /home/scai/tools/adamajava/qmotif/build/flat/qmotif.jar --threads 8 " \
    f"--input-bam {BAM} " \
    f"--input-bai {BAM}.bai " \
    f"--log /home/scai/SangerTMM/qmotif-1.2/results_wgs/{cell_line}/{cell_line}.qmotif.log " \
    "-ini /home/scai/SangerTMM/qmotif-1.2/config.ini " \
    f"-output-xml /home/scai/SangerTMM/qmotif-1.2/results_wgs/{cell_line}/{cell_line}.xml " \
    f"-output-bam /home/scai/SangerTMM/qmotif-1.2/results_wgs/{cell_line}/{cell_line}.telomere.bam"
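
For anyone trying to reproduce this, the same command can also be assembled as an argument list, which avoids quoting problems when a path contains spaces. This is only a sketch: the flags and paths are copied from the command above, while the function name `build_qmotif_cmd` and the sample inputs are hypothetical.

```python
import shlex


def build_qmotif_cmd(bam, cell_line, threads=8, heap="512G"):
    """Assemble the qmotif command from this report as an argument list."""
    out_dir = f"/home/scai/SangerTMM/qmotif-1.2/results_wgs/{cell_line}"
    return [
        "java", f"-Xmx{heap}",
        "-jar", "/home/scai/tools/adamajava/qmotif/build/flat/qmotif.jar",
        "--threads", str(threads),
        "--input-bam", bam,
        "--input-bai", f"{bam}.bai",
        "--log", f"{out_dir}/{cell_line}.qmotif.log",
        "-ini", "/home/scai/SangerTMM/qmotif-1.2/config.ini",
        "-output-xml", f"{out_dir}/{cell_line}.xml",
        "-output-bam", f"{out_dir}/{cell_line}.telomere.bam",
    ]


# Hypothetical inputs, for illustration only.
cmd = build_qmotif_cmd("/data/sample.bam", "SAMPLE1")
print(shlex.join(cmd))
# To launch it: subprocess.run(cmd, check=True)
```

Printing the joined command before running it makes it easy to paste the exact invocation into a bug report.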

Screenshots
image

Desktop (please complete the following information):

  • OS: CentOS Linux 7 (Core)
  • qmotif version: GitHub head version.

Additional context
I could run qmotif successfully on WES data. SAMFileReaderFactory also spawned many processes there, but the run finished properly.

Thank you so much for your help!

ChristinaXu2017 self-assigned this Mar 20, 2023
@ChristinaXu2017
Contributor

Based on your error message, the program died before reaching the threading step. Could I have a look at your ini file? I suspect there may be too many [INCLUDES] regions.

@ZhaoxiangSimonCai
Author

Thank you for your reply Christina. I'm using the default [INCLUDES] regions from your documentation page.

[PARAMS]
stage1_motif_regex=(TTAGGG.*TTAGGG.*TTAGGG.*TTAGGG)|(CCCTAA.*CCCTAA.*CCCTAA.*CCCTAA)
;stage1_string_rev_comp=true
stage2_motif_regex=(...GGG){2,}|(CCC...){2,}
window_size=10000
[INCLUDES]
; name, regions (sequence:start-stop)
chr1p chr1:10001-12464
chr1q chr1:249237907-249240620
chr2p chr2:10001-12592
chr2q chr2:243187373-243189372
chr2xA chr2:243150480-243154648
chr3p chr3:60001-62000
chr3q chr3:197960430-197962429
chr3xB chr3:197897576-197903397
chr4p chr4:10001-12193
chr4q chr4:191041613-191044275
chr5p chr5:10001-13806
chr5q chr5:180903260-180905259
chr6p chr6:60001-62000
chr6q chr6:171053067-171055066
chr7p chr7:10001-12238
chr7q chr7:159126558-159128662
chr8p chr8:10001-12000
chr8q chr8:146302022-146304021
chr9p chr9:10001-12359
chr9q chr9:141151431-141153430
chr10p chr10:60001-62000
chr10q chr10:135522469-135524746
chr11p chr11:60001-62000
chr11q chr11:134944458-134946515
chr12p chr12:60001-62000
chr12q chr12:133839458-133841894
chr12xC chr12:93158-97735
chr13p chr13:19020001-19022000
chr13q chr13:115107878-115109877
chr14p chr14:19020001-19022000
chr14q chr14:107287540-107289539
chr15p chr15:20000001-20002000
chr15q chr15:102518969-102521391
chr16p chr16:60001-62033
chr16q chr16:90292753-90294752
chr17p chr17:1-2000
chr17q chr17:81193211-81195210
chr18p chr18:10001-12621
chr18q chr18:78014226-78017247
chr19p chr19:60001-62000
chr19q chr19:59116822-59118982
chr20p chr20:60001-62000
chr20q chr20:62963520-62965519
chr21p chr21:9411194-9413193
chr21q chr21:48117788-48119894
chr22p chr22:16050001-16052000
chr22q chr22:51242566-51244565
chrXp chrX:60001-62033
chrXq chrX:155257733-155260559
chrYp chrY:10001-12033
chrYq chrY:59360739-59363565

Also I tried running qmotif on the WGS data of another sample that has deeper sequencing coverage, and it actually worked well. Could it be something that's in the bam file that causes this issue?

@ChristinaXu2017
Contributor

I am also curious why you ask for so much RAM. This tool is not RAM-hungry. I just ran it on my own WGS data with the same ini file, and it works fine.

@ChristinaXu2017
Contributor

ChristinaXu2017 commented Mar 20, 2023

Your error happens while reading the BAM file with the provided index file. It is possible that something is wrong with the index file or the BAM file. Also, make sure the file is not being moved while qmotif is running.
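
One quick way to rule out a missing or outdated index is to compare modification times. The sketch below is an illustration, not part of qmotif; the function name `index_is_stale` is hypothetical, and the check is only a hint, since copying files between systems can change their timestamps.

```python
import os


def index_is_stale(bam_path, index_path):
    """Return True if the index is missing or older than the BAM file."""
    if not os.path.exists(index_path):
        return True
    # An index written before the BAM was last modified may not match it.
    return os.path.getmtime(index_path) < os.path.getmtime(bam_path)


# Example call (paths are placeholders):
# index_is_stale("/data/sample.bam", "/data/sample.bam.bai")
```

If this returns True, rebuilding the index is a cheap first step before looking elsewhere.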

@ZhaoxiangSimonCai
Author

Our server has plenty of RAM, so I set it to 512G just in case. Changing it to 32G does not solve the problem.
No file is being moved while qmotif is running.
I tried indexing the BAM again, but the result is still the same.

Is it possible to provide more details about where the error happens, and what could be wrong? From simply viewing the BAM I can't see any problem.

@ChristinaXu2017
Contributor

ChristinaXu2017 commented Mar 20, 2023

It seems qmotif works well with WES BAMs and some WGS BAMs, but throws the exception "Too many open files" for other WGS BAMs. I just tested it on a 38G WGS BAM with 4 threads, and the run completed in less than 10 minutes. The INI file is from our documentation page as well.
I am also sure your problem has nothing to do with the "thread" option: the error happens before the parallel jobs are started. I can't test your particular BAM file, so I can only go by the screenshot and our code. The exception says "FileNotFound", so my guess is that it is caused by one of the following:

  1. Running environment issues, such as files being updated during the run, limits on the number of open file handles, or permission problems.
  2. BAM file issues, such as an incorrect BAM header, an outdated index file, or the tool used to create the index.
  3. Hardware issues, such as I/O connection problems, RAM, or cache.
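
The open file handle limit in point 1 can be inspected directly. The sketch below assumes a Linux host (the report mentions CentOS 7) and uses only the Python standard library; the helper names are hypothetical. It reports the same soft limit that `ulimit -n` shows in the shell, and counts the descriptors a process currently holds by listing `/proc/<pid>/fd`.

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid="self"):
    """Count descriptors currently open in a process (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


soft, hard = open_file_limits()
print(f"soft limit: {soft}, hard limit: {hard}")
if os.path.isdir("/proc/self/fd"):
    print(f"descriptors open in this process: {count_open_fds()}")
```

Passing the PID of the running java process to `count_open_fds` while qmotif is working shows whether the count climbs toward the soft limit before the failure.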

I am sorry that I can't offer a firmer diagnosis without a real test dataset.

@ZhaoxiangSimonCai
Author

Thanks again. I managed to run qmotif by setting includes_only=true in the config.
I couldn't find much about this parameter on your documentation website or in the paper. Could you please explain what it changes, if possible? I thought qmotif always looked only at the specified regions.
