fatal error empty file ... #35
Comments
Hi, I am having the same error. Were you able to solve it?
Do you also see very little clustering going on?
Thanks for the great software. I'm excited to analyze my data, but I'm also getting the same empty file error:
I have done some troubleshooting on our data (100k reads), running the software chunk by chunk, and I observed that the problem was that few reads were assigned to UMIs with size >= 3. In fact, at line 260 of
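The size >= 3 cutoff MaestSi mentions (the `-c 3` option in the pipeline command below) can be illustrated with a minimal sketch. This is not the pipeline's actual code - the real binning is done with usearch and bwa - just a toy model assuming UMI labels have already been extracted, one per read:

```python
from collections import Counter

def umis_passing_size_filter(read_umis, min_size=3):
    """Keep only UMIs observed in at least `min_size` reads.

    `read_umis` is a hypothetical pre-extracted list of UMI strings,
    one per read; the real pipeline derives these with usearch.
    """
    counts = Counter(read_umis)
    return {umi for umi, n in counts.items() if n >= min_size}

# Toy data: one UMI seen 4 times, the rest singletons/doubletons,
# mirroring the "2203 seqs, 2203 singletons" situation in the log.
reads = ["AAA"] * 4 + ["CCC"] * 2 + ["GGG", "TTT"]
print(umis_passing_size_filter(reads))  # {'AAA'}
```

When nearly every UMI is a singleton, as in the log below, this filter discards almost everything, and the downstream files end up empty.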
Hi everyone,
Sorry about the late reply. The pipeline is not very debug friendly and right now requires an intimate understanding of each step. Hopefully I will have time this summer to add proper checks and terminal messages to remedy this.
As MaestSi points out, the problem is most likely a sub-optimal ratio between the number of tagged molecules and the data generated. From my ONT R9.4.1 experiments it seems I need a per-molecule coverage of >15x to properly detect the molecule UMI with the pipeline. This means any molecule with <15x coverage is not "detected" or processed at all.
In cliu32's case I am quite confident the molecule/data ratio is the problem. With 5544 reads you would need <370 tagged molecules (5544/15) in your sample to successfully produce UMI consensus sequences. cliu32 detects 2203 UMI sequences, which are clustered into 2202 unique UMI clusters. This indicates there are far more than 370 molecules tagged in the sample - my guess would be up towards 100,000 molecules - and hence the method breaks.
The interesting question is then: how can we ensure we start with the correct number of tagged molecules? This can be difficult to determine, as copy number, DNA integrity, and primer efficiency will impact this in different ways for different targets. We have used test sequencing and semi-quantitative PCR to estimate the number of templates in our samples.
If you want to get started with the method quickly, I would recommend starting from an amplicon of your target and using that as input. With an amplicon template it is easy to dilute to the desired number of templates for your purpose. Just a word of caution: we have had problems generating PCR products from <1000 molecules, but this is probably a PCR optimization problem. The downside of this approach is that PCR chimeras and errors are not removed, but you will still get very high quality amplicons.
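The back-of-envelope arithmetic above can be sketched in a few lines. The numbers come from this thread; the 15x threshold is the empirical per-molecule coverage reported above for ONT R9.4.1 data, not a universal constant:

```python
def max_supported_molecules(total_reads: int, min_coverage: int = 15) -> int:
    """Rough upper bound on tagged molecules the read budget can support.

    The 15x default is the empirical per-molecule coverage from this
    thread; adjust it for your own chemistry and basecaller.
    """
    return total_reads // min_coverage

reads = 5544          # total reads in cliu32's run (from the log below)
detected_umis = 2203  # UMI sequences detected before clustering

budget = max_supported_molecules(reads)  # 369
print(f"Read budget supports at most ~{budget} molecules at 15x")
print(f"Detected {detected_umis} UMIs: ~{detected_umis / budget:.0f}x over budget")
```

If the detected UMI count greatly exceeds the budget, as here, almost no UMI bin reaches the coverage needed for a consensus, and the downstream steps fail on empty files.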
Catches the error SorenKarst#35
Hi Soren, Thanks again for the tool - I am still excited about the idea of whole ribosomal operons in microbial ecology. I believe that I have finally installed your pipeline correctly (on Ubuntu 20.04 and on Ubuntu 16.04 running in a virtual machine). Currently, I am testing the pipeline with the small nanopore test dataset and the settings suggested in the readme (copy-pasted into the terminal). This line in the logs is especially suspicious: I have attached my logs. Thanks in advance, Mitja
Hi, I had a similar issue with racon v1.10, as described here. I solved it by installing a newer racon version, e.g. v1.13.
Hi Simone, thanks for the tip! I have updated racon (which worked like a charm), but I still end up with the same errors and empty files. I have checked the folders and files that were generated. The UMI binning and trimming seem to have worked, and the files are populated with data. I also found a populated *.SAM file in /test_941/umi_binning/umi_ref. I believe you are right that the error has to do with the racon polishing, though, checking the options of the command that is reported as illegal, they seem perfectly fine. I am worried that I am making some kind of silly mistake. Besides downloading the git repository, installing conda and then longread_umi, and getting usearch to run, there is nothing else to install, is there? Thanks, Mitja
Hi, Soren,
Thank you for developing the tool. It is easy to install and I can run your test dataset successfully.
However, when I run my own fastq file with the corresponding f/F/r/R sequences, I got a series of fatal errors (below). The same adapter sequences are used in this sample as in your paper.
Thanks for troubleshooting!
[command]
longread_umi nanopore_pipeline -d 34a.fastq -v 30 -o 34a_out -s 90 -e 90 -m 1000 -M 1800 -f CAAGCAGAAGACGGCATACGAGAT -F (forward primer) -r AATGATACGGCGACCACCGAGATC -R (reverse primer) -c 3 -p 1 -q r941_min_high_g330 -t 1
[from the log file]
...
=== Summary ===
Total reads processed: 5,544
Reads with adapters: 0 (0.0%)
Reads that were too short: 0 (0.0%)
Reads that were too long: 0 (0.0%)
Reads written (passing filters): 5,544 (100.0%)
Total basepairs processed: 7,063,877 bp
Total written (filtered): 7,063,877 bp (100.0%)
Scoring long reads
5,544 reads (7,063,877 bp)
Outputting passed long reads
local:0/10/100%/4.9s
Computers / CPU cores / Max jobs to run
1:local / 8 / 10
local:0/10/100%/4.9s
usearch v11.0.667_i86linux32, 4.0Gb RAM (32.6Gb total), 8 cores
(C) Copyright 2013-18 Robert C. Edgar, all rights reserved.
https://drive5.com/usearch
License: personal use only
00:00 41Mb 100.0% Reading 39a_out/umi_binning/umi_ref/umi12f.fa
00:00 73Mb 100.0% DF
00:00 73Mb 2203 seqs, 2203 uniques, 2203 singletons (100.0%)
00:00 73Mb Min size 1, median 1, max 1, avg 1.00
00:00 73Mb 100.0% Writing 39a_out/umi_binning/umi_ref/umi12u.fa
usearch v11.0.667_i86linux32, 4.0Gb RAM (32.6Gb total), 8 cores
(C) Copyright 2013-18 Robert C. Edgar, all rights reserved.
https://drive5.com/usearch
License: personal use only
00:00 41Mb 100.0% Reading 39a_out/umi_binning/umi_ref/umi12u.fa
00:00 73Mb 100.0% DF
00:00 73Mb 2203 seqs (tot.size 2203), 2203 uniques, 2203 singletons (100.0%)
00:00 73Mb Min size 1, median 1, max 1, avg 1.00
00:00 77Mb 100.0% DB
00:00 85Mb 100.0% 2202 clusters, max size 2, avg 1.0
00:00 85Mb 100.0% Writing centroids to 39a_out/umi_binning/umi_ref/umi12c.fa
Clusters 2202
Max size 2
Avg size 1.0
Min size 1
Singletons 2201, 99.9% of seqs, 100.0% of clusters
Max mem 85Mb
Time 1.00s
Throughput 2203.0 seqs/sec.
[bwa_index] Pack FASTA... 0.00 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.03 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.00 sec
[bwa_index] Construct SA from BWT and Occ... 0.01 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index 39a_out/umi_binning/umi_ref/umi12c.fa
[main] Real time: 0.086 sec; CPU: 0.052 sec
[bwa_aln_core] calculate SA coordinate... 0.25 sec
[bwa_aln_core] write to the disk... 0.00 sec
[bwa_aln_core] 28817 sequences have been processed.
[main] Version: 0.7.17-r1188
[main] CMD: bwa aln -n 6 -t 1 -N 39a_out/umi_binning/umi_ref/umi12c.fa 39a_out/umi_binning/umi_ref/umi12p.fa
[main] Real time: 0.272 sec; CPU: 0.266 sec
[bwa_aln_core] convert to sequence coordinate... 0.01 sec
[bwa_aln_core] refine gapped alignments... 0.00 sec
[bwa_aln_core] print alignments... 0.01 sec
[bwa_aln_core] 28817 sequences have been processed.
[main] Version: 0.7.17-r1188
[main] CMD: bwa samse -n 10000000 39a_out/umi_binning/umi_ref/umi12c.fa 39a_out/umi_binning/umi_ref/umi12p_map.sai 39a_out/umi_binning/umi_ref/umi12p.fa
[main] Real time: 0.043 sec; CPU: 0.033 sec
cat: 39a_out/umi_binning/umi_ref/umi_ref.fa: No such file or directory
[E::stk_seq] failed to open the input file/stream.
[bwa_index] Pack FASTA... 0.05 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 1.10 seconds elapse.
[bwa_index] Update BWT... 0.03 sec
[bwa_index] Pack forward-only FASTA... 0.06 sec
[bwa_index] Construct SA from BWT and Occ... 0.34 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index 39a_out/umi_binning/read_binning/reads_tf_umi1.fa
[main] Real time: 1.629 sec; CPU: 1.577 sec
[bwa_index] Pack FASTA... 0.06 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 1.17 seconds elapse.
[bwa_index] Update BWT... 0.03 sec
[bwa_index] Pack forward-only FASTA... 0.05 sec
[bwa_index] Construct SA from BWT and Occ... 0.35 sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa index 39a_out/umi_binning/read_binning/reads_tf_umi2.fa
[main] Real time: 1.698 sec; CPU: 1.647 sec
[bwa_seq_open] fail to open file '39a_out/umi_binning/read_binning/umi_ref_b1.fa' : No such file or directory
[fread] Unexpected end of file
[bwa_seq_open] fail to open file '39a_out/umi_binning/read_binning/umi_ref_b2.fa' : No such file or directory
[fread] Unexpected end of file
[09:07:06] UMI match filtering...
[09:07:06] Read orientation filtering...
[09:07:06] UMI match error filtering...
[09:07:06] UMI bin/cluster size ratio filtering...
[09:07:06] Print UMI matches...
[09:07:06] Done.
Computers / CPU cores / Max jobs to run
1:local / 8 / 1
0
Computers / CPU cores / Max jobs to run
1:local / 8 / 1
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:0/1/100%/0.0s
local:0/1/100%/0.0s
Computers / CPU cores / Max jobs to run
1:local / 8 / 1
local:0/1/100%/0.0s
[09:07:12 - DataIndex] No sample_registry in 39a_out/raconx3_medakax1/consensus/_consensus.hdf
Traceback (most recent call last):
File "/home/cliu/anaconda3/envs/longread_umi/bin/medaka", line 11, in <module>
sys.exit(main())
File "/home/cliu/anaconda3/envs/longread_umi/lib/python3.6/site-packages/medaka/medaka.py", line 532, in main
args.func(args)
File "/home/cliu/anaconda3/envs/longread_umi/lib/python3.6/site-packages/medaka/stitch.py", line 125, in stitch
index = medaka.datastore.DataIndex(args.inputs)
File "/home/cliu/anaconda3/envs/longread_umi/lib/python3.6/site-packages/medaka/datastore.py", line 206, in __init__
self.metadata = self._load_metadata()
File "/home/cliu/anaconda3/envs/longread_umi/lib/python3.6/site-packages/medaka/datastore.py", line 244, in _load_metadata
with DataStore(first_file) as ds:
File "/home/cliu/anaconda3/envs/longread_umi/lib/python3.6/site-packages/medaka/datastore.py", line 39, in __init__
self.fh = h5py.File(self.filename, self.mode)
File "/home/cliu/anaconda3/envs/longread_umi/lib/python3.6/site-packages/h5py/_hl/files.py", line 269, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/home/cliu/anaconda3/envs/longread_umi/lib/python3.6/site-packages/h5py/_hl/files.py", line 99, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = '39a_out/raconx3_medakax1/consensus/_consensus.hdf', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
sed: can't read 39a_out/raconx3_medakax1/consensus_raconx3_medakax1.fa: No such file or directory
gawk: fatal: cannot open file `39a_out/consensus_raconx3_medakax1.fa' for reading (No such file or directory)
usearch v11.0.667_i86linux32, 4.0Gb RAM (32.6Gb total), 8 cores
(C) Copyright 2013-18 Robert C. Edgar, all rights reserved.
https://drive5.com/usearch
License: personal use only
usearch -fastx_uniques 39a_out/variants/m_temp.fa -strand both -fastaout 39a_out/variants/u_temp.fa -uc 39a_out/variants/u_temp.uc -sizeout
---Fatal error---
Empty file 39a_out/variants/m_temp.fa
usearch v11.0.667_i86linux32, 4.0Gb RAM (32.6Gb total), 8 cores
(C) Copyright 2013-18 Robert C. Edgar, all rights reserved.
https://drive5.com/usearch
License: personal use only
usearch -cluster_fast 39a_out/variants/u_temp.fa -id 0.995 -strand both -centroids 39a_out/variants/c1_temp.fa -uc 39a_out/variants/c1_temp.uc -sort length -sizeout -sizein
---Fatal error---
Cannot open 39a_out/variants/u_temp.fa, errno=2 No such file or directory
usearch v11.0.667_i86linux32, 4.0Gb RAM (32.6Gb total), 8 cores
(C) Copyright 2013-18 Robert C. Edgar, all rights reserved.
https://drive5.com/usearch
License: personal use only
usearch -cluster_fast 39a_out/variants/c1_temp.fa -id 0.995 -strand both -centroids 39a_out/variants/c2_temp.fa -uc 39a_out/variants/c2_temp.uc -sort length -sizein -sizeout
---Fatal error---
Cannot open 39a_out/variants/c1_temp.fa, errno=2 No such file or directory
gawk: fatal: cannot open file `39a_out/variants/u_temp.uc' for reading (No such file or directory)
cat: 39a_out/variants/centroids.fa: No such file or directory
Computers / CPU cores / Max jobs to run
1:local / 8 / 10
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
local:0/10/100%/0.0s
cat: '39a_out/variants/phasing_consensus/*/*variant.fa': No such file or directory