Empty file umi12f.fa when when running usearch in longread_umi nanopore_pipeline #53

mortenry · 2022-06-20T12:52:47Z

Hi

I am trying to run the longread_umi nanopore_pipeline with ONP data from a collaborator. However, after the adaptor/primer trimming step, the following error appears when the pipeline runs the usearch command:

usearch11.0.667_i86linux32 -fastx_uniques test_mb0/umi_binning/umi_ref/umi12f.fa -fastaout test_mb0/umi_binning/umi_ref/umi12u.fa -sizeout -minuniquesize 1 -relabel umi -strand both

---Fatal error---
Empty file test_mb0/umi_binning/umi_ref/umi12f.fa
usearch v11.0.667_i86linux32, 4.0Gb RAM (396Gb total), 72 cores
(C) Copyright 2013-18 Robert C. Edgar, all rights reserved.
https://drive5.com/usearch

Apparantly the umi12f.fa file is empty, and the pipeline fails to proceed.

I cannot find out why this is happening (or why the file is empty). The pipeline worked fine on the test data (both R9.4.1 and R10).

I have attached the output from the pipeline before the usearch command. I would highly appreciate any suggestions, and please let me know if you need more information.

best regards
Morten Rye

Program_Output_umi12f-empty.txt

rob234king · 2022-06-23T10:18:33Z

What your output of perfect_trim.log? I have below so not sure as used fast model that too many errors in my reads. But resulting files umi1.fq is empty? Seems to be the cutadapt command that is failing for me. I'm guessing the same as me?

Total read pairs processed: 16,766
Read 1 with adapter: 74 (0.4%)
Read 2 with adapter: 63 (0.4%)
Pairs that were too short: 127 (0.8%)
Pairs that were too long: 16,639 (99.2%)
Pairs written (passing filters): 0 (0.0%)

Total basepairs processed: 6,706,400 bp
Read 1: 3,353,200 bp
Read 2: 3,353,200 bp
Total written (filtered): 0 bp (0.0%)
Read 1: 0 bp
Read 2: 0 bp

rob234king · 2022-06-23T10:26:45Z

looks like some too long filter, so maybe changing --maximum-length parameter in cutadapt?

rob234king · 2022-06-23T10:31:34Z

putative trim did better but lost a lot. This output should be overwritten by the perfect trim which is why the file is empty as perfect trim fails. Total read pairs processed: 16,766
Read 1 with adapter: 13,114 (78.2%)
Read 2 with adapter: 7,593 (45.3%)
Pairs that were too short: 134 (0.8%)
Pairs written (passing filters): 5,889 (35.1%)

Total basepairs processed: 6,706,400 bp
Read 1: 3,353,200 bp
Read 2: 3,353,200 bp
Total written (filtered): 212,004 bp (3.2%)
Read 1: 106,002 bp
Read 2: 106,002 bp

rob234king · 2022-06-23T10:50:24Z

relaxing e value and max length and min length and more comes through the filter

rob234king · 2022-06-23T11:02:55Z

This is what I have so maybe a mistake here. Looks like cutadapt process is not working as intended, likely due to my misunderstanding of something.

Adapter number	adapter	Top strand	bottom strand	Top strand sequence	Bottom Strand sequence
PCA 2.05	13xN	ONLA30677	ONLA30678	TTTCTGTTGGTGCTGATATTGCNNNNNNNNNNNNNGGCGTCTGCTTGGGTGTTTAACCT	/5phos/GGTTAAACACCCAAGCAGACGCCNNNNNNNNNNNNNGAAGATAGAGCGACAGGCAAGT

-f GGCGTCTGCTTGGGTGTTTAACCT
-F TTTCTGTTGGTGCTGATATTGC
-r GGTTAAACACCCAAGCAGACGCC
-R GAAGATAGAGCGACAGGCAAGT \

mortenry · 2022-06-23T12:29:26Z

Thanks for input. Good idea to check "perfect_trim.log". It looks like this:

This is cutadapt 2.7 with Python 3.6.10
Command line parameters: -j 1 -e 0.2 -O 11 -m 18 -M 18 --discard-untrimmed -g GTATCGTGTAGAGACTGCGTAGG...TCAAGCCTCAGACAGTGGTTC -g AGTGATCGAGTCAGTGCGAGTG...CATAGCGTAAAAGGAGCAACA -G TGTTGCTCCTTTTACGCTA
TG...CACTCGCACTGACTCGATCACT -G GAACCACTGTCTGAGGCTTGA...CCTACGCAGTCTCTACACGATAC -o test_mb0/umi_binning/umi_ref/umi1.fq -p test_mb0/umi_binning/umi_ref/umi2.fq test_mb0/umi_binning/umi_ref/reads_tf_s
tart.fq test_mb0/umi_binning/umi_ref/reads_tf_end.fq
Processing reads on 1 core in paired-end mode ...
Finished in 0.21 s (67 us/read; 0.89 M reads/minute).

=== Summary ===

Total read pairs processed: 3,110
Read 1 with adapter: 2,036 (65.5%)
Read 2 with adapter: 1,557 (50.1%)
Pairs that were too short: 17 (0.5%)
Pairs that were too long: 3,093 (99.5%)
Pairs written (passing filters): 0 (0.0%)

Total basepairs processed: 1,217,910 bp
Read 1: 608,955 bp
Read 2: 608,955 bp
Total written (filtered): 0 bp (0.0%)
Read 1: 0 bp
Read 2: 0 bp

So it seems to be the problem that no reads pass the filter. However this does not change, even when I change to -m 10, -M 10000 and -e 10. It's also interesting to see the cutadapt commandline at the top where -m and -M are set to 18, but maybe it is supposed to be like this, since the UMI-length between the adapter/primer is 18. I don't know when the input -m and -M are used, so I am still confused.

Run command:
longread_umi nanopore_pipeline
-d fastq_runid_25dec2b6d463d9adc700c13264282f7c49ac4b50_0_0.fastq
-o test_mb0
-v 25
-q r10_min_high_g340
-m 10
-M 100000
-s 100
-e 10
-f GTATCGTGTAGAGACTGCGTAGG
-F TCAAGCCTCAGACAGTGGTTC
-r AGTGATCGAGTCAGTGCGAGTG
-R CATAGCGTAAAAGGAGCAACA
-c 2
-p 2
-t 1
-T 1

Morten

rob234king · 2022-06-23T12:34:10Z

you can not have an e value greater than 1 so would need to be 0.9 and should see them come through but this means 90% error on match so not a solution but something is not right with the cutadapt command. Either primer/adapter combinations are not right or something. Did you have UMI at one end or both?

mortenry · 2022-06-23T12:53:02Z

I understand now that you meant the e value in cutadapt and not the -e in the "longread_umi nanopore_pipeline" setup. The e in cutadapt cannot be specified as input (looks like defualt is 0.2 in the pipeline).
I got the primer/adapter inputs for the workflow from my collaborators. The excel-file they used to define them these look like this:

Oligo Name	Sequence 5' to 3' (include modification codes if applicable)	Scale (µmole)	Purification
GSP UMI fwd	GTATCGTGTAGAGACTGCGTAGGTTTVVVVTTVVVVTTVVVVTTVVVVTTTtcaagcctcagacagtggttc	0.05	DST
GSP UMI rev	AGTGATCGAGTCAGTGCGAGTGTTTVVVVTTVVVVTTVVVVTTVVVVTTTCATAGCGTAAAAGGAGCAACA	0.05	DST
UVP fwd	GGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCGTATCGTGTAGAGACTGCGTAGG	0.05	DST
UVP rev	GGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCAGTGATCGAGTCAGTGCGAGTG	0.05	DST

I am not an expert on the structures, but there is of course a possibility that something has been defined wrongly.

Morten

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Empty file umi12f.fa when when running usearch in longread_umi nanopore_pipeline #53

Empty file umi12f.fa when when running usearch in longread_umi nanopore_pipeline #53

mortenry commented Jun 20, 2022

rob234king commented Jun 23, 2022

rob234king commented Jun 23, 2022

rob234king commented Jun 23, 2022 •

edited

Loading

rob234king commented Jun 23, 2022

rob234king commented Jun 23, 2022

mortenry commented Jun 23, 2022

rob234king commented Jun 23, 2022

mortenry commented Jun 23, 2022

Empty file umi12f.fa when when running usearch in longread_umi nanopore_pipeline #53

Empty file umi12f.fa when when running usearch in longread_umi nanopore_pipeline #53

Comments

mortenry commented Jun 20, 2022

rob234king commented Jun 23, 2022

rob234king commented Jun 23, 2022

rob234king commented Jun 23, 2022 • edited Loading

rob234king commented Jun 23, 2022

rob234king commented Jun 23, 2022

mortenry commented Jun 23, 2022

rob234king commented Jun 23, 2022

mortenry commented Jun 23, 2022

rob234king commented Jun 23, 2022 •

edited

Loading