Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty file umi12f.fa when when running usearch in longread_umi nanopore_pipeline #53

Open
mortenry opened this issue Jun 20, 2022 · 8 comments

Comments

@mortenry
Copy link

Hi

I am trying to run the longread_umi nanopore_pipeline with ONP data from a collaborator. However, after the adaptor/primer trimming step, the following error appears when the pipeline runs the usearch command:

usearch11.0.667_i86linux32 -fastx_uniques test_mb0/umi_binning/umi_ref/umi12f.fa -fastaout test_mb0/umi_binning/umi_ref/umi12u.fa -sizeout -minuniquesize 1 -relabel umi -strand both

---Fatal error---
Empty file test_mb0/umi_binning/umi_ref/umi12f.fa
usearch v11.0.667_i86linux32, 4.0Gb RAM (396Gb total), 72 cores
(C) Copyright 2013-18 Robert C. Edgar, all rights reserved.
https://drive5.com/usearch

Apparantly the umi12f.fa file is empty, and the pipeline fails to proceed.

I cannot find out why this is happening (or why the file is empty). The pipeline worked fine on the test data (both R9.4.1 and R10).

I have attached the output from the pipeline before the usearch command. I would highly appreciate any suggestions, and please let me know if you need more information.

best regards
Morten Rye

Program_Output_umi12f-empty.txt

@rob234king
Copy link

What your output of perfect_trim.log? I have below so not sure as used fast model that too many errors in my reads. But resulting files umi1.fq is empty? Seems to be the cutadapt command that is failing for me. I'm guessing the same as me?

Total read pairs processed: 16,766
Read 1 with adapter: 74 (0.4%)
Read 2 with adapter: 63 (0.4%)
Pairs that were too short: 127 (0.8%)
Pairs that were too long: 16,639 (99.2%)
Pairs written (passing filters): 0 (0.0%)

Total basepairs processed: 6,706,400 bp
Read 1: 3,353,200 bp
Read 2: 3,353,200 bp
Total written (filtered): 0 bp (0.0%)
Read 1: 0 bp
Read 2: 0 bp

@rob234king
Copy link

looks like some too long filter, so maybe changing --maximum-length parameter in cutadapt?

@rob234king
Copy link

rob234king commented Jun 23, 2022

putative trim did better but lost a lot. This output should be overwritten by the perfect trim which is why the file is empty as perfect trim fails. Total read pairs processed: 16,766
Read 1 with adapter: 13,114 (78.2%)
Read 2 with adapter: 7,593 (45.3%)
Pairs that were too short: 134 (0.8%)
Pairs written (passing filters): 5,889 (35.1%)

Total basepairs processed: 6,706,400 bp
Read 1: 3,353,200 bp
Read 2: 3,353,200 bp
Total written (filtered): 212,004 bp (3.2%)
Read 1: 106,002 bp
Read 2: 106,002 bp

@rob234king
Copy link

relaxing e value and max length and min length and more comes through the filter

@rob234king
Copy link

This is what I have so maybe a mistake here. Looks like cutadapt process is not working as intended, likely due to my misunderstanding of something.

Adapter number adapter Top strand bottom strand Top strand sequence Bottom Strand sequence
PCA 2.05 13xN ONLA30677 ONLA30678 TTTCTGTTGGTGCTGATATTGCNNNNNNNNNNNNNGGCGTCTGCTTGGGTGTTTAACCT /5phos/GGTTAAACACCCAAGCAGACGCCNNNNNNNNNNNNNGAAGATAGAGCGACAGGCAAGT

-f GGCGTCTGCTTGGGTGTTTAACCT
-F TTTCTGTTGGTGCTGATATTGC
-r GGTTAAACACCCAAGCAGACGCC
-R GAAGATAGAGCGACAGGCAAGT \

@mortenry
Copy link
Author

Thanks for input. Good idea to check "perfect_trim.log". It looks like this:

This is cutadapt 2.7 with Python 3.6.10
Command line parameters: -j 1 -e 0.2 -O 11 -m 18 -M 18 --discard-untrimmed -g GTATCGTGTAGAGACTGCGTAGG...TCAAGCCTCAGACAGTGGTTC -g AGTGATCGAGTCAGTGCGAGTG...CATAGCGTAAAAGGAGCAACA -G TGTTGCTCCTTTTACGCTA
TG...CACTCGCACTGACTCGATCACT -G GAACCACTGTCTGAGGCTTGA...CCTACGCAGTCTCTACACGATAC -o test_mb0/umi_binning/umi_ref/umi1.fq -p test_mb0/umi_binning/umi_ref/umi2.fq test_mb0/umi_binning/umi_ref/reads_tf_s
tart.fq test_mb0/umi_binning/umi_ref/reads_tf_end.fq
Processing reads on 1 core in paired-end mode ...
Finished in 0.21 s (67 us/read; 0.89 M reads/minute).

=== Summary ===

Total read pairs processed: 3,110
Read 1 with adapter: 2,036 (65.5%)
Read 2 with adapter: 1,557 (50.1%)
Pairs that were too short: 17 (0.5%)
Pairs that were too long: 3,093 (99.5%)
Pairs written (passing filters): 0 (0.0%)

Total basepairs processed: 1,217,910 bp
Read 1: 608,955 bp
Read 2: 608,955 bp
Total written (filtered): 0 bp (0.0%)
Read 1: 0 bp
Read 2: 0 bp

So it seems to be the problem that no reads pass the filter. However this does not change, even when I change to -m 10, -M 10000 and -e 10. It's also interesting to see the cutadapt commandline at the top where -m and -M are set to 18, but maybe it is supposed to be like this, since the UMI-length between the adapter/primer is 18. I don't know when the input -m and -M are used, so I am still confused.

Run command:
longread_umi nanopore_pipeline
-d fastq_runid_25dec2b6d463d9adc700c13264282f7c49ac4b50_0_0.fastq
-o test_mb0
-v 25
-q r10_min_high_g340
-m 10
-M 100000
-s 100
-e 10
-f GTATCGTGTAGAGACTGCGTAGG
-F TCAAGCCTCAGACAGTGGTTC
-r AGTGATCGAGTCAGTGCGAGTG
-R CATAGCGTAAAAGGAGCAACA
-c 2
-p 2
-t 1
-T 1

Morten

@rob234king
Copy link

you can not have an e value greater than 1 so would need to be 0.9 and should see them come through but this means 90% error on match so not a solution but something is not right with the cutadapt command. Either primer/adapter combinations are not right or something. Did you have UMI at one end or both?

@mortenry
Copy link
Author

I understand now that you meant the e value in cutadapt and not the -e in the "longread_umi nanopore_pipeline" setup. The e in cutadapt cannot be specified as input (looks like defualt is 0.2 in the pipeline).
I got the primer/adapter inputs for the workflow from my collaborators. The excel-file they used to define them these look like this:

Oligo Name Sequence 5' to 3' (include modification codes if applicable) Scale (µmole) Purification
GSP UMI fwd GTATCGTGTAGAGACTGCGTAGGTTTVVVVTTVVVVTTVVVVTTVVVVTTTtcaagcctcagacagtggttc 0.05 DST
GSP UMI rev AGTGATCGAGTCAGTGCGAGTGTTTVVVVTTVVVVTTVVVVTTVVVVTTTCATAGCGTAAAAGGAGCAACA 0.05 DST
UVP fwd GGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCGTATCGTGTAGAGACTGCGTAGG 0.05 DST
UVP rev GGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCAGTGATCGAGTCAGTGCGAGTG 0.05 DST

I am not an expert on the structures, but there is of course a possibility that something has been defined wrongly.

Morten

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants