This repository has been archived by the owner on Dec 19, 2023. It is now read-only.

Importing single ended reads as 'SampleData[JoinedSequencesWithQuality]' does not work #2

Open
arivers opened this issue Jul 26, 2018 · 11 comments

Comments

@arivers
Member

arivers commented Jul 26, 2018

The plugin runs single-ended sequences of type 'SampleData[SequencesWithQuality]' fine, but when I import the same sequences with the same manifest using the data type 'SampleData[JoinedSequencesWithQuality]':

qiime tools import --type 'SampleData[JoinedSequencesWithQuality]' --input-path /Users/rivers/Documents/itsxpress/tests/test_data/manifest2.txt --output-path demux2.qza --source-format SingleEndFastqManifestPhred33

and then run:
qiime itsxpress trim-single --i-per-sample-sequences demux2.qza --p-region ITS2 --p-taxa F --o-trimmed trimmed2.qza --p-threads 2

I get the following error:

Plugin error from itsxpress:

  Command '['vsearch', '--cluster_size', '/var/folders/py/c71jxp_54712mpc0c84zgty40000gn/T/itsxpress_mvgh02y9/seq.fq.gz', '--centroids', '/var/folders/py/c71jxp_54712mpc0c84zgty40000gn/T/itsxpress_mvgh02y9/rep.fa', '--uc', '/var/folders/py/c71jxp_54712mpc0c84zgty40000gn/T/itsxpress_mvgh02y9/uc.txt', '--strand', 'both', '--id', '0.995', '--threads', '2']' returned non-zero exit status 1

Debug info has been saved to /var/folders/py/c71jxp_54712mpc0c84zgty40000gn/T/qiime2-q2cli-err-r6fe7soe.log

The log error is

vsearch v2.7.0_macos_x86_64, 16.0GB RAM, 4 cores
https://github.com/torognes/vsearch



Fatal error: File too small

ERROR:root:Could not perform clustering with Vsearch. Error from Vsearch was:
 vsearch v2.7.0_macos_x86_64, 16.0GB RAM, 4 cores
https://github.com/torognes/vsearch



Fatal error: File too small
Traceback (most recent call last):
  File "/Users/rivers/Documents/itsxpress/itsxpress/main.py", line 423, in cluster
    p2.check_returncode()
  File "/Users/rivers/anaconda3/envs/qiime2-2018.6/lib/python3.5/subprocess.py", line 349, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['vsearch', '--cluster_size', '/var/folders/py/c71jxp_54712mpc0c84zgty40000gn/T/itsxpress_mvgh02y9/seq.fq.gz', '--centroids', '/var/folders/py/c71jxp_54712mpc0c84zgty40000gn/T/itsxpress_mvgh02y9/rep.fa', '--uc', '/var/folders/py/c71jxp_54712mpc0c84zgty40000gn/T/itsxpress_mvgh02y9/uc.txt', '--strand', 'both', '--id', '0.995', '--threads', '2']' returned non-zero exit status 1
Traceback (most recent call last):
  File "/Users/rivers/anaconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "<decorator-gen-402>", line 2, in trim_single
  File "/Users/rivers/anaconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/sdk/action.py", line 232, in bound_callable
    output_types, provenance)
  File "/Users/rivers/anaconda3/envs/qiime2-2018.6/lib/python3.5/site-packages/qiime2/sdk/action.py", line 367, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/Users/rivers/Documents/q2_itsxpress/q2_itsxpress/_itsxpress.py", line 213, in trim_single
    cluster_id=cluster_id)
  File "/Users/rivers/Documents/q2_itsxpress/q2_itsxpress/_itsxpress.py", line 284, in main
    sobj.cluster(threads=threads, cluster_id=cluster_id)
  File "/Users/rivers/Documents/itsxpress/itsxpress/main.py", line 426, in cluster
    raise e
  File "/Users/rivers/Documents/itsxpress/itsxpress/main.py", line 423, in cluster
    p2.check_returncode()
  File "/Users/rivers/anaconda3/envs/qiime2-2018.6/lib/python3.5/subprocess.py", line 349, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['vsearch', '--cluster_size', '/var/folders/py/c71jxp_54712mpc0c84zgty40000gn/T/itsxpress_mvgh02y9/seq.fq.gz', '--centroids', '/var/folders/py/c71jxp_54712mpc0c84zgty40000gn/T/itsxpress_mvgh02y9/rep.fa', '--uc', '/var/folders/py/c71jxp_54712mpc0c84zgty40000gn/T/itsxpress_mvgh02y9/uc.txt', '--strand', 'both', '--id', '0.995', '--threads', '2']' returned non-zero exit status 1

I exported the qza object and the fastq.gz file is not empty.
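A check like the one described above can also be done without exporting. This is only a minimal sketch: it assumes the artifact is a zip archive with gzipped FASTQ files under a `<uuid>/data/` directory and that each record spans exactly four lines. The function name `count_reads_in_artifact` is hypothetical and not part of QIIME 2 or ITSxpress.

```python
import gzip
import zipfile


def count_reads_in_artifact(qza_path):
    """Return {member_name: read_count} for each FASTQ file inside a .qza."""
    counts = {}
    with zipfile.ZipFile(qza_path) as archive:
        for name in archive.namelist():
            # Assumption: sequence files live under <uuid>/data/ as *.fastq.gz.
            if "/data/" not in name or not name.endswith(".fastq.gz"):
                continue
            # Decompress the member on the fly and count its lines.
            with archive.open(name) as raw, gzip.open(raw, "rt") as handle:
                n_lines = sum(1 for _ in handle)
            # Assumption: four lines per FASTQ record (no wrapped sequences).
            counts[name] = n_lines // 4
    return counts
```

Calling `count_reads_in_artifact("demux2.qza")` and looking for entries with a count of 0 would show whether any individual sample file is empty even when the others are not.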

@arivers arivers assigned arivers and kweber1 and unassigned arivers Jul 26, 2018
@kweber1
Contributor

kweber1 commented Jul 27, 2018

This should be fixed in the 1.6.5 update. If not, please reopen this issue.

@kweber1 kweber1 closed this as completed Jul 27, 2018
@ChristopherBurgess-USDA

ChristopherBurgess-USDA commented Jan 31, 2019

Hello, I am still getting a very similar error running itsxpress v1.7.2. Below is the command I am running after demultiplexing. Do you think I would get a similar error if I used the tool itself rather than the qiime2 plugin?

EDIT: I just read the open issues, and I think that is where my problem lies: I likely have samples with no reads.

qiime itsxpress trim-pair-output-unmerged \
  --i-per-sample-sequences "${output}r1_sequences_demux.qza" \
  --p-region ITS1 \
  --p-taxa F \
  --p-threads $threads \
  --verbose \
  --o-trimmed "${output}r1_sequences_demux_trimmed.qza"

Here is the error output:

(qiime2-2018.11) [Linux@symbiosis]$ qiime itsxpress --version
itsxpress version 1.7.2
(qiime2-2018.11) [Linux@symbiosis]$ cat full_qiime2_std/full_qiime2_std.e906556
ERROR:root:Could not perform clustering with Vsearch. Error from Vsearch was:
 vsearch v2.7.0_linux_x86_64, 1007.4GB RAM, 128 cores
https://github.com/torognes/vsearch

Fatal error: File too small
Traceback (most recent call last):
  File "/home/roots/burgesch/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/itsxpress/main.py", line 557, in cluster
    p2.check_returncode()
  File "/home/roots/burgesch/miniconda3/envs/qiime2-2018.11/lib/python3.5/subprocess.py", line 349, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['vsearch', '--cluster_size', '/data/itsxpress_oz_u74ad/seq.fq.gz', '--centroids', '/data/itsxpress_oz_u74ad/rep.fa', '--uc', '/data/itsxpress_oz_u74ad/uc.txt', '--strand', 'both', '--id', '0.995', '--threads', '20']' returned non-zero exit status 1
Traceback (most recent call last):
  File "/home/roots/burgesch/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2cli/commands.py", line 274, in __call__
    results = action(**arguments)
  File "<decorator-gen-268>", line 2, in trim_pair_output_unmerged
  File "/home/roots/burgesch/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 231, in bound_callable
    output_types, provenance)
  File "/home/roots/burgesch/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/qiime2/sdk/action.py", line 362, in _callable_executor_
    output_views = self._callable(**view_args)
  File "/home/roots/burgesch/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_itsxpress/_itsxpress.py", line 242, in trim_pair_output_unmerged
    cluster_id=cluster_id)
  File "/home/roots/burgesch/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/q2_itsxpress/_itsxpress.py", line 301, in main
    sobj.cluster(threads=threads, cluster_id=cluster_id)
  File "/home/roots/burgesch/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/itsxpress/main.py", line 560, in cluster
    raise e
  File "/home/roots/burgesch/miniconda3/envs/qiime2-2018.11/lib/python3.5/site-packages/itsxpress/main.py", line 557, in cluster
    p2.check_returncode()
  File "/home/roots/burgesch/miniconda3/envs/qiime2-2018.11/lib/python3.5/subprocess.py", line 349, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['vsearch', '--cluster_size', '/data/itsxpress_oz_u74ad/seq.fq.gz', '--centroids', '/data/itsxpress_oz_u74ad/rep.fa', '--uc', '/data/itsxpress_oz_u74ad/uc.txt', '--strand', 'both', '--id', '0.995', '--threads', '20']' returned non-zero exit status 1

Plugin error from itsxpress:

  Command '['vsearch', '--cluster_size', '/data/itsxpress_oz_u74ad/seq.fq.gz', '--centroids', '/data/itsxpress_oz_u74ad/rep.fa', '--uc', '/data/itsxpress_oz_u74ad/uc.txt', '--strand', 'both', '--id', '0.995', '--threads', '20']' returned non-zero exit status 1

@arivers
Member Author

arivers commented Jan 31, 2019

If you are using the newest versions of ITSxpress and Q2-itsxpress, then this may be the same symptom, but it likely has a different cause.

From within your qiime conda environment, run
conda list | grep "itsxpress"
You may need to update itsxpress with conda.

If you are up to date, then the same error will be generated by any file that is empty after merging. This often happens when people include negative control files in their manifests.

@ChristopherBurgess-USDA

I don't think the version is the problem:

(qiime2-2018.11) [Linux@roots1]$ conda list | grep "itsxpress"
itsxpress                 1.7.2                    py35_0    bioconda
q2-itsxpress              1.7.2                     <pip>

I'm pretty sure the problem is that some of my samples didn't sequence well, so after demultiplexing they had fewer than 10 reads. I wrote a small workaround script that exports demux.qza, creates a manifest of the samples with more than 100 sequences, and imports it back into qiime. This should fix the problem.
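For anyone wanting to try the same workaround, the filtering step could look roughly like the sketch below. It is not the script referred to above (that one is in R); it assumes a comma-separated manifest with the columns `sample-id`, `absolute-filepath`, and `direction`, no comment lines, and four-line FASTQ records. The names `count_fastq_reads` and `filter_manifest` are hypothetical.

```python
import csv
import gzip
from collections import defaultdict


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file, assuming four lines per record."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def filter_manifest(manifest_in, manifest_out, min_reads=100):
    """Copy only samples whose files all contain at least min_reads reads.

    Returns the list of sample IDs that were kept.
    """
    rows_by_sample = defaultdict(list)
    with open(manifest_in, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        for row in reader:
            # Group forward and reverse rows under the same sample ID.
            rows_by_sample[row["sample-id"]].append(row)

    kept = []
    with open(manifest_out, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for sample_id, rows in rows_by_sample.items():
            # A sample is dropped if any of its files falls below the cutoff.
            if all(count_fastq_reads(r["absolute-filepath"]) >= min_reads
                   for r in rows):
                writer.writerows(rows)
                kept.append(sample_id)
    return kept
```

The filtered manifest can then be passed to `qiime tools import` in place of the original one. Note that a read-count cutoff on the raw files does not guarantee that any reads survive merging.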

@ChristopherBurgess-USDA

@arivers I think I worked around the problem of files with no sequences by exporting the qiime artifact and reimporting only the files with more than 100 sequences. However, I've run into a different problem: the output from ITSxpress is empty. I'm not entirely sure what is going on. When I run the data through an OTU vsearch pipeline, I definitely have fungal sequences. Here is my pipeline:

threads=15

output="${DIRPATH}dada2/qiime/qiime2_its_"


source activate qiime2-2018.11

qiime tools import \
  --type EMPPairedEndSequences \
  --input-path "${DIRPATH}READS/RAW1" \
  --output-path "${output}r1_sequences.qza"

qiime demux emp-paired \
  --i-seqs "${output}r1_sequences.qza" \
  --m-barcodes-file "${DIRPATH}READS/backup/its_mappign_file_01.txt" \
  --m-barcodes-column BarcodeSequence \
  --p-rev-comp-mapping-barcodes \
  --o-per-sample-sequences "${output}r1_sequences_demux.qza"



qiime tools export \
  --input-path "${output}r1_sequences_demux.qza" \
  --output-path "${DIRPATH}dada2/qiime/temp"

source deactivate

Rscript --vanilla --verbose "${DIRPATH}dada2/qiime/filter_sample_seq_count.R"

source activate qiime2-2018.11

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-format PairedEndFastqManifestPhred33 \
  --input-path "${DIRPATH}dada2/qiime/manifest" \
  --output-path "${output}r1_sequences_demux_sample_filtered.qza"

rm -rf "${DIRPATH}dada2/qiime/temp"

qiime itsxpress trim-pair-output-unmerged \
  --i-per-sample-sequences "${output}r1_sequences_demux_sample_filtered.qza" \
  --p-region ITS1 \
  --p-taxa F \
  --p-cluster-id .99 \
  --p-threads $threads \
  --verbose \
  --o-trimmed "${output}r1_sequences_demux_trimmed.qza"

These are the files I get out:

(qiime2-2018.11) [Linux@roots1]$ ls -ltr | grep "r1"
-rw-r--r-- 1 burgesch roots_dept 7751398939 Feb  6 11:11 qiime2_its_r1_sequences.qza
-rw-r--r-- 1 burgesch roots_dept 2382247498 Feb  6 12:20 qiime2_its_r1_sequences_demux.qza
-rw-r--r-- 1 burgesch roots_dept 2382203106 Feb  6 12:35 qiime2_its_r1_sequences_demux_sample_filtered.qza
-rw-r--r-- 1 burgesch roots_dept     376144 Feb  8 13:24 qiime2_its_r1_sequences_demux_trimmed.qza

You can see that after the itsxpress step, the sequence file drops significantly in size.

The sequences were generated using the earth microbiome protocol.

@brittonstrickland

Any luck with this error? I am getting the same error as the OP with the latest qiime2 and ITSxpress 1.8.0. Like a previous poster, I excluded the files with few reads by applying a cutoff (simply deleting samples from my manifest file), keeping only samples with more than 100, 200, 500, 1000, and 5000 reads in turn. I am still getting the same error: "File too small".

@arivers
Member Author

arivers commented Sep 16, 2021

It may be an issue with your sequences not merging well. Can you run the qiime2 command in --verbose mode and post the log files?

@arivers arivers reopened this Sep 16, 2021
@brittonstrickland

brittonstrickland commented Sep 16, 2021

I am thinking the same thing, but the ITSxpress log file only gives the temporary fasta rather than the actual sample. It's probably just one sample throwing it off, but I'm not sure how to fix it. The options seem to be either running BBMerge on individual samples, or possibly using the newest version of vsearch, which outputs an empty result file instead of an error.
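Running BBMerge on individual samples, as suggested above, could be scripted along these lines. This is a rough sketch under several assumptions: that `bbmerge.sh` is on the PATH, that the `in=`, `in2=`, and `out=` parameters are sufficient for the data, and that the summary statistics appear on stdout or stderr. The function names and the `samples` mapping are hypothetical.

```python
import subprocess


def bbmerge_command(r1, r2, merged_out):
    """Build the argument list for one paired-end sample."""
    return ["bbmerge.sh",
            "in={}".format(r1),
            "in2={}".format(r2),
            "out={}".format(merged_out)]


def merge_each_sample(samples, runner=subprocess.run):
    """Run BBMerge once per sample and collect its text output.

    samples maps sample_id -> (r1_path, r2_path, merged_out_path).
    Returns {sample_id: combined stdout and stderr text}.
    """
    reports = {}
    for sample_id, (r1, r2, merged_out) in samples.items():
        result = runner(bbmerge_command(r1, r2, merged_out),
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE,
                        universal_newlines=True)
        # Keep both streams so the summary is captured wherever it is printed.
        reports[sample_id] = result.stdout + result.stderr
    return reports
```

Because each sample is merged separately, a sample that produces no joined reads can be identified by its ID rather than by a temporary file name.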

I actually opened up an issue here (USDA-ARS-GBRU/itsxpress#22) that includes the tail of my log file.

I am also including the full debug output from running with the --verbose flag.

its.debug.txt

@brittonstrickland

I downloaded BBMerge and ran it independently. I am running this analysis on nasal swab samples that have a low bioload, so I initially used a very small read cutoff. However, even with the 500-read cutoff, I'm seeing several samples that have 100% ambiguity and no overlapping reads (see the output below).

`/Users/stricba1/bbmap//calcmem.sh: line 75: [: -v: unary operator expected
java -ea -Xmx1000m -Xms1000m -Djava.library.path=/Users/stricba1/bbmap/jni/ -cp /Users/stricba1/bbmap/current/ jgi.BBMerge in=/Volumes/ExtremeSSD/ITS/6224_qiime/fasta/6224-MS-3-45_S45_L001_R1_001.fastq.gz in2=/Volumes/ExtremeSSD/ITS/6224_qiime/fasta/6224-MS-3-45_S45_L001_R2_001.fastq.gz
Executing jgi.BBMerge [in=/Volumes/ExtremeSSD/ITS/6224_qiime/fasta/6224-MS-3-45_S45_L001_R1_001.fastq.gz, in2=/Volumes/ExtremeSSD/ITS/6224_qiime/fasta/6224-MS-3-45_S45_L001_R2_001.fastq.gz]
Version 38.92

[W::bgzf_read_block] EOF marker is absent. The input is probably truncated
[W::bgzf_read_block] EOF marker is absent. The input is probably truncated
Total time: 0.244 seconds.

Pairs: 1530
Joined: 0 0.000%
Ambiguous: 1530 100.000%
No Solution: 0 0.000%
Too Short: 0 0.000%

Avg Insert: NaN
Standard Deviation: 0.0
Mode: 0

Insert range: 999999999 - 0
90th percentile: 0
75th percentile: 0
50th percentile: 0
25th percentile: 0
10th percentile: 0
/Users/stricba1/bbmap//calcmem.sh: line 75: [: -v: unary operator expected
java -ea -Xmx1000m -Xms1000m -Djava.library.path=/Users/stricba1/bbmap/jni/ -cp /Users/stricba1/bbmap/current/ jgi.BBMerge in=/Volumes/ExtremeSSD/ITS/6224_qiime/fasta/6224-MS-4-73_S73_L001_R1_001.fastq.gz in2=/Volumes/ExtremeSSD/ITS/6224_qiime/fasta/6224-MS-4-73_S73_L001_R2_001.fastq.gz
Executing jgi.BBMerge [in=/Volumes/ExtremeSSD/ITS/6224_qiime/fasta/6224-MS-4-73_S73_L001_R1_001.fastq.gz, in2=/Volumes/ExtremeSSD/ITS/6224_qiime/fasta/6224-MS-4-73_S73_L001_R2_001.fastq.gz]
Version 38.92

[W::bgzf_read_block] EOF marker is absent. The input is probably truncated
[W::bgzf_read_block] EOF marker is absent. The input is probably truncated
Total time: 0.519 seconds.

Pairs: 1619
Joined: 1196 73.873%
Ambiguous: 331 20.445%
No Solution: 92 5.683%
Too Short: 0 0.000%

Avg Insert: 245.3
Standard Deviation: 85.4
Mode: 196

Insert range: 122 - 482
90th percentile: 398
75th percentile: 319
50th percentile: 196
25th percentile: 196
10th percentile: 196`

I have run ITS analyses in the past using DADA2 without ITSxpress. However, these unpaired sequences will need to be omitted due to sequencing quality. I'm in the process of manually removing these reads with pairing ambiguity, and I will report back with results.

@brittonstrickland

I was able to run BBMerge outside of qiime2 and found that several of my samples did not pair. The new version of BBMerge seems to only send a warning, not an error, when it runs into this issue, so it will not break the pipeline.
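Samples that did not pair can be picked out of saved BBMerge output with a small parser. The sketch below assumes the summary contains `Pairs:` and `Joined:` lines in the layout pasted earlier in this thread; the function names are hypothetical.

```python
import re

PAIRS_RE = re.compile(r"^Pairs:\s+(\d+)", re.MULTILINE)
JOINED_RE = re.compile(r"^Joined:\s+(\d+)\s+([\d.]+)%", re.MULTILINE)


def parse_bbmerge_summary(text):
    """Extract pair and joined counts from one BBMerge report, or None."""
    pairs = PAIRS_RE.search(text)
    joined = JOINED_RE.search(text)
    if pairs is None or joined is None:
        return None
    return {"pairs": int(pairs.group(1)),
            "joined": int(joined.group(1)),
            "joined_pct": float(joined.group(2))}


def samples_without_joined_reads(reports):
    """Return sample IDs whose report shows zero joined pairs.

    reports maps sample_id -> BBMerge report text. Samples whose report
    cannot be parsed are skipped rather than flagged.
    """
    flagged = []
    for sample_id, text in reports.items():
        stats = parse_bbmerge_summary(text)
        if stats is not None and stats["joined"] == 0:
            flagged.append(sample_id)
    return flagged
```

Applied to the two reports pasted above, this would flag the sample with 1530 pairs and 0 joined reads, and leave the one with 1196 joined reads alone.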

@arivers
Member Author

arivers commented Sep 21, 2021

Hi @brittonstrickland, if you are willing to share your data, I would like to see if I can replicate the problem and make the application more robust to this kind of error.

Thanks,
