Fixed bug in tempfile, minor updates #55

Merged 6 commits on Sep 23, 2024
7 changes: 3 additions & 4 deletions README.rst
@@ -28,9 +28,8 @@ Author
Citations
---------

Einarsson, SV and Rivers, AR. ITSxpress Version 2: Software to rapidly trim internal
transcribed spacer sequences with quality scores for amplicon sequencing.
Microbiology Spectrum. In press, 2024.
Einarsson, SV and Rivers, AR. ITSxpress Version 2: Software to rapidly trim internal
transcribed spacer sequences with quality scores for amplicon sequencing. Microbiology Spectrum. In press, 2024.

Rivers AR, Weber KC, Gardner TG, Liu, S, Armstrong, SD. ITSxpress: Software to rapidly trim
internally transcribed spacer sequences with quality scores for marker gene
@@ -71,7 +70,7 @@ Installing ITSxpress for use as a QIIME2 Plugin

To install ITSxpress as a plugin for QIIME 2, first install QIIME 2 as a separate Conda/Mamba environment using their instructions
https://docs.qiime2.org/2024.5/install/ then add ITSxpress to the QIIME 2 Conda environment. The examples below are for QIIME 2
version 2024.5, so please update the commands if you want a newer release.
version 2024.2, so please update the commands if you want a newer release.


For Linux:
75 changes: 41 additions & 34 deletions changelog.md
@@ -1,5 +1,12 @@
2.1.1 (2024-9-19)
------------------
# 2.1.2 (2024-9-23)

- Fixed bug [Issue 54](https://github.com/USDA-ARS-GBRU/itsxpress/issues/54) that caused `--tempdir` input to be ignored in the ITSxpress CLI and updated q2_itsxpress.py and test code for this change.
- updates to Dockerfile
- updates to changelog formatting and readme


# 2.1.1 (2024-9-19)

- Changed settings to allow FASTQ quality scores up to 93. This prevents an error in processing some reads from PacBio, Nanopore and Element Biosciences AVITI sequencers with quality scores over 41.
- Updated from manual software versioning to automated versioning based on git tag metadata and `setuptools-scm`.
- Updated Github action workflow for QIIME 2024.5 and Python 3.9.19
@@ -9,8 +16,8 @@
- Cleaned up some PEP formatting
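
The quality-score ceiling of 93 mentioned in the 2.1.1 notes follows from the standard Phred+33 FASTQ encoding, where each score is stored as one printable ASCII character between `!` (33) and `~` (126). The sketch below illustrates that arithmetic only; the function names are hypothetical and are not part of ITSxpress.

```python
# Minimal sketch of Phred+33 quality encoding (assumes Sanger-style FASTQ).
# Function names are illustrative only; they are not ITSxpress functions.

PHRED_OFFSET = 33   # ASCII offset used by Phred+33
MAX_PHRED = 93      # chr(93 + 33) == '~', the last printable ASCII character


def phred33_char(score: int) -> str:
    """Return the FASTQ quality character for a Phred score."""
    if not 0 <= score <= MAX_PHRED:
        raise ValueError(f"Phred+33 can only encode scores 0-{MAX_PHRED}, got {score}")
    return chr(score + PHRED_OFFSET)


def phred33_scores(quality_string: str) -> list[int]:
    """Decode a FASTQ quality string into a list of Phred scores."""
    return [ord(char) - PHRED_OFFSET for char in quality_string]


if __name__ == "__main__":
    print(phred33_char(41))       # upper end of the range that was accepted before this change
    print(phred33_char(93))       # highest score representable in Phred+33
    print(phred33_scores("I~"))
```

A validator that caps scores at 41 would reject the `~` character, which is why reads from instruments that report higher scores could fail before the limit was raised.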


2.1.0 (2024-4-10)
------------------
# 2.1.0 (2024-4-10)

- HMMs are updated to version 2 of the HMM database curated by Henrik Nilson at the University of Gothenburg.
-Version 2 (see https://github.com/USDA-ARS-GBRU/ITS_HMMs)
-5 April 2024
@@ -21,22 +28,22 @@
- Added option of Y.hmm to ITSxpress standalone and Qiime2 plugin
- Added documentation for Apple Silicon chip support.

2.0.2 (2024-3-20)
------------------
# 2.0.2 (2024-3-20)

- Fixed a bug where the 3' end of the ITS region was not being trimmed from both forward and reverse reads if the read extended past the ITS region. This was due to the trimming being done at the start of both forward and reverse reads and not the end of each read. Thus if the read overlapped the opposite end of the ITS read, part of the conserved region would still be found on the ends of the forward and reverse read. This was fixed by trimming to just the ITS region for both forward and reverse reads. This bug did not affect the results of ASV calling with Dada2 because Dada2 ignored sequence beyond the ITS region. This fix will make the output more consistent with expectation.

- Fixed a bug for submodule logging, where submodules were not logging to the main log file. This was fixed by passing the log file to the submodules and having them write to the same log file. This issue was introduced in version 2.0.0.

- Added unit test to confirm that the 3' end of the ITS region is being trimmed from both forward and reverse reads.
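
The difference between trimming only the start of a read and trimming to the ITS region on both ends can be sketched as follows. This is a simplified illustration under stated assumptions: `start` and `end` are hypothetical 0-based coordinates of the ITS region within a single read, and both function names are invented here. How ITSxpress actually derives those positions is not shown in this changelog.

```python
# Illustrative sketch only: hypothetical helpers, not ITSxpress code.
# start/end are assumed 0-based, end-exclusive coordinates of the ITS region.


def trim_start_only(seq: str, qual: str, start: int) -> tuple[str, str]:
    """Drop bases before the ITS region but keep everything after it."""
    return seq[start:], qual[start:]


def trim_to_its(seq: str, qual: str, start: int, end: int) -> tuple[str, str]:
    """Keep only the ITS region, dropping flanking bases on both sides."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    end = min(end, len(seq))  # a read may stop before the region ends
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    # 4 conserved bases, 6 ITS bases, then 3 conserved bases past the ITS region
    read = "AAAA" + "CCCCCC" + "GGG"
    qual = "I" * len(read)
    print(trim_start_only(read, qual, 4)[0])   # trailing conserved bases remain
    print(trim_to_its(read, qual, 4, 10)[0])   # only the ITS bases remain
```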

2.0.1 (2023-11-07)
------------------
# 2.0.1 (2023-11-07)

Fix single-end logic bug, which looked for a reverse read file even if single-end reads were provided because the single_end flag wasn't indicated by user.

Fix unit test bug, which was failing because the test data was interleaved and the test was expecting single-end reads. This was due to the logic bug mentioned above being fixed.

2.0.0 (2023-06-28)
------------------
# 2.0.0 (2023-06-28)

Release Highlights
- Removed BBmap dependency
- BBmap scripts are no longer used in the pipeline, including:
@@ -52,8 +59,8 @@ Release Highlights
Bug Fixes
- Fixed bug where the q2-itsxpress plugin was not handling single-end reads correctly, and was looking for a reverse read file

1.8.1 (2023-06-02)
------------------
# 1.8.1 (2023-06-02)

Release Highlights

- This is the final version that uses BBmap scripts. The next version will remove the BBmap dependency and use Vsearch for all steps.
@@ -66,58 +73,58 @@ Release Highlights
- Added read count output to log file


1.8.0 (2019-12-9)
-----------------
# 1.8.0 (2019-12-9)

- Added support for primer sets in the reverse orientation
- Fixed a bug that could cause crashes when an intermediate file was empty

1.7.2 (2018-11-8)
-----------------
# 1.7.2 (2018-11-8)

- This release fixes issue [#8](https://github.com/USDA-ARS-GBRU/itsxpress/issues/8)
- This issue caused ITSxpress to incorrectly trim about 0.2% of read pairs. Sometimes this would result in it writing blank fastq records which would cause Qiime to detect an error and stop processing.

1.7.0 (2018-09-12)
------------------
# 1.7.0 (2018-09-12)

New Features:

- Support for the output of unmerged paired end files. This allows users to use Dada2 for sequence variant calling.
- The API is now documented at ReadTheDocs

1.6.4 (2018-7-26)
-----------------
# 1.6.4 (2018-7-26)

- Fix for issue validating fastq.gz files that was not solved by v1.6.3

1.6.3 (2018-7-25)
-----------------
# 1.6.3 (2018-7-25)

- Fixed issue validating fastq.gz and added tests.

1.6.2 (2018-7-25)
-----------------
# 1.6.2 (2018-7-25)

- This release fixes an error that occasionally occurred when validating FASTQ files. ITSxpress used BBtools reformat.sh which occasionally threw an exception when validating FASTQ files due to a race condition. FASTQ file validation is now done with Biopython instead.

1.6.1 (2018-7-19)
-----------------
# 1.6.1 (2018-7-19)

- Changed the default clustering identity to 99.5%.
- Experiments with fungal soil samples showed that ITSxpress and ITSx trimmed 99.822% of reads in the ITS1 region within 2 bases of each other and 99.099% of reads in the ITS2 region within 2 bases of each other at 99.5% identity. For higher accuracy, dereplication can be run at 100% identity.


1.6.0 (2018-7-13)
-----------------
# 1.6.0 (2018-7-13)

- This release adds a new feature to cluster merged sequences at less than 100% identity. This speeds up typical dataset trimming by about 10x over previous versions depending on the sample, without major effects on trimming accuracy. This feature is controlled with the --cluster_id flag. Default behavior is now to cluster at 0.987 identity.

1.5.6 (2018-7-3)
-----------------
# 1.5.6 (2018-7-3)

- Database is now included in the release

1.5.4 (2018-6-27)
-----------------
# 1.5.4 (2018-6-27)

- Fixed bug in handling of temporary files specified with the --tempfile flag
- updated readme
- fixed issues with exception handling


1.5.2 (2018-6-21)
-----------------
# 1.5.2 (2018-6-21)

- Fixed an indexing error causing ITS trimming to be off by 1 base.
- Fixed error when raising file not found exception
- removed old readme
5 changes: 2 additions & 3 deletions docker-images/Dockerfile
@@ -2,7 +2,7 @@ FROM condaforge/mambaforge:24.7.1-0

# # Install system dependencies
RUN apt-get update && \
apt-get install -y build-essential && \
apt-get install -y --no-install-recommends build-essential && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

@@ -14,8 +14,7 @@ RUN mamba install -c bioconda vsearch=2.22.1 hmmer=3.1b2

# Copy the itsxpress package files and install dependencies
COPY .. /app
WORKDIR /app
RUN git clone https://github.com/USDA-ARS-GBRU/itsxpress.git && cd itsxpress && pip install .
RUN cd /app && git clone https://github.com/USDA-ARS-GBRU/itsxpress.git && cd itsxpress && pip install .

# Set the default command to run itsxpress
CMD ["itsxpress"]
11 changes: 1 addition & 10 deletions itsxpress/Dedup.py
@@ -1,7 +1,6 @@
import logging
import gzip
import os
import tempfile
from itertools import tee

import pyzstd as zstd
@@ -271,7 +270,7 @@ def map_func(record):
        return map(map_func, filt)


    def create_trimmed_seqs(self, outfile, gzipped,zstd_file, itspos,wri_file, tempdir=None):
    def create_trimmed_seqs(self, outfile, gzipped,zstd_file, itspos,wri_file, tempdir):
        """Creates a FASTQ file, optionally gzipped, with the reads trimmed to the
        selected region.
        Args:
@@ -281,14 +280,6 @@ def create_trimmed_seqs(self, outfile, gzipped,zstd_file, itspos,wri_file, tempd
            itspos (object): an ItsPosition object
            wri_file (bool): Should file be written or checked for empty sequences?
        """
        if tempdir:
            if not os.path.exists(tempdir):
                logging.warning("Specified location for tempfile ({}) does not exist, using default location.".format(tempdir))
                self.tempdir = tempfile.mkdtemp(prefix='itsxpress_')
            else:
                self.tempdir = tempfile.mkdtemp(prefix='itsxpress_', dir=tempdir)
        else:
            self.tempdir = tempfile.mkdtemp(prefix='itsxpress_')

        def _write_seqs():
            if gzipped:
12 changes: 2 additions & 10 deletions itsxpress/SeqSample.py
@@ -1,6 +1,5 @@
import os
import logging
import tempfile
import subprocess

logger = logging.getLogger(__name__)
@@ -16,15 +15,8 @@ class SeqSample:
    """


    def __init__(self, fastq, tempdir=None):
        if tempdir:
            if not os.path.exists(tempdir):
                logging.warning("Specified location for tempfile ({}) does not exist, using default location.".format(tempdir))
                self.tempdir = tempfile.mkdtemp(prefix='itsxpress_')
            else:
                self.tempdir = tempfile.mkdtemp(prefix='itsxpress_', dir=tempdir)
        else:
            self.tempdir = tempfile.mkdtemp(prefix='itsxpress_')
    def __init__(self, fastq, tempdir):
        self.tempdir = tempdir
        self.fastq = fastq
        self.uc_file = None
        self.rep_file = None
2 changes: 1 addition & 1 deletion itsxpress/__init__.py
@@ -12,7 +12,7 @@
except ModuleNotFoundError as e:
    #logging
    print("{}.Could not initialize the Qiime plugin portion of ITSxpress. Command line ITSxpress will still work normally. If you wish to use the Qiime2 ITSxpress plugin, you need to install Qiime2 first into your environment.\n".format(e))
    pass


__all__ = ["main", "definitions"]

50 changes: 45 additions & 5 deletions itsxpress/main.py
@@ -35,6 +35,7 @@
import os
import shutil
import math
import tempfile
from itertools import tee

from numpy import empty
@@ -233,6 +234,46 @@ def core(file):
reads = core(file2)
logging.info("Total number of reads in file {} is {}.".format(file2, reads))

def create_temp_directory(tempdir_arg=None):
    """
    Creates a temporary directory at a user-defined location or at the default location.
    The directory name is prefixed with 'itsxpress_'.

    Ensures no file with the same name exists before creating the directory.

    Parameters:
    - tempdir_arg (str): Path to a directory provided by the user. If None,
    a new temporary directory is created at the default location.

    Returns:
    - str: Path to the temporary directory, or None if there was an error.
    """
    try:
        if tempdir_arg:
            if not os.path.exists(tempdir_arg):
                os.makedirs(tempdir_arg)
                logging.info(f"Directory '{tempdir_arg}' has been created.")
            else:
                if os.path.isfile(tempdir_arg):
                    logging.error(f"A file with the same name '{tempdir_arg}' already exists. Cannot create directory.")
                    return None
                logging.info(f"Directory '{tempdir_arg}' already exists.")

            # Try creating a unique temporary directory inside user-specified directory
            temp_dir = tempfile.mkdtemp(prefix="itsxpress_", dir=tempdir_arg)
            logging.info(f"Temporary directory '{temp_dir}' has been created at the user-defined location.")
        else:
            # Create a temporary directory at default location
            temp_dir = tempfile.mkdtemp(prefix="itsxpress_")
            logging.info(f"Temporary directory '{temp_dir}' has been created at the default location.")

        return temp_dir

    except Exception as e:
        logging.error(f"Failed to create temporary directory: {e}")
        return None


def main(args=None):
    """Run Complete ITS trimming workflow.
    """
@@ -248,14 +289,14 @@ def main(args=None):
        _check_fastqs(args.fastq, args.fastq2)
        # Parse input types
        paired_end = _is_paired(args.fastq, args.fastq2, args.single_end)
        session_tempdir = create_temp_directory(tempdir_arg=args.tempdir)
        if paired_end:
            logging.info("Sequences are paired-end in two files. They will be merged using Vsearch.")
            sobj = SeqSamplePairedNotInterleaved(fastq=args.fastq, fastq2=args.fastq2, tempdir=args.tempdir, reversed_primers=args.reversed_primers)
            sobj = SeqSamplePairedNotInterleaved(fastq=args.fastq, fastq2=args.fastq2, tempdir=session_tempdir, reversed_primers=args.reversed_primers)
            sobj._merge_reads(threads=str(args.threads), stagger=args.allow_staggered_reads)
        elif not paired_end:
            logging.info("Sequences are assumed to be single-end.")
            sobj = SeqSampleNotPaired(fastq=args.fastq, tempdir=args.tempdir)
        logging.info("Temporary directory is: {}".format(sobj.tempdir))
            sobj = SeqSampleNotPaired(fastq=args.fastq, tempdir=session_tempdir)
        # Deduplicate
        logging.info("Unique sequences are being written to a temporary FASTA file with Vsearch.")
        if math.isclose(args.cluster_id, 1, rel_tol=1e-05):
@@ -320,12 +361,11 @@ def main(args=None):
    finally:
        try:
            if not args.keeptemp:
                shutil.rmtree(sobj.tempdir)
                shutil.rmtree(session_tempdir)
        except UnboundLocalError:
            pass
        except AttributeError:
            pass


if __name__ == '__main__':
    main()
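
The overall pattern introduced by this change (create one session directory up front, hand it to the objects that need it, and remove it in a `finally` block) can be sketched in isolation as below. This is a minimal illustration, not the ITSxpress implementation: `Worker` and `run_session` are hypothetical names standing in for the `SeqSample` classes and `main()`, and initializing `session_tempdir` to `None` before the `try` is a design choice made in this sketch so cleanup does not depend on catching `UnboundLocalError`.

```python
# Minimal sketch of a session-wide temporary directory; names are hypothetical.
import os
import shutil
import tempfile


class Worker:
    """Stand-in for a class that receives a ready-made temp directory."""

    def __init__(self, tempdir):
        self.tempdir = tempdir

    def run(self):
        # Write an intermediate file inside the session directory.
        path = os.path.join(self.tempdir, "intermediate.txt")
        with open(path, "w") as handle:
            handle.write("placeholder\n")
        return path


def run_session(base_dir=None, keeptemp=False):
    """Create one temp directory, use it, and clean it up unless keeptemp is set."""
    session_tempdir = None  # defined before try so the finally block can test it
    try:
        if base_dir:
            os.makedirs(base_dir, exist_ok=True)
        # dir=None makes mkdtemp fall back to the platform default location
        session_tempdir = tempfile.mkdtemp(prefix="itsxpress_", dir=base_dir)
        Worker(tempdir=session_tempdir).run()
        return session_tempdir
    finally:
        if session_tempdir and not keeptemp:
            shutil.rmtree(session_tempdir, ignore_errors=True)


if __name__ == "__main__":
    kept = run_session(keeptemp=True)
    print("kept:", kept, os.path.isdir(kept))
    shutil.rmtree(kept)
```

Because the directory is created in one place, a user-supplied location is honored by every component that writes intermediate files, which is the behavior Issue 54 reported as missing.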