HybPiper version 2.2.0
chrisjackson-pellicle released this 17 Jul 03:18
- Add option `--end_with` to command `hybpiper assemble`. Allows the user to end the assembly pipeline at a chosen step (map_reads, distribute_reads, assemble_reads, exonerate_contigs).
- Add option `--exonerate_skip_hits_with_frameshifts` to command `hybpiper assemble`. If provided, skip Exonerate hits where the SPAdes contig contains frameshifts when considering hits for assembly of an `*.FNA` sequence. Default behaviour in HybPiper v2.2.0 is to include these hits; previous versions allowed them automatically.
- Add option `--exonerate_skip_hits_with_internal_stop_codons` to command `hybpiper assemble`. If provided, skip Exonerate hits where the SPAdes contig contains internal in-frame stop codon(s) when considering hits for assembly of an `*.FNA` sequence. A single terminal stop codon is allowed. Default behaviour in HybPiper v2.2.0 is to include these hits; previous versions allowed them automatically.
- Add option `--exonerate_skip_hits_with_terminal_stop_codons` to command `hybpiper assemble`. If provided, skip Exonerate hits where the SPAdes sequence contains a single terminal stop codon. Only applies when option `--exonerate_skip_hits_with_internal_stop_codons` is also provided. Only use this flag if your target file exclusively contains protein-coding genes with no stop codons included, and you would like to prevent any in-frame stop codons in the output sequences. Default behaviour in HybPiper v2.2.0 is to include these hits; previous versions allowed them automatically.
- Add option `--chimeric_stitched_contig_check` to command `hybpiper assemble`. If provided, HybPiper will attempt to determine whether a stitched contig is a potential chimera of contigs from multiple paralogs. Default behaviour in HybPiper v2.2.0 is to skip this check; previous versions performed the check automatically. Skipping this check significantly speeds up the final 'exonerate_contigs' step of the pipeline.
- Add option `--no_pad_stitched_contig_gaps_with_n` to command `hybpiper assemble`. If provided, when constructing stitched contigs, do not pad any gaps between hits (with respect to the "best" protein reference) with a number of Ns corresponding to the reference gap multiplied by 3. Default behaviour in HybPiper v2.2.0 is to pad gaps with Ns; previous versions did this automatically.
- Add option `--skip_targetfile_checks` to command `hybpiper assemble`. Skip the target file checks. Can be used if you are confident that your target file has no issues (e.g. if you have previously run `hybpiper check_targetfile`).
- Add option `--no_spades_eta` to command `hybpiper assemble`. When SPAdes is run concurrently using GNU parallel, the `--eta` flag can result in many "sh: /dev/tty: Device not configured" errors being written to stderr. Using this option removes the `--eta` flag from the GNU parallel call, silencing both the ETA output and the error message.
- Fixed a bug in `exonerate_hits.py` that could (rarely) result in a duplicated region in the output `*.FNA` sequence.
- Fixed a bug in `exonerate_hits.py` that occurred when more than two Exonerate hits had identical query ranges and similarity scores; this could result in a sequence not being returned for the given gene.
- Added `tests` folder containing initial unit tests. Some tests require the Python package `pyfakefs` to run.
- Refactor of the hybpiper package. New module `hybpiper_main.py` with entry point (moved from `assemble.py`), and some `assemble.py` functions moved to `utils.py`. Target file checking functionality has been consolidated.
- HybPiper now logs to `stdout` rather than `stderr`.
- Commands `hybpiper check_targetfile` and `hybpiper assemble` now write a report file when checking the target file (`check_targetfile_report-<target file name>.txt`), rather than logging details to the main sample log. Command `hybpiper check_targetfile` writes the report to the current working directory, whereas command `hybpiper assemble` writes it to the sample directory.
- If the option `--cpu` is not specified for `hybpiper assemble`, HybPiper will now use all available CPUs minus one, rather than all available CPUs.
- Command `hybpiper assemble` now checks for output from previous runs for the pipeline steps selected via `--start_from` and `--end_with` (default is to select all steps). If previous output is found, HybPiper will exit with an error unless the option `--force_overwrite` is provided.
- Corrected the reading frame of sequence `Artocarpus-gene660` in the test dataset target file.
- Command `hybpiper assemble` now writes the file `<prefix>_chimera_check_performed.txt` to the sample directory. This is a text file containing 'True' or 'False' depending on whether the option `--chimeric_stitched_contig_check` was provided to command `hybpiper assemble`. Used by `hybpiper retrieve_sequences` and `hybpiper paralog_retriever`.
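
As a rough illustration of how `--start_from` and `--end_with` together select a run of pipeline steps, here is a minimal Python sketch. It is not HybPiper's own code: the function name `select_steps` and the error handling are hypothetical, and the step order is assumed to follow the order in which the steps are listed in the notes above.

```python
# Step names as listed in the release notes; the order is assumed to be the pipeline order.
PIPELINE_STEPS = ["map_reads", "distribute_reads", "assemble_reads", "exonerate_contigs"]


def select_steps(start_from="map_reads", end_with="exonerate_contigs"):
    """Return the contiguous run of steps from start_from to end_with (inclusive).

    Hypothetical helper for illustration only. With both arguments left at
    their defaults, every step is selected.
    """
    for name in (start_from, end_with):
        if name not in PIPELINE_STEPS:
            raise ValueError(f"Unknown pipeline step: {name}")
    start = PIPELINE_STEPS.index(start_from)
    end = PIPELINE_STEPS.index(end_with)
    if start > end:
        raise ValueError("start_from must not come after end_with")
    return PIPELINE_STEPS[start:end + 1]
```

For example, `select_steps(end_with="assemble_reads")` would stop before the 'exonerate_contigs' step in this sketch.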
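
The interaction between the two stop-codon options can be sketched as follows. This is an assumption-laden illustration rather than the logic in `exonerate_hits.py`: the function names are hypothetical, the standard genetic code is assumed, and the sequence is assumed to start in frame at its first base.

```python
# Stop codons of the standard genetic code (an assumption for this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_positions(seq):
    """Return zero-based codon indices of in-frame stop codons, reading from base 0."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore any trailing partial codon
    return [i // 3 for i in range(0, usable, 3) if seq[i:i + 3] in STOP_CODONS]


def skip_hit(seq, skip_internal=False, skip_terminal=False):
    """Decide whether a hit would be skipped under the two stop-codon flags.

    Hypothetical helper: skip_terminal only has an effect when skip_internal
    is also set, mirroring the description in the release notes.
    """
    if not skip_internal:
        return False
    last_codon = len(seq) // 3 - 1
    stops = stop_codon_positions(seq)
    if any(pos != last_codon for pos in stops):
        return True  # at least one internal in-frame stop codon
    has_terminal_stop = last_codon in stops
    return skip_terminal and has_terminal_stop
```

In this sketch a hit ending in a single terminal stop codon is kept when only the internal flag is set, and skipped when both flags are set.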
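
The N-padding rule for stitched contigs (reference gap in amino acids multiplied by 3) can be illustrated with a small sketch. The function `join_hits` and its coordinate convention (amino-acid positions on the protein reference, with an exclusive end) are assumptions made for this example, not HybPiper's internal representation.

```python
def join_hits(left_seq, right_seq, left_query_end, right_query_start, pad_with_n=True):
    """Join two nucleotide hit sequences, optionally padding the reference gap with Ns.

    left_query_end and right_query_start are amino-acid positions on the
    protein reference (hypothetical convention: end is exclusive). Each
    missing amino acid corresponds to three nucleotides, hence the factor 3.
    """
    gap_aa = max(right_query_start - left_query_end, 0)  # no padding if hits abut or overlap
    padding = "N" * (gap_aa * 3) if pad_with_n else ""
    return left_seq + padding + right_seq
```

With a gap of 4 amino acids between two hits, this sketch inserts 12 Ns; passing `pad_with_n=False` corresponds to the behaviour described for `--no_pad_stitched_contig_gaps_with_n`.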
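
The new `--cpu` default (all available CPUs minus one) might look like the following in Python. This is a sketch only: the function name `resolve_cpu_count` is hypothetical, and the floor of one CPU on single-core machines is an assumption not stated in the notes.

```python
import os


def resolve_cpu_count(requested=None):
    """Return the requested CPU count, or all available CPUs minus one if none was given."""
    if requested is not None:
        return requested
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(available - 1, 1)  # assumed floor so that at least one CPU is used
```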