Releases · edgardomortiz/Captus

23 Dec 17:49

edgardomortiz

v1.1.0

1e3c44e

Captus v1.1.0 Latest

New in the assemble module:

Contig depth of coverage is now calculated by mapping the reads back to the contigs using Salmon right after the assembly with MEGAHIT. This is the default behavior unless --disable_mapping is used.
The assembly is then automatically filtered by contig depth of coverage: if --disable_mapping is used, only contigs with depth of coverage >1x are retained; otherwise, contigs with depth of coverage >=1.5x are retained. The filtering threshold can be changed with --min_contig_depth.
To replicate the behavior of previous versions use --disable_mapping and --min_contig_depth 0.
The filtering can be repeated with --redo_filtering, without the need to reassemble, to try different values for --max_contig_gc and --min_contig_depth.
The assembly HTML report has been completely rewritten to reflect these changes.
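
The default filtering rule described above can be sketched in a few lines of Python. This is only an illustration of the thresholds stated in these notes, not the code Captus runs. The function name keep_contig and its parameters are hypothetical, and treating a user-supplied --min_contig_depth as an inclusive lower bound is an assumption.

```python
from typing import Optional


def keep_contig(depth: float,
                disable_mapping: bool,
                min_contig_depth: Optional[float] = None) -> bool:
    """Decide whether a contig passes the depth filter (illustrative only)."""
    if min_contig_depth is not None:
        # Assumption: a user-supplied threshold is an inclusive lower bound.
        return depth >= min_contig_depth
    if disable_mapping:
        # Default stated in the notes when mapping is disabled: depth > 1x.
        return depth > 1.0
    # Default stated in the notes when depth comes from mapping: depth >= 1.5x.
    return depth >= 1.5


# Hypothetical contig depths filtered under the two default modes
depths = [0.8, 1.0, 1.2, 1.5, 7.3]
kept_with_mapping = [d for d in depths if keep_contig(d, disable_mapping=False)]
kept_without_mapping = [d for d in depths if keep_contig(d, disable_mapping=True)]
```

Under this sketch, passing a threshold of 0 keeps every contig, which matches the way to replicate previous versions described above.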

New in the extract module:

Options --nuc_depth_tolerance, --ptd_depth_tolerance, --mit_depth_tolerance, and --dna_depth_tolerance allow filtering contigs by depth of coverage during locus extraction. Among the contigs with hits to a particular marker type (e.g., nuclear), the median of the depths of coverage is calculated and the tolerance factor is used to determine the minimum (median / tolerance) and maximum (median * tolerance) depth allowed. The depth of coverage is taken from the contig names when they contain the pattern _cov_X.XX_.
To replicate the behavior of previous versions use --ignore_depth.
Added option --disable_stitching. By default, Captus recovers a locus across multiple contigs; this option forces the recovery of a locus from a single contig (for example, when providing chromosome-level genome assemblies).
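
As a rough illustration of the depth tolerance arithmetic described above, the following Python sketch parses the depth from contig names containing _cov_X.XX_ and keeps the contigs between median / tolerance and median * tolerance. The function names and example contig names are hypothetical. Treating the bounds as inclusive and keeping contigs without a parsable depth are assumptions, not confirmed Captus behavior.

```python
import re
from statistics import median

# Depth is embedded in contig names as _cov_X.XX_ (pattern from the notes)
COV_PATTERN = re.compile(r"_cov_(\d+(?:\.\d+)?)_")


def depth_from_name(name):
    """Return the depth parsed from a contig name, or None if absent."""
    match = COV_PATTERN.search(name)
    return float(match.group(1)) if match else None


def depth_bounds(depths, tolerance):
    """Minimum and maximum depth allowed around the median."""
    med = median(depths)
    return med / tolerance, med * tolerance


def filter_by_depth(contig_names, tolerance):
    """Keep contigs whose depth lies within the tolerance bounds."""
    depths = {name: depth_from_name(name) for name in contig_names}
    known = [d for d in depths.values() if d is not None]
    if not known:
        return list(contig_names)
    low, high = depth_bounds(known, tolerance)
    # Assumptions: inclusive bounds; names without a depth are kept as-is.
    return [n for n, d in depths.items() if d is None or low <= d <= high]


# Hypothetical contigs with hits to one marker type
names = [
    "contig_1_cov_10.00_len_900",
    "contig_2_cov_12.00_len_750",
    "contig_3_cov_100.00_len_400",
    "contig_4_cov_1.00_len_300",
]
kept = filter_by_depth(names, tolerance=2)
```

With these four hypothetical contigs the median depth is 11, so a tolerance of 2 allows depths from 5.5 to 22 and only the first two contigs are kept.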

Other improvements or additions:

The accessory script filter_most_common_target_per_locus.py creates a new reference target file with only the most common target per locus found during the extraction step. This new reference target set can be used to re-extract the loci and potentially improve the informed paralog filtering.
All the reports have been updated to include the version and command of Captus used.
Updated installation instructions and documentation.
Some long output filenames have been shortened.
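
The idea behind selecting the most common target per locus can be sketched as follows. This is not the code of filter_most_common_target_per_locus.py; the function name, the (locus, target) input format, and the example names are all hypothetical and only illustrate the counting step.

```python
from collections import Counter, defaultdict


def most_common_target_per_locus(hits):
    """Map each locus to the target chosen most often across samples.

    hits is an iterable of (locus, target) pairs, one per sample in which
    the locus was extracted (a hypothetical input format).
    """
    counts = defaultdict(Counter)
    for locus, target in hits:
        counts[locus][target] += 1
    # On ties, Counter.most_common keeps the first target encountered.
    return {locus: c.most_common(1)[0][0] for locus, c in counts.items()}


# Hypothetical extraction results from three samples
hits = [
    ("locus1", "targetA"),
    ("locus1", "targetB"),
    ("locus1", "targetA"),
    ("locus2", "targetC"),
]
selected = most_common_target_per_locus(hits)
```

A reduced reference would then contain only the selected target sequence for each locus.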

02 Mar 19:48

edgardomortiz

v1.0.1

da21567

Captus v1.0.1

During assembly of hits when extracting a miscellaneous DNA reference target, the maximum difference in identity percentage for two hits to be considered compatible has been reduced from 5% to 3.33%. Initial tests indicate a slight improvement in recovery.
In some edge cases, when translating a CDS reference target set, the same nucleotide sequence can produce a perfectly translated protein in more than one reading frame. In case of a tie, priority is now given to positive reading frames.
The latest pandas versions introduced breaking changes; this release includes a fix.
When creating a new miscellaneous DNA reference from clustering, the target sequences in a reference locus can be on different strands. A method was added to make the strand uniform within each reference locus.
Added the option --only_collect to the align step, which only collects the extracted markers and exits afterwards (requested by Diego Morales).
Fixed multiple small bugs.
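
The reading frame tie-break mentioned above can be illustrated with a short Python sketch. The function name, the (frame, score) input, and the choice among several tied frames of the same sign are assumptions made for the example; the notes only state that positive frames take priority in a tie.

```python
def pick_reading_frame(candidates):
    """Pick the best frame from (frame, score) pairs, with frames in
    {1, 2, 3, -1, -2, -3}; positive frames win ties (illustrative only)."""
    best_score = max(score for _, score in candidates)
    tied = [frame for frame, score in candidates if score == best_score]
    positive = [frame for frame in tied if frame > 0]
    if positive:
        # Assumption: among tied positive frames, take the lowest number.
        return min(positive)
    # Assumption: among tied negative frames, take the one closest to -1.
    return max(tied)


# Hypothetical scores: frames -1 and 2 translate equally well
chosen = pick_reading_frame([(-1, 1.0), (2, 1.0), (3, 0.5)])
```

When the best score is found only in a negative frame, that frame is still returned, because the priority applies only to ties.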

21 Nov 10:57

edgardomortiz

v1.0.0

706b9e3

Captus v1.0.0

Additional improvements to captusd bait: added options --min_expected_tiling and --remove_ambiguous_loci for the creation of baitsets and their corresponding reference target files.

14 Nov 15:47

edgardomortiz

v0.9.99

8b58e37

Captus v0.9.99 Pre-release

Any BUSCO lineage database can now be used as a reference target file: download a .tar.gz from https://busco-data.ezlab.org/v5/data/lineages/ and provide the file path for Captus extraction.
Added shortcut for captus_assembly as simply captus (data assembly)
Added entry point for captus_design and a shortcut as captusd (bait design)
The cluster step of bait design now reports the mean number of copies per locus instead of just classifying it as single- or multi-copy
Added a function to create a reference target file (for locus extraction) after bait clustering and tiling
Code cleanup and minor cosmetic changes

30 Oct 11:03

edgardomortiz

v0.9.98

9c6e6d4

Captus v0.9.98 Pre-release

Fixed potential problem with recognition of _R1. or _R1_ patterns in filenames
Support for FastQC v0.12.1 update (s-andrews/FastQC@fbd9cf5)
Sped up the QC stage of the clean step
If the user provides a clustering threshold with --cl_min_identity then the miscellaneous DNA extraction is performed using the same identity.
Allow decimals in maximum average number of copies in a cluster via --cl_max_copies
Minor cosmetic improvements
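
For illustration, read-pair markers such as _R1. or _R1_ can be recognized with a regular expression like the one below. This is a sketch and not the pattern used by Captus. The function name and example filenames are hypothetical, and using the last marker in the filename is an assumption.

```python
import re

# _R1 or _R2 counts only when followed by "." or "_" (as in the notes),
# so names such as sample_R10.fq are not mistaken for read 1.
READ_PATTERN = re.compile(r"_(R[12])(?=[._])")


def split_sample_name(filename):
    """Return (sample_name, read_label), or None when no marker is found."""
    matches = list(READ_PATTERN.finditer(filename))
    if not matches:
        return None
    last = matches[-1]  # assumption: the last marker is the real one
    return filename[:last.start()], last.group(1)


# Hypothetical filenames
first = split_sample_name("sampleA_R1.fastq.gz")
second = split_sample_name("sampleB_R2_001.fq")
```

The lookahead keeps the delimiter out of the match, so the same expression covers both the _R1. and _R1_ forms.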

02 Sep 13:56

edgardomortiz

v0.9.97

ebca51d

Captus v0.9.97 Pre-release

Fixed a bug in the extraction report that occurred when the extraction statistics tables were not sorted. This bug does not affect the output, only the report heatmap.

25 Aug 14:57

edgardomortiz

v0.9.96

3f598b1

Captus v0.9.96 Pre-release

Fixed indentation bugs that prevented Falco or FastQC from running during the clean step and the subsampling of reads during the assemble step
Secret feature: coding gene databases can also be extracted as nucleotide sequences
Code cleanup and minor fixes

18 Aug 14:52

edgardomortiz

v0.9.95

037b4ce

Captus v0.9.95 Pre-release

Updated Perl dependencies; the latest BioPerl and YAML can now be used by Scipio
Improved Scipio parallelization: assemblies are sorted by size in decreasing order before processing
Reduced the maxIntron search for Scipio to 50000 bp (previous settings took too long and created unlikely gene models when chromosome-level assemblies were analyzed)
Code cleanup and multiple cosmetic changes
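
Sorting assemblies by decreasing size before processing is a common way to balance parallel work, because the longest jobs start first. A minimal sketch follows; the function name and the name-to-size mapping are hypothetical, and whether size means file size or total bases is not stated in these notes.

```python
def order_for_processing(assembly_sizes):
    """Return assembly names sorted from largest to smallest.

    assembly_sizes maps an assembly name to its size (hypothetical input;
    the unit could be bytes or bases).
    """
    return sorted(assembly_sizes, key=assembly_sizes.get, reverse=True)


# Hypothetical assemblies and sizes
sizes = {"sample_small": 1_200_000, "sample_large": 950_000_000, "sample_mid": 40_000_000}
queue = order_for_processing(sizes)
```

Submitting the jobs to a worker pool in this order reduces the chance that a single large assembly is started last and delays the whole run.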