Inversion patching and mashmap3 index saving
Buildable source tarball: wfmash-v0.16.0.tar.gz
The primary enhancement in this release is inversion detection during the alignment patching process. It improves alignment accuracy for sequences that contain inversions relative to the target.
How it works:
- Patching Process: During wflign's high-level trace patching, the algorithm identifies regions that do not align well in the forward orientation.
- Reverse Complement Alignment: For these poorly aligned regions, the algorithm attempts an alignment against the reverse complement of the sequence.
- Score Comparison: The algorithm compares the alignment scores of the forward and reverse complement alignments.
- Selection: If the reverse complement alignment produces a better score, it is selected for that region.
- Output: Reverse complement alignments are reported with an additional SAM tag, `rc:Z:true`.
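The selection and output steps above can be sketched as follows. This is a minimal illustration, not wfmash code: `reverse_complement`, `PatchChoice`, `choose_orientation` and `orientation_tag` are hypothetical names. The sketch also assumes WFA-style penalty scores, where lower is better, and it does not model how poorly aligned regions are detected.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not wfmash's API): reverse complement of an
// uppercase DNA string; any base other than A/C/G/T becomes 'N'.
std::string reverse_complement(const std::string& seq) {
    std::string rc(seq.rbegin(), seq.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default:  c = 'N'; break;
        }
    }
    return rc;
}

// Outcome of re-aligning one poorly aligned region (hypothetical type).
struct PatchChoice {
    bool use_rc;  // true if the reverse complement alignment was selected
    int penalty;  // penalty of the selected alignment
};

// Assumes penalty scores (lower is better). The forward alignment is kept
// unless the reverse complement alignment is strictly better.
PatchChoice choose_orientation(int fwd_penalty, int rc_penalty) {
    assert(fwd_penalty >= 0 && rc_penalty >= 0);
    if (rc_penalty < fwd_penalty) return {true, rc_penalty};
    return {false, fwd_penalty};
}

// Extra SAM tag for a region reported in reverse complement orientation;
// empty for forward regions.
std::string orientation_tag(const PatchChoice& choice) {
    return choice.use_rc ? "rc:Z:true" : "";
}
```

Ties keep the forward alignment. That matches the "better score" wording in the selection step, but it is a design choice made for this sketch.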
Key Components:
- New parameter `wflign_min_inv_patch_len`: Sets the minimum length of an inverted patch to be considered (default: 23).
- Added `calculate_alignment_score` function: Computes alignment scores from the CIGAR string and penalties.
- Modified `do_wfa_patch_alignment` function: Now handles both forward and reverse complement alignments.
- Updated `write_merged_alignment` function: Processes and outputs reverse complement alignments.
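A penalty-based CIGAR score and the minimum-length gate could look roughly like the sketch below. It is an illustration under stated assumptions and does not reproduce wfmash's `calculate_alignment_score`:
- The penalty values are hypothetical.
- Gaps use a WFA-style affine cost: open + extend × length.
- Only extended CIGAR operations (`=`, `X`, `I`, `D`) are handled.
- The default of 23 is taken from `wflign_min_inv_patch_len`, but how wfmash measures patch length is not specified here.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical penalty set; real values come from wfmash's own WFA settings.
struct Penalties {
    int mismatch = 4;
    int gap_open = 6;
    int gap_extend = 2;
};

// Sketch of a penalty score over an extended CIGAR: '=' costs 0, each 'X'
// costs `mismatch`, and every I/D run costs gap_open + gap_extend * length.
// Lower is better.
int64_t cigar_penalty(const std::string& cigar, const Penalties& p) {
    int64_t total = 0;
    int64_t len = 0;
    bool have_len = false;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            len = len * 10 + (c - '0');
            have_len = true;
            continue;
        }
        if (!have_len) throw std::invalid_argument("CIGAR operation without length");
        switch (c) {
            case '=': break;
            case 'X': total += p.mismatch * len; break;
            case 'I':
            case 'D': total += p.gap_open + p.gap_extend * len; break;
            default: throw std::invalid_argument("unsupported CIGAR operation");
        }
        len = 0;
        have_len = false;
    }
    if (have_len) throw std::invalid_argument("trailing length without operation");
    return total;
}

// Hypothetical gate mirroring wflign_min_inv_patch_len: patches shorter than
// the minimum are not tried in reverse complement orientation.
bool consider_inversion(uint64_t patch_len, uint64_t min_inv_patch_len = 23) {
    return patch_len >= min_inv_patch_len;
}
```

For example, with these placeholder penalties, `10=2X3I5=1D` scores 2×4 + (6 + 3×2) + (6 + 1×2) = 28.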
This feature lets wfmash align through inverted segments instead of forcing them into a poor forward-orientation alignment, which improves its usefulness for complex genomic comparisons.
Other Significant Changes
- MashMap Index Support:
  - Implemented creation and usage of MashMap indexes, so an index can be saved once and reused to speed up repeated mapping runs.
  - New CLI options: `--mm-index`, `--create-index-only`, `--overwrite-mm-index`.
- Memory Optimization:
  - Improved memory usage in the `Sketch` class.
- K-mer Size Calculation:
  - Modified the `minhash_kmer_size` calculation to handle edge cases with high-identity alignments.
- Alignment Class Improvements:
  - Enhanced the `alignment_t` class with proper copy and move semantics.
- Index File Handling:
  - Improved index reading and writing, with parameter validation.
Detailed Log of Changes
src/align/include/align_parameters.hpp
- Added `wflign_min_inv_patch_len` parameter to the `Parameters` struct.
src/align/include/computeAlignments.hpp
- Integrated `wflign_min_inv_patch_len` into the `WFlign` constructor call.
src/common/wflign/src/wflign.cpp and wflign.hpp
- Added `min_inversion_length` to the `WFlign` constructor and member variables.
- Modified the `minhash_kmer_size` calculation for edge cases.
src/common/wflign/src/wflign_alignment.cpp and wflign_alignment.hpp
- Implemented copy/move constructors and assignment operators for `alignment_t`.
- Added `calculate_alignment_score` function.
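These notes do not list the members of `alignment_t`, so the following is only a generic sketch of "proper copy and move semantics" for a class that owns a raw buffer. `AlignmentRecord` and its CIGAR buffer are hypothetical stand-ins. The sketch uses the common rule-of-five / copy-and-swap pattern and is not the actual wfmash implementation.

```cpp
#include <cassert>
#include <cstring>
#include <utility>

// Hypothetical stand-in for an alignment record that owns a raw CIGAR
// buffer; the real alignment_t members are not shown in these notes.
class AlignmentRecord {
public:
    AlignmentRecord() = default;
    explicit AlignmentRecord(const char* cigar)
        : len_(std::strlen(cigar)), cigar_(new char[len_ + 1]) {
        std::memcpy(cigar_, cigar, len_ + 1);
    }
    ~AlignmentRecord() { delete[] cigar_; }

    // Copy: deep-copies the buffer so each object owns separate memory.
    AlignmentRecord(const AlignmentRecord& other)
        : len_(other.len_), cigar_(other.cigar_ ? new char[other.len_ + 1] : nullptr) {
        if (cigar_) std::memcpy(cigar_, other.cigar_, len_ + 1);
    }

    // Move: takes over the buffer and leaves the source empty but valid.
    AlignmentRecord(AlignmentRecord&& other) noexcept
        : len_(std::exchange(other.len_, 0)),
          cigar_(std::exchange(other.cigar_, nullptr)) {}

    // Copy-and-swap: one operator serves both copy and move assignment
    // and is safe under self-assignment.
    AlignmentRecord& operator=(AlignmentRecord other) noexcept {
        swap(other);
        return *this;
    }

    void swap(AlignmentRecord& other) noexcept {
        std::swap(len_, other.len_);
        std::swap(cigar_, other.cigar_);
    }

    const char* cigar() const { return cigar_ ? cigar_ : ""; }
    std::size_t size() const { return len_; }

private:
    std::size_t len_ = 0;  // declared before cigar_, so initialized first
    char* cigar_ = nullptr;
};
```

Without a user-defined copy constructor, the compiler-generated one would copy the pointer. Two objects would then delete the same buffer, and that is the class of bug proper copy/move semantics rules out.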
src/common/wflign/src/wflign_patch.cpp and wflign_patch.hpp
- Modified `do_wfa_patch_alignment` for reverse complement handling.
- Updated `write_merged_alignment` for reverse complement output.
- Refined the patching process to consider both forward and reverse complement orientations.
src/interface/parse_args.hpp
- Added CLI options for MashMap indexing and `wflign_min_inv_patch_len`.
src/map/include/map_parameters.hpp
- Added parameters for MashMap indexing support.
src/map/include/parseCmdArgs.hpp
- Updated parsing for new MashMap indexing options.
src/map/include/winSketch.hpp
- Implemented MashMap index functions (create, read, write).
- Added validation that the parameters stored in an index file match those given on the command line.
- Optimized `Sketch` class memory usage.
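The idea behind saving an index and validating it against the command-line parameters can be sketched as follows. The on-disk layout is entirely hypothetical and is not MashMap's real index format. It consists of a magic tag, two example parameters (`kmer_size`, `sketch_size`) and a list of hashes, written in native byte order. The sketch shows only the principle: refuse an index whose stored parameters differ from the current ones, because its sketch would be inconsistent with them.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical subset of sketching parameters stored in the index header.
struct IndexParams {
    uint32_t kmer_size;
    uint32_t sketch_size;
    bool operator==(const IndexParams& o) const {
        return kmer_size == o.kmer_size && sketch_size == o.sketch_size;
    }
};

namespace {
constexpr uint32_t kMagic = 0x4D4D4958;  // hypothetical file tag

// Raw binary I/O in native byte order (fine for a same-machine sketch).
template <typename T>
void write_pod(std::ostream& out, const T& v) {
    out.write(reinterpret_cast<const char*>(&v), sizeof(T));
}

template <typename T>
T read_pod(std::istream& in) {
    T v{};
    if (!in.read(reinterpret_cast<char*>(&v), sizeof(T)))
        throw std::runtime_error("truncated index file");
    return v;
}
}  // namespace

// Writes the parameters first, then the sketch (here just a list of hashes).
void write_index(std::ostream& out, const IndexParams& p,
                 const std::vector<uint64_t>& hashes) {
    write_pod(out, kMagic);
    write_pod(out, p.kmer_size);
    write_pod(out, p.sketch_size);
    write_pod(out, static_cast<uint64_t>(hashes.size()));
    for (uint64_t h : hashes) write_pod(out, h);
}

// Reads an index, rejecting it if its stored parameters differ from the
// parameters requested on the command line.
std::vector<uint64_t> read_index(std::istream& in, const IndexParams& cli) {
    if (read_pod<uint32_t>(in) != kMagic) throw std::runtime_error("not an index file");
    IndexParams stored{};
    stored.kmer_size = read_pod<uint32_t>(in);
    stored.sketch_size = read_pod<uint32_t>(in);
    if (!(stored == cli))
        throw std::runtime_error("index parameters do not match command-line parameters");
    const uint64_t n = read_pod<uint64_t>(in);
    std::vector<uint64_t> hashes;
    for (uint64_t i = 0; i < n; ++i) hashes.push_back(read_pod<uint64_t>(in));
    return hashes;
}
```

Storing the parameters ahead of the payload lets a reader reject a mismatched index before loading any sketch data.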