
Inversion patching and mashmap3 index saving

@ekg released this 02 Jul 08:22 · commit 2243583

Buildable source tarball: wfmash-v0.16.0.tar.gz

The main change in this release is inversion detection during alignment patching. Regions that are inverted relative to the target, which align poorly in the forward orientation, can now be aligned in reverse complement, improving alignment accuracy for sequences that contain inversions.

How it works:

  1. Patching Process: During the wflign high-level trace patching, the algorithm identifies regions that do not align well in the forward orientation.

  2. Reverse Complement Alignment: For these poorly aligned regions, the algorithm attempts an alignment with the reverse complement of the sequence.

  3. Score Comparison: The algorithm compares the alignment scores of the forward and reverse complement alignments.

  4. Selection: If the reverse complement alignment produces a better score, it is selected for that region.

  5. Output: Reverse complement alignments are reported with an additional SAM tag rc:Z:true.
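The five steps above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not wfmash's implementation: a plain edit distance stands in for the WFA patch alignment, all helper names are hypothetical, and the length threshold mirrors the default of wflign_min_inv_patch_len (23), assumed here to apply to both segments.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Reverse complement of an uppercase DNA string (A<->T, C<->G; anything else -> N).
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'T': c = 'A'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            default:  c = 'N'; break;
        }
    }
    return rc;
}

// Hypothetical stand-in for the WFA patch alignment: global edit distance (lower is better).
int edit_distance(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            int sub = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({sub, prev[j] + 1, cur[j - 1] + 1});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

struct PatchChoice {
    bool reverse_complement;  // true -> the record would carry rc:Z:true
    int score;                // edit distance of the chosen orientation
};

// Align the patch forward; if both segments reach the minimum inversion length,
// also align the reverse-complemented query and keep the better (lower) score.
PatchChoice choose_patch_orientation(const std::string& query, const std::string& target,
                                     std::size_t min_inv_patch_len = 23) {
    int fwd = edit_distance(query, target);
    if (query.size() < min_inv_patch_len || target.size() < min_inv_patch_len) {
        return {false, fwd};
    }
    int rev = edit_distance(reverse_complement(query), target);
    if (rev < fwd) return {true, rev};
    return {false, fwd};  // ties keep the forward orientation
}
```

Keeping the forward orientation on ties is a design choice of this sketch, so an inversion is reported only when it strictly improves the patch.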

Key Components:

  • New parameter wflign_min_inv_patch_len: Sets the minimum length of an inverted patch to be considered (default: 23).
  • calculate_alignment_score function: Computes alignment scores based on the CIGAR string and penalties.
  • Modified do_wfa_patch_alignment function: Now handles both forward and reverse complement alignments.
  • Updated write_merged_alignment function: Processes and outputs reverse complement alignments.

This allows wfmash to align sequences that contain inversions, which is useful when comparing structurally rearranged genomes.

Other Significant Changes

  1. MashMap Index Support:

    • Implemented creation and loading of MashMap indexes, so repeated mapping runs can reuse a saved index instead of rebuilding it.
    • New CLI options: --mm-index, --create-index-only, --overwrite-mm-index.
  2. Memory Optimization:

    • Reduced memory usage in the Sketch class.
  3. Kmer Size Calculation:

    • Modified to handle edge cases with high-identity alignments.
  4. Alignment Class Improvements:

    • Added proper copy and move semantics to the alignment_t class.
  5. Index File Handling:

    • Improved index reading and writing, including validation that the parameters stored in an index file match those given on the command line.

Detailed Log of Changes

src/align/include/align_parameters.hpp

  • Added wflign_min_inv_patch_len parameter to Parameters struct.

src/align/include/computeAlignments.hpp

  • Passed wflign_min_inv_patch_len to the WFlign constructor.

src/common/wflign/src/wflign.cpp and wflign.hpp

  • Added min_inversion_length to WFlign constructor and member variables.
  • Modified minhash_kmer_size calculation for edge cases.

src/common/wflign/src/wflign_alignment.cpp and wflign_alignment.hpp

  • Implemented copy/move constructors and assignment operators for alignment_t.
  • Added calculate_alignment_score function.
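Explicit copy and move operations matter when a class owns a raw heap buffer. The sketch below assumes, for illustration, that alignment_t owns such a buffer of edit operations; the members shown (score, rev_comp, ops_) are hypothetical simplifications, not the real wflign fields. Copies duplicate the buffer, and moves transfer it and leave the source empty.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>

// Simplified, hypothetical alignment_t that owns a heap-allocated edit string.
class alignment_t {
public:
    int score = 0;
    bool rev_comp = false;

    alignment_t() = default;
    explicit alignment_t(const char* ops) { assign(ops); }
    ~alignment_t() { delete[] ops_; }

    // Deep copy: each object owns its own buffer.
    alignment_t(const alignment_t& o) : score(o.score), rev_comp(o.rev_comp) { assign(o.ops_); }
    alignment_t& operator=(const alignment_t& o) {
        if (this != &o) {
            alignment_t tmp(o);  // copy first, then swap (strong exception safety)
            swap(tmp);
        }
        return *this;
    }

    // Move: take the buffer and leave the source empty but valid.
    alignment_t(alignment_t&& o) noexcept
        : score(o.score), rev_comp(o.rev_comp),
          ops_(std::exchange(o.ops_, nullptr)), len_(std::exchange(o.len_, 0)) {}
    alignment_t& operator=(alignment_t&& o) noexcept {
        if (this != &o) {
            delete[] ops_;
            ops_ = std::exchange(o.ops_, nullptr);
            len_ = std::exchange(o.len_, 0);
            score = o.score;
            rev_comp = o.rev_comp;
        }
        return *this;
    }

    void swap(alignment_t& o) noexcept {
        std::swap(score, o.score);
        std::swap(rev_comp, o.rev_comp);
        std::swap(ops_, o.ops_);
        std::swap(len_, o.len_);
    }
    const char* ops() const { return ops_ ? ops_ : ""; }

private:
    char* ops_ = nullptr;
    std::size_t len_ = 0;

    void assign(const char* src) {
        if (!src) return;
        len_ = std::strlen(src);
        ops_ = new char[len_ + 1];
        std::memcpy(ops_, src, len_ + 1);  // includes the terminating '\0'
    }
};
```

Without these members, the compiler-generated copy would duplicate the pointer and both objects would free the same buffer.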

src/common/wflign/src/wflign_patch.cpp and wflign_patch.hpp

  • Modified do_wfa_patch_alignment for reverse complement handling.
  • Updated write_merged_alignment for reverse complement output.
  • Refined the patching process to consider alignments in both orientations.

src/interface/parse_args.hpp

  • Added CLI options for MashMap indexing and wflign_min_inv_patch_len.

src/map/include/map_parameters.hpp

  • Added parameters for MashMap indexing support.

src/map/include/parseCmdArgs.hpp

  • Updated parsing for new MashMap indexing options.

src/map/include/winSketch.hpp

  • Implemented MashMap index functions (create, read, write).
  • Added validation that command-line parameters match those stored in the index file.
  • Optimized Sketch class memory usage.