feat: add translation of alignment #78

Draft
wants to merge 29 commits into
base: main

Conversation

gordonkoehn
Collaborator

This PR tackles the long-dreaded issue of skipping the realignment when using nextclade for translation.

@gordonkoehn gordonkoehn added the enhancement New feature or request label Jan 10, 2025
@gordonkoehn
Collaborator Author

It appears nextclade has two functions that together do the alignment. In nextclade/run/nextclade_run_one.rs there is a function called align_nuc(), which does the alignment via Smith-Waterman. It returns the aligned qry_seq, so gaps have been inserted into it, but astonishingly it leaves the insertions in. The next function, insertions_strip, then returns qry_seq and insertions, which is the main output nextclade works with. At this point the qry_seq returned by insertions_strip holds the cleartext sequence with gaps inserted to match the reference and with the insertions stripped. insertions_strip does this by comparing the alignment output qry_seq with the reference.
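The stripping step described above can be sketched in Python. This is only an illustration of the idea, not nextclade's Rust code: the function name `strip_insertions`, its signature, and the convention that an insertion shows up as a gap (`-`) in the aligned reference are assumptions made for this example.

```python
from typing import List, Tuple

GAP = "-"  # assumed gap character


def strip_insertions(aligned_ref: str, aligned_qry: str) -> Tuple[str, List[Tuple[int, str]]]:
    """Hypothetical helper: drop query bases that sit opposite a reference gap.

    Returns the query in reference coordinates and a list of
    (position, inserted_bases), where position is the number of
    reference bases that precede the insertion.
    """
    if len(aligned_ref) != len(aligned_qry):
        raise ValueError("aligned sequences must have equal length")

    stripped: List[str] = []
    insertions: List[Tuple[int, str]] = []
    current: List[str] = []  # bases of the insertion being collected
    ref_pos = 0              # reference bases consumed so far

    for r, q in zip(aligned_ref, aligned_qry):
        if r == GAP:
            # Reference has a gap here, so a query base is an insertion.
            if q != GAP:
                current.append(q)
            continue
        if current:
            insertions.append((ref_pos, "".join(current)))
            current = []
        stripped.append(q)
        ref_pos += 1

    if current:  # insertion at the very end of the alignment
        insertions.append((ref_pos, "".join(current)))
    return "".join(stripped), insertions
```

The stripped query always has exactly as many characters as the reference has bases, which is what makes it usable in reference coordinates.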

Bottom line: I can indeed just feed in the output of insertions_strip, <StripInsertionsResult>, and run nextclade from there. To do that, I need the cleartext reads with insertions stripped, plus the reference.

This sounds doable.

@gordonkoehn gordonkoehn self-assigned this Jan 10, 2025
@gordonkoehn
Collaborator Author

gordonkoehn commented Jan 16, 2025

The sr2silo side is ready. I can now produce a cleartext file with reads aligned to the reference, with gaps and padding such that each string has the same length as the actual reference, and I can also output the insertions with their positions.
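A minimal sketch of the padding step, assuming each read is already gapped to match the reference and its 0-based start position on the reference is known. The helper name `pad_to_reference` and the default padding character `N` are assumptions for illustration; the actual sr2silo code may differ.

```python
def pad_to_reference(aligned_read: str, start: int, ref_length: int, pad_char: str = "N") -> str:
    """Hypothetical helper: pad a gapped read on both sides to the reference length.

    aligned_read: read with insertions stripped and deletions written as gaps
    start:        0-based reference position of the first read base
    ref_length:   length of the reference sequence
    pad_char:     character used outside the read (assumed, not from the PR)
    """
    end = start + len(aligned_read)
    if start < 0 or end > ref_length:
        raise ValueError("read does not fit inside the reference")
    # Left padding up to the start, right padding out to the reference end.
    return pad_char * start + aligned_read + pad_char * (ref_length - end)
```

For example, `pad_to_reference("AC-G", 2, 8)` gives a string of length 8 with the read occupying positions 2 to 5.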

This is what I have to feed to nextclade.

@gordonkoehn gordonkoehn linked an issue Jan 16, 2025 that may be closed by this pull request
@gordonkoehn
Collaborator Author

Hurray, halfway there!

@gordonkoehn
Collaborator Author

Translation works now. Next step: update the Python side to do the translation and create the ndjson.
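For the ndjson step, a rough sketch using only the Python standard library is shown here. The function name `to_ndjson` and the record fields in the usage example are placeholders; the schema the project actually needs is not stated in this conversation.

```python
import json
from typing import Any, Dict, Iterable


def to_ndjson(records: Iterable[Dict[str, Any]]) -> str:
    """Hypothetical helper: serialize records as newline-delimited JSON.

    Each record becomes one JSON object on its own line.
    """
    return "".join(json.dumps(record) + "\n" for record in records)


# Usage with made-up field names (assumptions, not the real schema):
example = to_ndjson([
    {"read_id": "r1", "aligned": "NNAC-GNN", "insertions": [[2, "TT"]]},
])
```

Writing one object per line keeps the output streamable, so reads can be processed and appended without holding the whole file in memory.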

Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Translate & Align V-Pipe Reads
1 participant