Phasing of bubbles for a primate #297

Closed
fergsc opened this issue Oct 25, 2024 · 2 comments

fergsc commented Oct 25, 2024

Hi,
I am assembling a collection of primate genomes and have run into what looks to be a phasing issue, perhaps similar to #245.

My assemblies use HiFi, ONT, and Hi-C data, and the first has run to completion. When I view unitigs.hpc.noseq.gfa, I see many regions that look like the figure below. I have loaded hicverkko.colors.tsv and coloured the graph as maternal/paternal. The coverage within bubbles looks good, but many bubbles are short (~3 kbp). Given that we have 64x HiFi coverage and long ONT reads (N50 of 81.73 kbp at 50x coverage), does this graph look correct? Shouldn't the ONT data resolve the many small ~3 kbp bubbles?

Thanks.

Screenshot from 2024-10-24 16-59-40
232.zip

Member

skoren commented Oct 25, 2024

The graph is correct. The issue is not the size of the heterozygous bubbles but the homozygous nodes surrounding them. In this case, all the homozygous (2x coverage) nodes are very large: over 100 kb in HPC (homopolymer-compressed) space, and mostly over 200 kb. No ONT read can tell you how to connect the pairs of short het nodes across that distance, so they remain unphased. I expect that if you looked at the Hi-C paths, the assembly would include these nodes with a more or less random selection of haplotype. Given how short and similar in length the bubbles are, I think that is the correct assembly strategy, even though it could potentially introduce small phasing errors while resolving the chromosomes.
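To make the spanning argument concrete, here is a minimal Python sketch (not part of verkko) that lists the nodes too long for a read of a given length to bridge. It assumes the GFA `S` lines carry the standard optional `LN:i:` length tag when the sequence is `*`. The node names, the 1 kb anchor on each side, and the 100 kb read length are hypothetical values chosen for illustration. Node lengths in the HPC graph are in compressed space, so uncompressed distances are somewhat larger and the comparison is, if anything, optimistic.

```python
def segment_lengths(gfa_lines):
    """Return {node_name: length} from GFA S lines (LN:i: tag or inline sequence)."""
    lengths = {}
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "S":
            continue  # skip headers, links, and malformed lines
        name, seq = fields[1], fields[2]
        length = None
        for tag in fields[3:]:
            if tag.startswith("LN:i:"):
                length = int(tag[len("LN:i:"):])
        if length is None and seq != "*":
            length = len(seq)  # fall back to the sequence itself
        if length is not None:
            lengths[name] = length
    return lengths


def can_span(read_len, node_len, min_anchor=1000):
    """A read links the het nodes on both sides of a node only if it covers
    the whole node plus an anchor in each flanking node (assumed 1 kb)."""
    return read_len >= node_len + 2 * min_anchor


def unspannable(lengths, max_read_len, min_anchor=1000):
    """Names of nodes that no read of max_read_len could bridge."""
    return sorted(
        name for name, n in lengths.items()
        if not can_span(max_read_len, n, min_anchor)
    )


# Toy graph with hypothetical node names: one large homozygous node, two short het nodes.
toy_gfa = [
    "H\tVN:Z:1.0",
    "S\thom_a\t*\tLN:i:200000",
    "S\thet_1\t*\tLN:i:3000",
    "S\thet_2\tACGT",
]
toy_lengths = segment_lengths(toy_gfa)
blocked = unspannable(toy_lengths, max_read_len=100_000)
```

On a real run, the same functions could be fed `open("unitigs.hpc.noseq.gfa")` and a read length near the upper end of the ONT length distribution. Any homozygous node reported as unspannable separates the bubbles on either side of it as far as ONT reads are concerned.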

The only large unphased pieces I see are utig4-1101 and utig4-5, but I suspect these are the X and Y, respectively. Again, in verkko v2.2.1, I expect these to have been assigned to the haplotypes in the final output.

Member

skoren commented Nov 6, 2024

Idle, graph looks OK.

skoren closed this as completed Nov 6, 2024