confusion on the results of hapmers.sh #147

Open
leon945945 opened this issue Nov 26, 2024 · 5 comments

@leon945945

Following the advice in marbl/meryl#53 (comment), I ran Merqury's hapmers.sh and got the following results:

KmerType FolderSize
maternal.inherited.meryl/ 3.5Gb
paternal.inherited.meryl/ 2.8Gb
shrd.inherited.meryl/ 1.1Gb
read.only.meryl/ 14Gb

Judging by the folder sizes, the read-only kmers account for a large proportion of the progeny's kmers, which means the origin of these DNA sequences is unknown.
In my opinion, the progeny's DNA should be inherited from either the maternal or the paternal parent. Why is there such a large proportion of sequences of unknown origin, and what is the probable cause?
I look forward to your reply. Thanks very much.
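
To put a number on "a large proportion", the folder sizes listed above can be turned into fractions. This is only a rough sketch: treating folder size as a proxy for kmer content is an assumption, since compressed databases need not scale linearly with the number of distinct kmers.

```python
# Folder sizes as reported in the listing above, in Gb.
# Assumption: folder size is used as a rough proxy for kmer content.
sizes_gb = {
    "maternal.inherited": 3.5,
    "paternal.inherited": 2.8,
    "shrd.inherited": 1.1,
    "read.only": 14.0,
}

total = sum(sizes_gb.values())
fractions = {name: size / total for name, size in sizes_gb.items()}

# Print each category as a percentage of the combined size.
for name, frac in fractions.items():
    print(f"{name:20s} {frac:6.1%}")
```

Under that assumption, the read-only category makes up roughly two thirds of the combined size.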

@arangrhie
Contributor

Hi @leon945945 ,

Glad you are trying the hapmer script! Do you have the plot handy, or the final .hist file?

@leon945945
Author

Hi @arangrhie ,

I have the final plot, but I'm not clear on what each filled region represents, and I want to understand why the read-only kmers account for such a large proportion. Looking forward to your reply.
[Figure: inherited_hapmers fl v2]

@leon945945
Author

I'm also curious: how should I interpret the peak of the read-only kmers?

@arangrhie
Contributor

arangrhie commented Nov 28, 2024

That means the child read set has some kmers not found in either parent. A high peak at the 1-copy depth of the child-only kmers (here at ~13x) is observed in cases where the parents are not the biological parents. It could be sample mislabeling, or sequencing platform bias. Are the parent kmers coming from Illumina and the child kmers from HiFi? If so, can you try again with k=31?
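
For readers following along, the idea of "child kmers not found in either parent" can be illustrated with plain Python sets. This is a toy sketch, not what hapmers.sh actually runs: the sequences, the value of k, and the function names `kmers` and `partition_child_kmers` are made up for illustration, and canonical (reverse-complement) kmers and counts are ignored.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def partition_child_kmers(child, maternal, paternal):
    """Split the child's kmers by which parent(s) also contain them."""
    return {
        "shared": child & maternal & paternal,
        "maternal": (child & maternal) - paternal,
        "paternal": (child & paternal) - maternal,
        # Kmers seen in the child but in neither parent ("read-only").
        "read_only": child - maternal - paternal,
    }


# Toy sequences (hypothetical), k=3.
k = 3
maternal = kmers("ACGTAC", k)
paternal = kmers("ACGGAC", k)
child = kmers("ACGTTC", k)

parts = partition_child_kmers(child, maternal, paternal)
for label, members in parts.items():
    print(label, sorted(members))
```

In a true trio the `read_only` set should consist mostly of sequencing errors, so a large, well-defined peak there is what points toward mislabeling or platform bias.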

@leon945945
Author

Hi @arangrhie ,

I tried k=31, and the peak of the read-only kmers became lower, roughly half of the previous result. If I want to drop error kmers, below what count can kmers be removed, and which command should I use? Thanks very much.

By the way, how do very high kmer counts arise, such as counts in the hundreds or thousands? Are these kmers errors?
[Figure: inherited_hapmers fl]
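
One common heuristic for choosing an error cutoff, sketched here under stated assumptions, is to take the first valley of the kmer count histogram: counts at or below the valley are treated as likely errors, and the next peak is taken as the main coverage peak. The sketch assumes a two-column, whitespace-separated histogram (count, number of distinct kmers); the numbers are invented toy data, and the function names are hypothetical. It is not necessarily the method Merqury itself uses to pick a threshold.

```python
def parse_histogram(text):
    """Parse 'count  num_distinct_kmers' lines into a sorted list of pairs."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rows.append((int(fields[0]), int(fields[1])))
    return sorted(rows)


def error_cutoff(hist):
    """Return the count at the first local minimum, or None if there is none."""
    for i in range(1, len(hist) - 1):
        prev_f, cur_f, next_f = hist[i - 1][1], hist[i][1], hist[i + 1][1]
        if cur_f <= prev_f and cur_f < next_f:
            return hist[i][0]
    return None


def main_peak(hist, cutoff):
    """Return the count with the most distinct kmers above the cutoff."""
    above = [(count, freq) for count, freq in hist if count > cutoff]
    return max(above, key=lambda pair: pair[1])[0]


# Invented toy histogram: an error tail at low counts, then a coverage peak.
toy_hist = """
1 900
2 300
3 80
4 40
5 90
6 200
7 150
8 60
"""

hist = parse_histogram(toy_hist)
cutoff = error_cutoff(hist)
peak = main_peak(hist, cutoff)
print("cutoff:", cutoff, "peak:", peak)
```

A threshold found this way could then be applied with a count filter such as meryl's `greater-than` operation; whether to keep or drop kmers exactly at the valley is a judgment call.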
