Is it possible/appropriate to determine the WGD events from the genome coverages? #9

Open
zengxiaofei opened this issue Oct 12, 2016 · 2 comments

zengxiaofei commented Oct 12, 2016

In many whole-genome de novo sequencing projects, the scaffolds and contigs are not assembled into pseudomolecules. As a result, it is very difficult to determine the exact ratio between the subject genome and the reference genome from a synteny map.

According to your description in README.rst, quota_align.py can calculate the genome coverages for a specified ratio, and the coverage will be too low if a wrong ratio is specified. So my question is: is it possible or appropriate to determine the exact ratio between two genomes from these coverages?

Here are two real examples:

Example 1:

sp1: the species we studied (unknown)
Cca: Coffea canephora (no WGD event after γ)
Vvi: Vitis vinifera (no WGD event after γ)
Sly: Solanum lycopersicum (genome triplicated after γ)

sp1 vs Cca

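| --quota | genome X coverage (sp1) | genome Y coverage (Cca) |
|---------|-------------------------|-------------------------|
| 1:1 | 55.6% | 95.6% |
| 2:1 | 85.4% | 97.0% |
| 3:1 | 95.6% | 96.9% |
| 4:1 | 95.7% | 96.9% |
| 6:1 | 95.7% | 96.9% |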

sp1 vs Vvi

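| --quota | genome X coverage (sp1) | genome Y coverage (Vvi) |
|---------|-------------------------|-------------------------|
| 1:1 | 58.7% | 93.3% |
| 2:1 | 84.4% | 94.9% |
| 3:1 | 93.3% | 94.5% |
| 4:1 | 93.5% | 94.2% |
| 6:1 | 93.5% | 94.2% |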

sp1 vs Sly

| --quota | genome X coverage (sp1) | genome Y coverage (Sly) |
|---------|-------------------------|-------------------------|
| 1:3 | 72.5% | 95.5% |
| 2:3 | 92.6% | 99.5% |
| 3:3 | 99.6% | 99.6% |

Question:

Can I infer that the sp1 genome underwent a whole-genome triplication after γ?

Example 2:

sp2: the species we studied (unknown)
Cca: Coffea canephora (no WGD event after γ)

sp2 vs Cca

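| --quota | genome X coverage (sp2) | genome Y coverage (Cca) |
|---------|-------------------------|-------------------------|
| 1:1 | 37.5% | 95.0% |
| 2:1 | 60.5% | 97.1% |
| 3:1 | 74.3% | 95.9% |
| 4:1 | 83.4% | 95.9% |
| 6:1 | 89.6% | 95.6% |
| 8:1 | 90.3% | 95.5% |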

Question:

Can I infer that the sp2 genome underwent one round of whole-genome triplication and one round of whole-genome duplication (3 * 2 = 6) after γ?

I tested this method on Arabidopsis vs grape, Arabidopsis vs Brassica rapa, and poplar vs peach. It seemed to work well.

Thanks for your attention!
Xiaofei Zeng

tanghaibao (Owner) commented

@zengxiaofei I have been struggling with finding an objective method to call WGD ploidies over the past few years. The method you described might work although the cutoff (beyond which the coverage saturates) is a bit difficult to call sometimes. What BLAST filtering option did you use? I would often like to filter the results to only reciprocal best hits (blast_to_raw.py, use something like --cscore=.99) for these types of analyses.
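
One way to make the saturation cutoff explicit is sketched below. This is only an illustration: `saturation_quota` and the 1.0 percentage-point `tolerance` are hypothetical choices, not part of quota-alignment, and the numbers are the genome X coverages reported in the first comment.

```python
def saturation_quota(coverages, tolerance=1.0):
    """Return the smallest quota whose coverage lies within `tolerance`
    percentage points of the highest coverage observed.

    `coverages` is a list of (quota, coverage_percent) pairs.
    The tolerance is an arbitrary assumption, not a validated threshold.
    """
    best = max(cov for _, cov in coverages)
    for quota, cov in sorted(coverages):
        if best - cov <= tolerance:
            return quota


# Genome X coverages from the first comment (x in --quota x:1).
sp1_vs_cca = [(1, 55.6), (2, 85.4), (3, 95.6), (4, 95.7), (6, 95.7)]
sp2_vs_cca = [(1, 37.5), (2, 60.5), (3, 74.3), (4, 83.4), (6, 89.6), (8, 90.3)]

print(saturation_quota(sp1_vs_cca))  # 3
print(saturation_quota(sp2_vs_cca))  # 6
```

The call depends on the tolerance and on which quotas were actually tested, so it only narrows down the candidate ratios rather than settling them.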

zengxiaofei commented Oct 13, 2016

@tanghaibao Thank you for your reply!

First of all, please forgive me for deleting the figures and modifying the species names in this issue.
I used --cscore=.5 for yesterday's analyses.
I have now also tried --cscore=.99. Here are the results:

Example 1:

sp1 vs Cca

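| --quota | genome X coverage (sp1) | genome Y coverage (Cca) |
|---------|-------------------------|-------------------------|
| 1:1 | 58.4% | 94.3% |
| 2:1 | 87.7% | 96.9% |
| 3:1 | 95.4% | 97.2% |
| 4:1 | 95.4% | 97.2% |
| 6:1 | 95.4% | 97.2% |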

sp1 vs Vvi

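| --quota | genome X coverage (sp1) | genome Y coverage (Vvi) |
|---------|-------------------------|-------------------------|
| 1:1 | 64.4% | 92.7% |
| 2:1 | 90.7% | 94.7% |
| 3:1 | 93.6% | 94.7% |
| 4:1 | 93.6% | 94.6% |
| 6:1 | 93.6% | 94.6% |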

Does this make the 3:1 guess more reliable?

Example 2:

sp2 vs Cca

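| --quota | genome X coverage (sp2) | genome Y coverage (Cca) |
|---------|-------------------------|-------------------------|
| 1:1 | 47.9% | 93.2% |
| 2:1 | 73.5% | 94.7% |
| 3:1 | 85.5% | 93.8% |
| 4:1 | 89.1% | 94.0% |
| 6:1 | 89.8% | 93.8% |
| 8:1 | 89.8% | 93.8% |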

Can I still infer 6:1?
It becomes difficult to distinguish 4:1 from 6:1 when the actual ratio is this high.
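
To see why 4:1 and 6:1 are hard to separate here, it may help to look at how much coverage each quota step adds. The snippet below is a minimal sketch using the sp2 vs Cca genome X coverages from the stricter filter; `marginal_gains` is a hypothetical helper, not something quota-alignment provides.

```python
def marginal_gains(coverages):
    """Return (from_quota, to_quota, gain) for each consecutive pair,
    where gain is the coverage increase in percentage points."""
    ordered = sorted(coverages)
    return [
        (q0, q1, round(c1 - c0, 1))
        for (q0, c0), (q1, c1) in zip(ordered, ordered[1:])
    ]


# sp2 genome X coverage vs Cca with the stricter filter (x in --quota x:1).
sp2_strict = [(1, 47.9), (2, 73.5), (3, 85.5), (4, 89.1), (6, 89.8), (8, 89.8)]

for q0, q1, gain in marginal_gains(sp2_strict):
    print(f"{q0}:1 -> {q1}:1  +{gain:.1f}")
```

The step from 4:1 to 6:1 adds only 0.7 percentage points, so whether it counts as real signal or as saturation depends entirely on the threshold chosen.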
