How to improve the BUSCO score of the resulted predictions? #19
Comments
Hi Bob,
BRAKER2 is intended to be used with a large protein database. In your case, 3 species is probably not enough. I would suggest that you download a large database of related species from OrthoDB (e.g. the phylum of your species) and add your 3 closely related species to them. With this database, I would run BRAKER2 again (with ProtHint) and combine the result with your BRAKER1 run. I would discard the GenomeThreader run altogether.
If the result still has too many genes, I would try to increase the 'intron_support' parameter in the TSEBRA config file (e.g. to 0.8, 0.9, or 1.0).
I hope this helps.
Best, Lars
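Building the combined protein database described above is plain FASTA concatenation. The following is a minimal sketch, not part of BRAKER or TSEBRA: the function name `combine_proteins` and all file names are hypothetical, and the demo uses made-up sequences in a temporary directory so it runs without any real data.

```shell
# Concatenate several protein FASTA files into one database and
# print how many sequences the combined file contains.
# Usage: combine_proteins OUTPUT INPUT...
combine_proteins() {
    out=$1
    shift
    cat "$@" > "$out"
    # Each FASTA record starts with a '>' header line.
    grep -c '^>' "$out"
}

# Tiny demo with made-up sequences; replace these with the OrthoDB download
# and the protein files of the closely related species.
demo=$(mktemp -d)
printf '>odb1\nMKT\n>odb2\nMLS\n' > "$demo/orthodb.fa"
printf '>sp1_p1\nMAA\n' > "$demo/species1.fa"
combine_proteins "$demo/all.pep.fa" "$demo/orthodb.fa" "$demo/species1.fa"
```

Checking the sequence count after concatenation is a cheap way to confirm that every input file was picked up. Each input should end with a newline, otherwise the last sequence line of one file runs into the first header of the next.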
Hi Lars,
Many thanks for your timely reply. I have rerun BRAKER2 as you suggested. I downloaded more than 4,500,000 Vertebrata protein sequences from OrthoDB and added the sequences of my 3 closely related species. In total, 4,832,878 protein sequences were used to run BRAKER2. However, the BUSCO score improved only slightly:
C:64.2%[S:58.4%,D:5.8%],F:14.5%,M:21.3%,n:2586
The command I used was:
braker.pl --species=Pcar --genome=Pcar.genome.fa.maskered --prot_seq=all.orthodb.pep.1.fa --softmasking --cores=48 --nocleanup --gff3 --workingdir=braker2_out --epmode --useexisting
Could you give me some more advice to improve it?
Thanks a lot,
Best,
Bob
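When comparing several runs, it can help to split the BUSCO one-line summary into its fields. This is a minimal sketch that assumes the summary has exactly the layout quoted in the comment above; `parse_busco` and `missing_count` are hypothetical helper names, not BUSCO tools.

```shell
# Print the fields of a BUSCO one-line summary as "C S D F M n".
parse_busco() {
    printf '%s\n' "$1" |
        sed -E 's/^C:([0-9.]+)%\[S:([0-9.]+)%,D:([0-9.]+)%\],F:([0-9.]+)%,M:([0-9.]+)%,n:([0-9]+)$/\1 \2 \3 \4 \5 \6/'
}

# Estimate the number of missing BUSCOs: M percent of n, rounded.
missing_count() {
    parse_busco "$1" | awk '{ printf "%.0f\n", $5 * $6 / 100 }'
}

summary='C:64.2%[S:58.4%,D:5.8%],F:14.5%,M:21.3%,n:2586'
parse_busco "$summary"
missing_count "$summary"
```

For the summary above, complete (C) is the sum of single-copy (S) and duplicated (D), 58.4 + 5.8 = 64.2, and the 21.3% missing corresponds to about 551 of the 2586 BUSCOs.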
Hi Bob,
it looks like you didn't train BRAKER again with the new database.
You have to give it a new species name for '--species' and remove the '--useexisting' option.
Best,
Lars
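A retraining run means using a new name for '--species' and dropping '--useexisting'. The sketch below only prints such a command so it can be inspected first. All flags and file names are copied from the command posted earlier in this thread; the species name `Pcar_orthodb`, the working directory `braker2_orthodb_out`, and the helper `braker_retrain_cmd` are hypothetical.

```shell
# Build the BRAKER2 command for a fresh training run and print it.
# Flags are taken from the earlier command in this thread, minus --useexisting.
# Usage: braker_retrain_cmd NEW_SPECIES_NAME WORKDIR
braker_retrain_cmd() {
    species=$1
    workdir=$2
    echo braker.pl \
        --species="$species" \
        --genome=Pcar.genome.fa.maskered \
        --prot_seq=all.orthodb.pep.1.fa \
        --softmasking --cores=48 --nocleanup --gff3 \
        --workingdir="$workdir" --epmode
}

# Dry run: print the command instead of executing it.
braker_retrain_cmd Pcar_orthodb braker2_orthodb_out
```

Writing to a new working directory as well keeps the earlier run's output available for comparison.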
Hi Lars,
Thanks for your timely reply. I retrained BRAKER with a new species name for '--species' and without the '--useexisting' option, as you suggested. However, the result is still not good:
C:64.7%[S:58.6%,D:6.1%],F:13.5%,M:21.8%,n:2586
Could you give me any other advice for improving it? Should I perhaps change some of the default parameters?
Best,
Bob
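Hi Bob,
if BRAKER2 performs this poorly, you can try to use pref_braker1.cfg instead of the default configuration for TSEBRA.
I created this cfg file for a project where I had a similar situation. However, I haven't tested it on different species, so analyzing the result and visually inspecting it is all the more important here.
Best, Lars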
Hi Lars,
Many thanks for your advice. I have tried the pref_braker1.cfg file and achieved a reasonable result:
C:96.1%[S:90.1%,D:6.0%],F:1.9%,M:2.0%,n:2586
However, even when I increase the 'intron_support' parameter to 1.0, there are still too many predicted genes (43,256). Could you give me more ideas on how to decrease the gene count?
Thanks again.
Best,
Bob
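To try several 'intron_support' values side by side, one option is to write a config copy per value. This is a minimal sketch under a loudly stated assumption: that the TSEBRA cfg file stores one "key value" pair per line (for example `intron_support 0.75`). The helper `set_intron_support` is hypothetical, and the demo config is a made-up stand-in, not the real pref_braker1.cfg.

```shell
# Write a copy of a config file with a new intron_support value.
# Assumes one "key value" pair per line, e.g. "intron_support 0.75".
# Usage: set_intron_support IN.cfg VALUE OUT.cfg
set_intron_support() {
    sed -E "s/^intron_support[[:space:]]+.*/intron_support $2/" "$1" > "$3"
}

# Demo on a made-up two-line config; "other_setting" is a placeholder key.
demo=$(mktemp -d)
printf 'intron_support 0.75\nother_setting 1\n' > "$demo/in.cfg"
for v in 0.8 0.9 1.0; do
    set_intron_support "$demo/in.cfg" "$v" "$demo/intron_$v.cfg"
    grep '^intron_support' "$demo/intron_$v.cfg"
done
```

Each copy can then be passed to a separate TSEBRA run, so gene counts and BUSCO scores can be compared per value without overwriting the original config.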
Dear Lars,
Thanks for your valuable advice. I have tried the latest versions of TSEBRA and BRAKER. The BUSCO score has improved a lot, but I still have some questions. The results are as follows:
braker1: C:96.1%[S:80.3%,D:15.8%],F:1.8%,M:2.1%,n:2586
braker2 prothint: C:62.6%[S:52.0%,D:10.6%],F:16.0%,M:21.4%,n:2586
braker2 GenomeThreader: C:54.9%[S:47.3%,D:7.6%],F:24.4%,M:20.7%,n:2586
TSEBRA: C:93.4%[S:89.5%,D:3.9%],F:2.9%,M:3.7%,n:2586
Question 1: The BUSCO score of BRAKER2 is still very low, even though I tried different aligners. Following the examples, I used only the protein sequences of three closely related species to run BRAKER2. I do not know why the score is so low. Could you give me some advice on how to improve it?
Question 2: TSEBRA produced 47,953 predicted genes, which I think is many more than expected. Do you think I should alter the config file to remove some genes? If so, I am afraid the BUSCO score will decrease. Could you give me some suggestions?
Thanks again.
Best,
Bob
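When tracking gene counts like the one above, it is worth confirming that genes are being counted rather than transcripts. This is a minimal sketch that assumes the GTF feature lines carry `gene_id "..."` and `transcript_id "..."` attributes; the helpers `count_genes` and `count_transcripts` are hypothetical and the demo file is made up.

```shell
# Count distinct gene_id values in a GTF file.
count_genes() {
    grep -o 'gene_id "[^"]*"' "$1" | sort -u | wc -l | tr -d ' '
}

# Count distinct transcript_id values in a GTF file.
count_transcripts() {
    grep -o 'transcript_id "[^"]*"' "$1" | sort -u | wc -l | tr -d ' '
}

# Made-up GTF: gene g1 has two transcripts, gene g2 has one.
demo=$(mktemp -d)
printf 'chr1\tAUGUSTUS\tCDS\t1\t90\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";\n' > "$demo/demo.gtf"
printf 'chr1\tAUGUSTUS\tCDS\t1\t60\t.\t+\t0\ttranscript_id "g1.t2"; gene_id "g1";\n' >> "$demo/demo.gtf"
printf 'chr1\tAUGUSTUS\tCDS\t500\t590\t.\t-\t0\ttranscript_id "g2.t1"; gene_id "g2";\n' >> "$demo/demo.gtf"
count_genes "$demo/demo.gtf"
count_transcripts "$demo/demo.gtf"
```

Running both counts after each config change shows whether a stricter setting removes whole genes or only alternative transcripts.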