How to improve the BUSCO score of the resulted predictions? #19

bioinformaticspcj · 2022-06-27T02:58:25Z

Dear Lars,

Thanks for your valuable advice. I have tried the latest versions of TSEBRA and BRAKER. The BUSCO scores have improved a lot, but I still have some questions. The results are as follows:
BRAKER1: C:96.1%[S:80.3%,D:15.8%],F:1.8%,M:2.1%,n:2586
BRAKER2 (ProtHint): C:62.6%[S:52.0%,D:10.6%],F:16.0%,M:21.4%,n:2586
BRAKER2 (GenomeThreader): C:54.9%[S:47.3%,D:7.6%],F:24.4%,M:20.7%,n:2586
TSEBRA: C:93.4%[S:89.5%,D:3.9%],F:2.9%,M:3.7%,n:2586
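
These one-line summaries use BUSCO's notation: C (complete) is the sum of S (complete, single-copy) and D (complete, duplicated), and C + F (fragmented) + M (missing) adds up to about 100% of the n searched BUSCOs. Comparing runs field by field shows, for example, that TSEBRA lowered the duplicated fraction from 15.8% (BRAKER1) to 3.9%. A minimal sketch of such a comparison, using a hypothetical helper `parse_busco` (not part of BUSCO) and assuming the exact one-line format shown above:

```python
import re
from typing import Dict

# Matches BUSCO's one-line summary, e.g.
# "C:96.1%[S:80.3%,D:15.8%],F:1.8%,M:2.1%,n:2586"
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary: str) -> Dict[str, float]:
    """Parse a BUSCO one-line summary into percentages plus n (hypothetical helper)."""
    m = BUSCO_RE.search(summary)
    if m is None:
        raise ValueError(f"not a BUSCO summary line: {summary!r}")
    values = {k: float(v) for k, v in m.groupdict().items()}
    values["n"] = int(values["n"])
    return values


results = {
    "BRAKER1": "C:96.1%[S:80.3%,D:15.8%],F:1.8%,M:2.1%,n:2586",
    "BRAKER2 (ProtHint)": "C:62.6%[S:52.0%,D:10.6%],F:16.0%,M:21.4%,n:2586",
    "BRAKER2 (GenomeThreader)": "C:54.9%[S:47.3%,D:7.6%],F:24.4%,M:20.7%,n:2586",
    "TSEBRA": "C:93.4%[S:89.5%,D:3.9%],F:2.9%,M:3.7%,n:2586",
}

for name, line in results.items():
    b = parse_busco(line)
    # C = S + D and C + F + M = 100, up to rounding to one decimal place.
    assert abs(b["C"] - (b["S"] + b["D"])) < 0.15
    assert abs(b["C"] + b["F"] + b["M"] - 100.0) < 0.15
    print(f"{name:26s} complete={b['C']:5.1f}%  single={b['S']:5.1f}%  duplicated={b['D']:5.1f}%")
```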

Question 1: The BUSCO score of BRAKER2 is still very low, even though I used different aligners. Following the examples, I ran BRAKER2 with only the protein sequences of three closely related species. I do not know why the score is so low. Could you give me some advice on how to improve it?

Question 2: TSEBRA produced 47,953 predicted genes, which I think is many more than expected. Should I change the config file to remove some genes? If so, I am afraid the BUSCO score will decrease. Could you give me some suggestions?

Thanks again.

Best,
Bob

LarsGab · 2022-06-27T07:31:35Z

Hi Bob,

BRAKER2 is intended to be used with a large protein database. In your case, 3 species are probably not enough. I would suggest that you download a large database of related species from OrthoDB (e.g. the phylum of your species) and add the proteins of your 3 closely related species to it. With this database, I would run BRAKER2 again (with ProtHint) and combine the result with your BRAKER1 run. I would discard the GenomeThreader run altogether.

If the result still has too many genes, I would try to increase the 'intron_support' parameter in the TSEBRA config file (e.g. to 0.8, 0.9, or 1.0).
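
Trying several `intron_support` values means keeping one config file per value. A minimal sketch, assuming the TSEBRA config is plain text with one whitespace-separated `name value` pair per line and `#` comment lines; the helper `set_cfg_param` and the config excerpt are hypothetical, so check them against your own copy of the config file:

```python
def set_cfg_param(cfg_text: str, name: str, value: str) -> str:
    """Return cfg_text with parameter `name` set to `value` (hypothetical helper).

    Assumes one "name value" pair per line; comment lines start with '#'
    and therefore never match a parameter name.
    """
    out, found = [], False
    for line in cfg_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            line = f"{name} {value}"
            found = True
        out.append(line)
    if not found:
        raise KeyError(f"parameter {name!r} not found in config")
    return "\n".join(out) + "\n"


# Hypothetical excerpt; other parameters of the real config are omitted.
example_cfg = """\
# Required fraction of supported introns for a transcript
intron_support 0
"""

for v in ("0.8", "0.9", "1.0"):
    # In practice, write each variant to its own file and pass it to TSEBRA.
    print(set_cfg_param(example_cfg, "intron_support", v))
```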

I hope this helps.
Best, Lars

bioinformaticspcj · 2022-07-01T02:06:12Z

Hi Lars,

Many thanks for your timely reply. I have tried BRAKER2 as you suggested. I downloaded more than 4,500,000 Vertebrata protein sequences from OrthoDB and combined them with the proteins of my 3 closely related species. In total, 4,832,878 protein sequences were used to run BRAKER2. However, the BUSCO score improved only a little:

C:64.2%[S:58.4%,D:5.8%],F:14.5%,M:21.3%,n:2586

The command I used was:

`braker.pl --species=Pcar --genome=Pcar.genome.fa.maskered --prot_seq=all.orthodb.pep.1.fa --softmasking --cores=48 --nocleanup --gff3 --workingdir=braker2_out --epmode --useexisting`

Could you give me some more advice on how to improve it?

Thanks a lot,
Best,
Bob

LarsGab · 2022-07-01T10:07:02Z

Hi Bob,

it looks like you didn't train BRAKER again with the new database.
You have to give it a new species name for '--species' and remove the '--useexisting' option.
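
Applied to the command from the previous comment, the fix amounts to two changes: a fresh value for `--species` and no `--useexisting`. A minimal sketch of that rewrite; the helper `retrain_args` and the species name `Pcar_orthodb` are hypothetical, and all other options are kept exactly as in the original command:

```python
import shlex
from typing import List

original = (
    "braker.pl --species=Pcar --genome=Pcar.genome.fa.maskered "
    "--prot_seq=all.orthodb.pep.1.fa --softmasking --cores=48 "
    "--nocleanup --gff3 --workingdir=braker2_out --epmode --useexisting"
)


def retrain_args(cmd: str, new_species: str) -> List[str]:
    """Rewrite a braker.pl command so it trains under a new species name."""
    args = []
    for arg in shlex.split(cmd):
        if arg == "--useexisting":
            continue  # drop it, as advised, so the old species parameters are not reused
        if arg.startswith("--species="):
            arg = f"--species={new_species}"  # new, previously unused species name
        args.append(arg)
    return args


print(shlex.join(retrain_args(original, "Pcar_orthodb")))
```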

Best,
Lars

bioinformaticspcj · 2022-07-03T15:38:10Z

Hi Lars,

Thanks for your timely reply. As you suggested, I retrained BRAKER using a new species name for '--species' and removed the '--useexisting' option. However, the result is still not good:

C:64.7%[S:58.6%,D:6.1%],F:13.5%,M:21.8%,n:2586

Could you give me other advice on improving it? Maybe I should change some of the default parameters?

Best,
Bob

LarsGab · 2022-07-08T10:15:22Z

Hi Bob,

if BRAKER2 performs this poorly, you can try to use pref_braker1.cfg instead of the default configuration for TSEBRA.
I created this cfg file for a project where I had a similar situation. However, I haven't tested it on different species, so analyzing the result and visually inspecting it is all the more important here.
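
Switching configurations only changes the `-c` argument of the TSEBRA call that combines the BRAKER1 and BRAKER2 runs. A minimal sketch that assembles such a call, assuming the `tsebra.py` options `-g` (gene predictions), `-c` (config), `-e` (hint files) and `-o` (output) from the TSEBRA README, and assuming each BRAKER working directory contains `augustus.hints.gtf` and `hintsfile.gff`; the directory names, output name and helper `tsebra_args` are hypothetical, and the path to `pref_braker1.cfg` depends on where it lives in your TSEBRA checkout:

```python
import shlex
from typing import List


def tsebra_args(braker_dirs: List[str], cfg: str, out: str) -> List[str]:
    """Build a tsebra.py command combining several BRAKER runs (hypothetical helper)."""
    # One gene prediction file and one hint file per BRAKER run, comma-separated.
    gtfs = ",".join(f"{d}/augustus.hints.gtf" for d in braker_dirs)
    hints = ",".join(f"{d}/hintsfile.gff" for d in braker_dirs)
    return ["tsebra.py", "-g", gtfs, "-c", cfg, "-e", hints, "-o", out]


cmd = tsebra_args(["braker1_out", "braker2_out"], "pref_braker1.cfg", "tsebra_pref_braker1.gtf")
print(shlex.join(cmd))
```

Keeping everything but `-c` fixed makes it easy to compare the default configuration with `pref_braker1.cfg` on the same inputs.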

Best, Lars

bioinformaticspcj · 2022-07-11T06:18:13Z

Hi Lars,

Many thanks for your advice. I have tried the pref_braker1.cfg file and achieved a reasonable result:

C:96.1%[S:90.1%,D:6.0%],F:1.9%,M:2.0%,n:2586

However, even when I increase the 'intron_support' parameter to 1.0, there are still too many predicted genes (43,256). Could you give me more ideas about how to decrease the gene count?

Thanks again.
Best,
Bob
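
When comparing gene counts across configurations, it helps to count genes and transcripts separately, since a gene with several alternative transcripts should be counted once. A minimal sketch, assuming the output is GTF with quoted `gene_id` and `transcript_id` attributes; lines without them (e.g. gene or transcript lines that carry only a bare ID) are skipped, and their IDs still appear on the CDS lines. The helper `count_genes` and the example lines are hypothetical:

```python
import re
from typing import Iterable, Tuple

GENE_ID = re.compile(r'gene_id "([^"]+)"')
TX_ID = re.compile(r'transcript_id "([^"]+)"')


def count_genes(gtf_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of distinct genes, number of distinct transcripts)."""
    genes, transcripts = set(), set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = fields[8]
        g, t = GENE_ID.search(attrs), TX_ID.search(attrs)
        if g:
            genes.add(g.group(1))
        if t:
            transcripts.add(t.group(1))
    return len(genes), len(transcripts)


# Tiny hypothetical example; for a real file use: with open("tsebra.gtf") as fh: count_genes(fh)
example = [
    'chr1\tAUGUSTUS\tCDS\t100\t200\t.\t+\t0\ttranscript_id "g1.t1"; gene_id "g1";\n',
    'chr1\tAUGUSTUS\tCDS\t300\t400\t.\t+\t1\ttranscript_id "g1.t1"; gene_id "g1";\n',
    'chr1\tAUGUSTUS\tCDS\t100\t400\t.\t+\t0\ttranscript_id "g1.t2"; gene_id "g1";\n',
    'chr2\tAUGUSTUS\tCDS\t500\t900\t.\t-\t0\ttranscript_id "g2.t1"; gene_id "g2";\n',
]
print(count_genes(example))  # (2, 3)
```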