How to improve the BUSCO score of the resulted predictions? #19

Open
bioinformaticspcj opened this issue Jun 27, 2022 · 6 comments

@bioinformaticspcj

Dear Lars,

Thanks for your valuable advice. I have tried the latest versions of TSEBRA and BRAKER. The BUSCO scores have improved a lot, but I still have some questions. The results are as follows:
braker1: C:96.1%[S:80.3%,D:15.8%],F:1.8%,M:2.1%,n:2586
braker2 prothint: C:62.6%[S:52.0%,D:10.6%],F:16.0%,M:21.4%,n:2586
braker2 GenomeThreader: C:54.9%[S:47.3%,D:7.6%],F:24.4%,M:20.7%,n:2586
TSEBRA: C:93.4%[S:89.5%,D:3.9%],F:2.9%,M:3.7%,n:2586
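
For a side-by-side comparison, the four summary lines above can be parsed into numbers. The following is only a minimal sketch: the helper name `parse_busco` and the regular expression are illustrative assumptions, not part of BUSCO, BRAKER, or TSEBRA.

```python
import re

# Matches the BUSCO short-summary notation quoted above, e.g.
# C:96.1%[S:80.3%,D:15.8%],F:1.8%,M:2.1%,n:2586
BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary):
    """Return the percentages (C, S, D, F, M) and the BUSCO count n as a dict."""
    match = BUSCO_RE.search(summary)
    if match is None:
        raise ValueError(f"not a BUSCO summary: {summary!r}")
    values = {k: float(v) for k, v in match.groupdict().items() if k != "n"}
    values["n"] = int(match.group("n"))
    return values


# Summary strings copied from the post above.
runs = {
    "braker1": "C:96.1%[S:80.3%,D:15.8%],F:1.8%,M:2.1%,n:2586",
    "braker2 prothint": "C:62.6%[S:52.0%,D:10.6%],F:16.0%,M:21.4%,n:2586",
    "braker2 GenomeThreader": "C:54.9%[S:47.3%,D:7.6%],F:24.4%,M:20.7%,n:2586",
    "TSEBRA": "C:93.4%[S:89.5%,D:3.9%],F:2.9%,M:3.7%,n:2586",
}

scores = {name: parse_busco(text) for name, text in runs.items()}

# List the runs from highest to lowest completeness.
for name, s in sorted(scores.items(), key=lambda kv: kv[1]["C"], reverse=True):
    print(f"{name:24s} complete={s['C']:5.1f} duplicated={s['D']:5.1f} missing={s['M']:5.1f}")
```

Read this way, the TSEBRA result loses 2.7 points of completeness relative to braker1 while the duplicated fraction drops from 15.8% to 3.9%.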

Question 1: The BUSCO score of BRAKER2 is still very low, even when I use different aligners. Following the examples, I ran BRAKER2 with only the protein sequences of three closely related species. I do not know why the score is so low. Could you give me some advice on how to improve it?

Question 2: TSEBRA produced 47 953 predicted genes, which I think is many more than expected. Do you think I should alter the config file to remove some genes? If so, I am afraid the BUSCO score will decrease. Could you give me some suggestions?

Thanks again.

Best,
Bob

@LarsGab
Collaborator

LarsGab commented Jun 27, 2022

Hi Bob,

BRAKER2 is intended to be used with a large protein database. In your case, 3 species is probably not enough. I would suggest that you download a large database of related species from OrthoDB (e.g. the phylum of your species) and add your 3 closely related species to them. With this database, I would run BRAKER2 again (with ProtHint) and combine the result with your BRAKER1 run. I would discard the GenomeThreader run altogether.
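
Adding the three species to a downloaded OrthoDB set amounts to concatenating protein FASTA files. A minimal sketch of that step follows; the function name and the file names in the usage comment are hypothetical, and skipping records with a repeated ID is an assumption of this sketch rather than a BRAKER requirement.

```python
def merge_protein_fastas(inputs, output):
    """Concatenate FASTA files into `output`, skipping records whose ID
    (first word of the header) was already written. Returns the record count."""
    seen = set()
    written = 0
    with open(output, "w") as out:
        for path in inputs:
            keep = False  # ignore any stray lines before the first header
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        fields = line[1:].split()
                        seq_id = fields[0] if fields else ""
                        keep = seq_id not in seen
                        if keep:
                            seen.add(seq_id)
                            written += 1
                    if keep:
                        # Make sure every line ends with a newline, so the last
                        # record of one file cannot run into the next header.
                        out.write(line if line.endswith("\n") else line + "\n")
    return written


# Hypothetical usage (file names are placeholders):
# merge_protein_fastas(
#     ["orthodb_phylum.fa", "species_a.pep.fa", "species_b.pep.fa", "species_c.pep.fa"],
#     "proteins_combined.fa",
# )
```

The merged file would then be the single protein input for the new BRAKER2 run.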

If the result still has too many genes, I would try to increase the 'intron_support' parameter in the TSEBRA config file (e.g. to 0.8, 0.9, or 1.0).
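
To try several 'intron_support' values, one config file per value is convenient. The sketch below assumes the TSEBRA config is plain text with one whitespace-separated `name value` pair per line and `#` comment lines; the helper name and the example values are illustrative only, so check them against the config file shipped with your TSEBRA version.

```python
def set_cfg_value(cfg_text, key, value):
    """Return cfg_text with the line for `key` replaced by 'key value'.

    Assumes one whitespace-separated 'name value' pair per line and
    '#' comment lines (an assumption about the config layout).
    """
    lines = []
    found = False
    for line in cfg_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        if fields and not is_comment and fields[0] == key:
            lines.append(f"{key} {value}")
            found = True
        else:
            lines.append(line)
    if not found:
        raise KeyError(f"{key!r} not found in config")
    return "\n".join(lines) + "\n"


# Illustrative fragment only, not the shipped defaults.
example_cfg = """# required fraction of supported introns
intron_support 0.75
stasto_support 1
"""

# The three stricter values suggested above.
for threshold in (0.8, 0.9, 1.0):
    print(set_cfg_value(example_cfg, "intron_support", threshold))
```

After each TSEBRA run with a stricter value, rerunning BUSCO on the output shows whether the gene count falls faster than completeness.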

I hope this helps.
Best, Lars

@bioinformaticspcj
Author

bioinformaticspcj commented Jul 1, 2022 via email

@LarsGab
Collaborator

LarsGab commented Jul 1, 2022

Hi Bob,

it looks like you didn't train BRAKER again with the new database.
You have to give it a new species name for '--species' and remove the '--useexisting' option.
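
As a rough illustration of that change, the sketch below only assembles a command line and does not run anything. The file and species names are hypothetical placeholders, and any other options from the original run (which the thread does not show) would still have to be added.

```python
import shlex


def braker_protein_command(genome, proteins, species, use_existing=False):
    """Build an argument list for a protein-based BRAKER run (not executed here)."""
    cmd = [
        "braker.pl",
        f"--genome={genome}",
        f"--prot_seq={proteins}",
        f"--species={species}",
    ]
    if use_existing:
        # Reuses the parameters already stored under this species name,
        # i.e. BRAKER would not train again with the new database.
        cmd.append("--useexisting")
    return cmd


# Hypothetical names: a fresh species name and no --useexisting.
cmd = braker_protein_command(
    genome="genome.softmasked.fa",
    proteins="proteins_combined.fa",
    species="my_species_orthodb_run",
)
print(shlex.join(cmd))
```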

Best,
Lars

@bioinformaticspcj
Author

bioinformaticspcj commented Jul 3, 2022 via email

@LarsGab
Collaborator

LarsGab commented Jul 8, 2022

Hi Bob,

if BRAKER2 performs this poorly, you can try to use pref_braker1.cfg instead of the default configuration for TSEBRA.
I created this cfg file for a project where I had a similar situation. However, I haven't tested it on different species, so analyzing the result and visually inspecting it is all the more important here.

Best, Lars

@bioinformaticspcj
Author

bioinformaticspcj commented Jul 11, 2022 via email
