web tool HHblits not giving the same results as command line HHblits #1523

Open
RippeiHayashi opened this issue Jul 12, 2023 · 0 comments

@RippeiHayashi

Hello,
I would like to run HHpred on the command line, but I cannot reproduce the results of the web tool.

I am using hhsuite-3.3.0 on a Linux computer. I run hhblits to build a multiple sequence alignment and then use that alignment to search for similar structures with hhsearch.
However, the command-line hhblits does not behave the same way as the web version.

Here is an example.

Protein_x
MLYLLILSMSTLSIGLGDGRFWDRGSEEDLVYENLGLKAEKIGMISLVDDHALISVLIKLPNFKTTVKPASDKERGLIDACCGDFLDHDQGRSKGIPASQSTFQIVFKEKYLKLIQEYKLRANQFLTSRINILEPYVLPSQLIRNARRSKRQLFAFLGSTVLSFAMGAVSEYQMYKINRHVSANSEAIKDLKSRLDLEQVQIIALKDGLIGLSKEITSKMAIFLERNSCTQLYSDLSHRLEFAFHEYTHVVDDLLFTIIDGHGRSLLSPRTVPPEVMGQLIRQHSELNNTVFYENPLLLYSSAKVNIANIDNNLEYAHFVLDVPLLYRNNTSYKLFKPSQVGVFVSNNTCAYYDMPKLMYDRQDIFFEIRDMDDCTQHNALFICPSSSIFKIRSCIQRKQVTCNYRRDNCDYHYSYKVSTVGVLIRDNLDHDAFVLNEKGWTTLLNFPVQRTAYVPWTKVQALQIGDAILSSPNIPRDPITMVNLTSNLTLYDFVDSKEVSTVFGEICERYNSSLSELITPVITEAHSNKWFNWETLWLISLTITLIGLISWIISQQITICKGVTIQPNGELQHLAKKETATTSPGNVISTCPEVADDSGTNSPPKYTNSSLSY

First, we need to make the a3m file:

hhblits -i Protein_x.fasta \
-d ${DATABASE}/UniRef30_2022_02 \
-oa3m ${OUTPUT}/Protein_x.a3m \
-e 1e-3 -n 3 -p 20 -Z 250 -z 1 -b 1 -B 250
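The same invocation can also be scripted. The sketch below is a hedged example, not part of HH-suite: `build_hhblits_cmd` is a hypothetical helper that passes exactly the flags from the command above as an argument list to Python's `subprocess` module, so no shell quoting or line continuation is involved. The `DATABASE` and `OUTPUT` environment variables stand in for `${DATABASE}` and `${OUTPUT}`, and hhblits is only executed if it is found on PATH.

```python
import os
import shutil
import subprocess


def build_hhblits_cmd(query_fasta, database, out_a3m):
    """Return the hhblits argument list used in this issue (hypothetical helper)."""
    return [
        "hhblits",
        "-i", query_fasta,
        "-d", database,
        "-oa3m", out_a3m,
        "-e", "1e-3",  # E-value cutoff for inclusion in the result alignment
        "-n", "3",     # number of search iterations
        "-p", "20",    # minimum probability in summary and alignment lists
        "-Z", "250",   # maximum number of lines in the summary hit list
        "-z", "1",     # minimum number of lines in the summary hit list
        "-b", "1",     # minimum number of alignments in the alignment list
        "-B", "250",   # maximum number of alignments in the alignment list
    ]


if __name__ == "__main__":
    db_dir = os.environ.get("DATABASE", "/path/to/databases")  # placeholder path
    out_dir = os.environ.get("OUTPUT", ".")
    cmd = build_hhblits_cmd(
        "Protein_x.fasta",
        os.path.join(db_dir, "UniRef30_2022_02"),
        os.path.join(out_dir, "Protein_x.a3m"),
    )
    print(" ".join(cmd))
    if shutil.which("hhblits"):
        # check=True raises CalledProcessError if hhblits exits non-zero
        subprocess.run(cmd, check=True)
```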

--- Search results (.hhr format) produced by the command-line hhblits
Query Protein_x
Match_columns 614
No_of_seqs 1 out of 1
Neff 1
Searched_HMMs 358
Date Wed Jul 12 16:33:58 2023
Command /hh-suite/bin/hhblits -i /Protein_x.fasta -d /UniRef30_2022_02 -oa3m /Protein_x.a3m -e 1e-3 -n 3 -p 20 -Z 250 -z 1 -b 1 -B 250

 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM Template HMM
  1 UniRef100_A0A2Z4Z3N9 Putative   93.8   0.062 1.7E-07   63.4   0.0  128 268-398   236-365 (547)
  2 UniRef100_A0A482JPN5 Fusion gl  92.3    0.24 5.4E-07   59.8   0.0  224 146-387   221-460 (666)
  3 UniRef100_A0A7D6IY64 Uncharact  90.3    0.75 1.5E-06   55.2   0.0  283 174-476   160-464 (586)
  4 UniRef100_UPI0003F7A1EF hypoth  84.9     3.2 6.4E-06   41.8   0.0   23 9-31      1-23 (88)
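The header lines already summarize how much each run found (sequences in the alignment, Neff, number of HMMs searched). To compare runs programmatically rather than by eye, a minimal sketch could parse those fields. `parse_hhr_header` is a hypothetical helper that assumes the `key value` header layout shown above and ignores every other line.

```python
def parse_hhr_header(text):
    """Return the run summary from the header of an .hhr result file.

    Hypothetical helper: it reads only Match_columns, No_of_seqs, Neff and
    Searched_HMMs, and stops at the blank line that ends the header.
    """
    info = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            if info:
                break  # blank line after the header: the hit table follows
            continue
        if len(parts) < 2:
            continue
        key, value = parts[0], parts[1].strip()
        if key == "Match_columns":
            info["match_columns"] = int(value)
        elif key == "No_of_seqs":
            # value looks like "37 out of 39"
            used, _, total = value.partition(" out of ")
            info["seqs_used"] = int(used)
            info["seqs_total"] = int(total)
        elif key == "Neff":
            info["neff"] = float(value)
        elif key == "Searched_HMMs":
            info["searched_hmms"] = int(value)
    return info


# Header of the command-line run, copied from the output above.
CLI_HEADER = """\
Query Protein_x
Match_columns 614
No_of_seqs 1 out of 1
Neff 1
Searched_HMMs 358
"""

if __name__ == "__main__":
    print(parse_hhr_header(CLI_HEADER))
    # prints {'match_columns': 614, 'seqs_used': 1, 'seqs_total': 1, 'neff': 1.0, 'searched_hmms': 358}
```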

--- Search results (Protein_x.hhr) produced by the web-tool hhblits
Query Protein_x
Match_columns 614
No_of_seqs 37 out of 39
Neff 4.51371
Searched_HMMs 8902
Date Wed Jul 12 07:48:34 2023
Command hhblits -cpu 8 -i ../results/Protein_x.in.a3m -d /cluster/toolkit/production/databases/hhblits/UniRef30 -o /ebio/toolkit_rye/user/toolkit/production/jobs/Protein_x/results/Protein_x.hhr -oa3m /ebio/toolkit_rye/user/toolkit/production/jobs/Protein_x/results/Protein_x.a3m -e 1e-3 -n 3 -p 20 -Z 250 -z 1 -b 1 -B 250

 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM Template HMM
  1 UniRef100_A0A0B5KEU4 ORF1 n=1  100.0 5.2E-66 1.1E-71  547.8   0.0  412 3-477     6-460 (566)
  2 UniRef100_A0A7D6IY64 Glycoprot 100.0 8.5E-64 1.8E-69  535.4   0.0  408 24-475    21-462 (586)
  3 UniRef100_A0A210QYP1 Transmemb 100.0 9.8E-63 2.5E-68  561.6   0.0  355 24-432    262-676 (1056)
  4 UniRef100_A0A8B6CM16 Chromo do 100.0   2E-62   5E-68  568.7   0.0  361 28-438    398-814 (1313)
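The two hit tables can be compared in the same spirit. This sketch assumes the column order of the header row (No, Hit, Prob, E-value, P-value, Score, SS, Cols, Query HMM, Template HMM); `parse_hits` is a hypothetical name, and the regular expression has only been written against the rows shown in this issue.

```python
import re

# One summary row of an .hhr hit table. The hit description may contain
# spaces and is truncated, so the hit field is matched lazily and the
# numeric columns are anchored to the end of the line.
HIT_ROW = re.compile(
    r"^\s*(?P<no>\d+)\s+(?P<hit>.+?)\s+"
    r"(?P<prob>\d+\.\d+)\s+(?P<evalue>\S+)\s+(?P<pvalue>\S+)\s+"
    r"(?P<score>-?\d+\.\d+)\s+(?P<ss>-?\d+\.\d+)\s+(?P<cols>\d+)\s+"
    r"(?P<query>\d+-\d+)\s+(?P<template>\d+-\d+)\s*\((?P<length>\d+)\)\s*$"
)


def parse_hits(text):
    """Return (accession, probability, E-value) for every hit row in text."""
    hits = []
    for line in text.splitlines():
        m = HIT_ROW.match(line)
        if m:
            accession = m["hit"].split()[0]
            hits.append((accession, float(m["prob"]), float(m["evalue"])))
    return hits


CLI_HITS = """\
1 UniRef100_A0A2Z4Z3N9 Putative 93.8 0.062 1.7E-07 63.4 0.0 128 268-398 236-365 (547)
2 UniRef100_A0A482JPN5 Fusion gl 92.3 0.24 5.4E-07 59.8 0.0 224 146-387 221-460 (666)
3 UniRef100_A0A7D6IY64 Uncharact 90.3 0.75 1.5E-06 55.2 0.0 283 174-476 160-464 (586)
4 UniRef100_UPI0003F7A1EF hypoth 84.9 3.2 6.4E-06 41.8 0.0 23 9-31 1-23 (88)
"""

WEB_HITS = """\
1 UniRef100_A0A0B5KEU4 ORF1 n=1 100.0 5.2E-66 1.1E-71 547.8 0.0 412 3-477 6-460 (566)
2 UniRef100_A0A7D6IY64 Glycoprot 100.0 8.5E-64 1.8E-69 535.4 0.0 408 24-475 21-462 (586)
3 UniRef100_A0A210QYP1 Transmemb 100.0 9.8E-63 2.5E-68 561.6 0.0 355 24-432 262-676 (1056)
4 UniRef100_A0A8B6CM16 Chromo do 100.0 2E-62 5E-68 568.7 0.0 361 28-438 398-814 (1313)
"""

if __name__ == "__main__":
    cli, web = parse_hits(CLI_HITS), parse_hits(WEB_HITS)
    for label, hits in (("command line", cli), ("web", web)):
        best = min(hits, key=lambda h: h[2])
        print(f"{label}: best E-value {best[2]:.2g} ({best[0]})")
    # Hits reported by both runs, with their E-values side by side
    web_by_acc = {acc: ev for acc, _, ev in web}
    for acc, _, ev in cli:
        if acc in web_by_acc:
            print(f"{acc}: command line {ev:.2g}, web {web_by_acc[acc]:.2g}")
```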

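With the web version of hhblits, far more HMMs pass the filters (Searched_HMMs 8902 vs. 358), the query alignment contains more sequences (No_of_seqs 37 out of 39 vs. 1 out of 1), and the E-values are much better (top hit 5.2E-66 vs. 0.062).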
Does the web version of hhblits use parameters that are not set by default?
If so, could those settings be missing from the Command line recorded in the output file?
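One way to check this against what each output actually records is to compare the two Command lines flag by flag. The following is a minimal sketch; `parse_flags` and `diff_flags` are hypothetical helpers, and they only see what is written in the .hhr header, not any processing the web server may do before calling hhblits.

```python
import shlex

# Command lines copied from the two .hhr headers in this issue.
CLI_CMD = ("/hh-suite/bin/hhblits -i /Protein_x.fasta -d /UniRef30_2022_02 "
           "-oa3m /Protein_x.a3m -e 1e-3 -n 3 -p 20 -Z 250 -z 1 -b 1 -B 250")
WEB_CMD = ("hhblits -cpu 8 -i ../results/Protein_x.in.a3m "
           "-d /cluster/toolkit/production/databases/hhblits/UniRef30 "
           "-o /ebio/toolkit_rye/user/toolkit/production/jobs/Protein_x/results/Protein_x.hhr "
           "-oa3m /ebio/toolkit_rye/user/toolkit/production/jobs/Protein_x/results/Protein_x.a3m "
           "-e 1e-3 -n 3 -p 20 -Z 250 -z 1 -b 1 -B 250")


def parse_flags(command):
    """Map each '-flag' in a command line to the token that follows it.

    Assumption: every flag takes exactly one value. That holds for the two
    commands above, but some hhblits options are switches without a value,
    which this sketch would mis-pair.
    """
    tokens = shlex.split(command)[1:]  # drop the program name/path
    flags = {}
    i = 0
    while i < len(tokens):
        if tokens[i].startswith("-") and i + 1 < len(tokens):
            flags[tokens[i]] = tokens[i + 1]
            i += 2
        else:
            i += 1
    return flags


def diff_flags(cmd_a, cmd_b):
    """Return {flag: (value_in_a, value_in_b)} for flags whose values differ."""
    a, b = parse_flags(cmd_a), parse_flags(cmd_b)
    return {
        flag: (a.get(flag), b.get(flag))
        for flag in sorted(a.keys() | b.keys())
        if a.get(flag) != b.get(flag)
    }


if __name__ == "__main__":
    for flag, (cli, web) in diff_flags(CLI_CMD, WEB_CMD).items():
        print(f"{flag}: command line={cli} web={web}")
```

On these two commands the search settings (-e, -n, -p, -Z, -z, -b, -B) are identical; the recorded differences are -cpu, the input (-i: a FASTA file vs. Protein_x.in.a3m), the database path (-d) and the output paths (-o, -oa3m).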

I realize that the web version of HHblits uses UniRef30_2023-02, which is more recent than the version I am using (UniRef30_2022-02), but I doubt that this alone would make such a big difference.

Please could you help?
Thank you,
Rippei
