Wrong TM score? Order matters? #6

Open
ekiefl opened this issue Feb 2, 2022 · 6 comments
ekiefl commented Feb 2, 2022

Hello, looks like some well-written and organized code in this project--thanks for making it.

I'm noticing different results depending on the order in which my filepaths are passed to TMscoring. Usually the results are symmetric, but in a batch of around 300 unique comparisons, about 30 exhibit this behavior. Here is one such example. I've attached the files as *.txt so that GitHub allows me to upload them, but you should rename them to *.pdb to replicate this example.

s1.txt
s2.txt

import tmscoring
print(tmscoring.__version__)

aln = tmscoring.TMscoring('s1.pdb', 's2.pdb')
aln.optimise()
TM_score = aln.tmscore(**aln.get_current_values())
print(TM_score)

aln = tmscoring.TMscoring('s2.pdb', 's1.pdb') # reverse order
aln.optimise()
TM_score = aln.tmscore(**aln.get_current_values())
print(TM_score)

This yields

>>> 0.3
>>> 0.038127712872195546
>>> 0.8497114209938352

When I pass these files to https://zhanggroup.org/TM-score/ the result is 0.84971.... In every case where the TM score is non-symmetric, the web server yields the larger result, which, by eye, appears to be the correct one.

s1.pdb differs from s2.pdb in that it was generated from MODELLER, which does not have hydrogen atoms, whereas s2.pdb is generated from AlphaFold, which does. Take from that what you will :\

Dapid (Owner) commented Mar 3, 2023

Sorry for the slow reply. Unfortunately I don't have much time to work on this anymore.

It is puzzling indeed. Optimising for RMSD instead does yield a symmetric result. Maybe iminuit is having a hiccup there?

Hydrogens shouldn't matter, since we are just considering the alpha carbons.
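
As a rough illustration of why hydrogens should drop out, here is a minimal sketch that keeps only the alpha-carbon records from PDB text. It is not tmscoring's actual parser: the names `alpha_carbons` and `format_atom` are hypothetical, and the column slices assume the standard fixed-width PDB ATOM layout with a single model and no alternate locations.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def alpha_carbons(pdb_text: str) -> List[Coord]:
    """Return (x, y, z) for every ATOM record whose atom name is CA."""
    coords: List[Coord] = []
    for line in pdb_text.splitlines():
        # Standard PDB columns: atom name 13-16, x 31-38, y 39-46, z 47-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
    return coords


def format_atom(serial: int, name: str, x: float, y: float, z: float) -> str:
    """Hypothetical helper: build a fixed-width ATOM line for the demo below."""
    return f"ATOM  {serial:>5} {name:<4} ALA A   1    {x:8.3f}{y:8.3f}{z:8.3f}"


# One residue with a backbone nitrogen, an alpha carbon and a hydrogen.
sample = "\n".join(
    [
        format_atom(1, " N  ", 11.104, 6.134, -6.504),
        format_atom(2, " CA ", 11.639, 6.071, -5.147),
        format_atom(3, " HA ", 12.100, 7.000, -4.900),  # hydrogen, filtered out
    ]
)
print(alpha_carbons(sample))  # only the CA record is kept
```

With a filter like this, a file with hydrogens and one without yield the same coordinate list as long as their CA records agree.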

Dapid added the bug label Mar 3, 2023
ardhe-qb commented

Hi @Dapid,

Have there been any developments in this repo? Is there another (possibly better maintained) Python library that allows us to calculate the TM score?

ekiefl (Author) commented Nov 10, 2023

@ardhe-qb I haven't run this code, but based on my original bug report, I suspect that calculating the TM score in both directions and taking the max yields the correct result. So you could install tmscoring and then create the following wrapper somewhere in your project:

import tmscoring
from pathlib import Path
from typing import Union

Pathish = Union[Path, str]

def get_tmscore(path1: Pathish, path2: Pathish) -> float:
    # Align in both directions, since the result can depend on argument order.
    aln1 = tmscoring.TMscoring(path1, path2)
    aln2 = tmscoring.TMscoring(path2, path1)

    aln1.optimise()
    aln2.optimise()

    # Score each alignment with its own optimised parameters; keep the larger.
    return max(
        aln1.tmscore(**aln1.get_current_values()),
        aln2.tmscore(**aln2.get_current_values()),
    )

Dapid (Owner) commented Nov 10, 2023

> Hi @Dapid,
>
> Have there been any developments in this repo? Is there another (possibly better maintained) Python library that allows us to calculate the TM score?

Sorry, I am not in the field anymore, so I don't have the time to work on the code. I am also not aware of any other implementation (that is why I created this one).

If someone wants to modernise it, I could review it, or even pass it on. The codebase is fairly short.

Dapid (Owner) commented Nov 10, 2023

> @ardhe-qb I haven't run this code, but based on my original bug report, I suspect that calculating the TM score in both directions and taking the max yields the correct result. So you could install tmscoring and then create the following wrapper somewhere in your project:

Interesting. Is that because the normalisation is different, or is the optimiser not converging to the same point?
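
For context on the normalisation half of that question: in the published TM-score definition, each aligned residue pair at distance d_i contributes 1 / (1 + (d_i / d0)^2), and the sum is divided by a reference length L, with d0 = 1.24 (L - 15)^(1/3) - 1.8. The sketch below only illustrates that formula and is not taken from tmscoring; the function names are hypothetical, and the 0.5 floor on d0 for short chains is an assumption.

```python
from typing import Sequence


def d0(length: int) -> float:
    """Distance scale d0(L) = 1.24 * (L - 15)^(1/3) - 1.8."""
    if length <= 15:
        return 0.5  # assumption: floor used for very short chains
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], norm_length: int) -> float:
    """TM-score for fixed residue-pair distances, normalised by norm_length."""
    scale = d0(norm_length)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / norm_length


# The same 100 perfectly superposed pairs, normalised by two chain lengths.
perfect = [0.0] * 100
print(tm_score(perfect, norm_length=100))
print(tm_score(perfect, norm_length=200))
```

Under this definition the score depends on which chain supplies L, so two chains of different lengths can legitimately give two values. If both chains have the same number of residues, normalisation alone cannot produce an asymmetry, which would point towards the optimiser instead.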

ekiefl (Author) commented Nov 10, 2023

> > @ardhe-qb I haven't run this code, but based on my original bug report, I suspect that calculating the TM score in both directions and taking the max yields the correct result. So you could install tmscoring and then create the following wrapper somewhere in your project:
>
> Interesting. Is that because the normalisation is different, or is the optimiser not converging to the same point?

I'm not sure. I never looked into the code, or even into how the TM score works; it was just a paragraph in my thesis. Based on the experimentation in Feb 2022 that led me to file this bug report, my conclusion at the time was that the max matches https://zhanggroup.org/TM-score/

@ardhe-qb If you want to go ahead with the above hack, I would recommend first catching mismatches and then testing against https://zhanggroup.org/TM-score/ to verify the max score matches:

import tmscoring
from pathlib import Path
from typing import Union

Pathish = Union[Path, str]

def get_tmscore(path1: Pathish, path2: Pathish) -> float:
    # Align in both directions, since the result can depend on argument order.
    aln1 = tmscoring.TMscoring(path1, path2)
    aln2 = tmscoring.TMscoring(path2, path1)

    aln1.optimise()
    aln2.optimise()

    # No trailing commas here: they would turn each score into a 1-tuple.
    score1 = aln1.tmscore(**aln1.get_current_values())
    score2 = aln2.tmscore(**aln2.get_current_values())

    # Report pairs whose score depends on the argument order.
    if score1 != score2:
        print(f"Mismatch between {path1} (TMScore: {score1}) and {path2} (TMScore: {score2})")

    return max(score1, score2)
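
Because both scores come out of a numerical optimiser, an exact `!=` may also fire on harmless floating-point noise. A minimal sketch of a tolerance-based batch check follows, assuming a caller-supplied `score(a, b)` function (for instance one built around tmscoring); the name `find_order_dependent_pairs` and the default tolerance of 1e-3 are hypothetical choices, not part of the library.

```python
import math
from itertools import combinations
from typing import Callable, Iterable, List, Tuple

ScoreFn = Callable[[str, str], float]
Flagged = Tuple[str, str, float, float]


def find_order_dependent_pairs(
    paths: Iterable[str],
    score: ScoreFn,
    rel_tol: float = 1e-3,  # assumed tolerance; tune for your data
) -> List[Flagged]:
    """Return (a, b, score(a, b), score(b, a)) for pairs that disagree."""
    flagged: List[Flagged] = []
    for a, b in combinations(list(paths), 2):
        forward = score(a, b)
        backward = score(b, a)
        # Only flag differences larger than the relative tolerance.
        if not math.isclose(forward, backward, rel_tol=rel_tol):
            flagged.append((a, b, forward, backward))
    return flagged
```

The flagged pairs are the ones worth checking by hand against https://zhanggroup.org/TM-score/ before trusting the max.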
