A question about the FDR and Q-Value of the MS-GF+ output #131

Open
Sweetsour-crap opened this issue Sep 28, 2021 · 2 comments
@Sweetsour-crap

I have some questions about the “QValue” column in the software's output report.

The documentation states that “QValue is defined as the minimum false discovery rate (FDR) at which the test may be called significant”. However, the formula it gives for QValue is:

  • QValue(t) = (Number of DecoyPSMs with score equal or above t) ÷ (Number of TargetPSMs with score equal or above t)

This is confusing, because it looks like the formula for computing “FDR”. Other papers I have read say that “FDR” is not the same as “Qvalue”, because “FDR is not a function of the underlying score, it is not monotone: two different scores can lead to the same FDR”. The Qvalue is therefore used to represent “The minimal FDR threshold at which a given PSM is accepted” for a single PSM.
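To make the distinction concrete, here is a minimal sketch of both quantities on made-up scores. It assumes a higher score is better, uses the decoy/target ratio from the formula above as the FDR estimate, and takes the q-value as the minimum of that FDR over every threshold at which the PSM would still be accepted. The function names and the data are hypothetical and are not taken from MS-GF+.

```python
def fdr_at_threshold(targets, decoys, t):
    """Decoy/target ratio among PSMs with score >= t (higher score = better)."""
    n_target = sum(1 for s in targets if s >= t)
    n_decoy = sum(1 for s in decoys if s >= t)
    # Assumption: report 0.0 when no target passes the threshold.
    return n_decoy / n_target if n_target else 0.0


def q_value(targets, decoys, score):
    """Minimum FDR over all thresholds at which a PSM with `score` is accepted."""
    thresholds = [t for t in set(targets) | set(decoys) if t <= score]
    return min(fdr_at_threshold(targets, decoys, t) for t in thresholds)


# Made-up scores, for illustration only.
targets = [10.0, 9.0, 7.0, 6.0, 5.0]
decoys = [8.0, 4.0]

for s in targets:
    print(s, fdr_at_threshold(targets, decoys, s), q_value(targets, decoys, s))
```

On this toy data the FDR rises to 1/3 at score 7 and then falls again to 0.25 and 0.2 as more targets are accepted, so it is not monotone in the score, whereas the q-value never increases as the score improves.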

I also found a plot of the relationship between FDR and Qvalue in another paper, attached below. (From “Käll, L., Storey, J. D., MacCoss, M. J., & Noble, W. S. (2008). Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. Journal of proteome research, 7(1), 29–34.”)
[Image: fdr and qvalue]

I would like to know whether my understanding of FDR and QValue is right. If it is, could you tell me which of the two MS-GF+ actually reports: FDR or QValue? If my understanding is wrong, I would also like to know the exact interpretation of QValue in the output report.

I searched the website documentation and two MS-GF+ papers but still could not find an answer, so I decided to ask here. I am sorry for taking up your time.

@FarmGeek4Life
Collaborator

I can't give a precise definition of "QValue" as used in MS-GF+. In terms of behavior, though, the MS-GF+ QValue is very similar to the QValue shown in that image, and it is based on the FDR ratio also mentioned in that paper. I believe the MS-GF+ QValue may be calculated in a manner matching the "estimated QValue" mentioned there.

Specifically: to compute the MS-GF+ QValue, the target and decoy results are combined, and only the PSM with the best (lowest) SpecEValue for each spectrum is kept. These are then sorted from best to worst SpecEValue, and the FDR is calculated for each of these PSMs (at that SpecEValue threshold). The FDR values for all PSMs are then processed to create the QValue, which is the highest computed FDR value among the assigned PSM and any PSM with a better SpecEValue.
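The steps above can be sketched roughly as follows. This is a literal reading of the description in this comment, not a port of the linked Java files, and it has not been checked against them. The `PSM` class, the `compute_q_values` function, and the example data are hypothetical. It assumes that a lower SpecEValue is better, that each PSM is its own threshold (ties are not grouped), and that the FDR is reported as 1.0 when no target has been seen yet.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    spectrum_id: str
    spec_e_value: float  # assumption: lower is better
    is_decoy: bool


def compute_q_values(psms):
    """Return (psm, fdr, q_value) tuples, ordered from best to worst SpecEValue."""
    # Keep only the best-scoring PSM per spectrum (targets and decoys combined).
    best = {}
    for psm in psms:
        kept = best.get(psm.spectrum_id)
        if kept is None or psm.spec_e_value < kept.spec_e_value:
            best[psm.spectrum_id] = psm

    ranked = sorted(best.values(), key=lambda p: p.spec_e_value)

    results = []
    n_target = n_decoy = 0
    running_max = 0.0
    for psm in ranked:
        if psm.is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        # FDR at this PSM's threshold: decoys / targets accepted so far.
        fdr = n_decoy / n_target if n_target else 1.0
        # QValue as described above: highest FDR among this and all better PSMs.
        running_max = max(running_max, fdr)
        results.append((psm, fdr, running_max))
    return results


# Made-up PSMs, for illustration only.
psms = [
    PSM("s1", 1e-12, False),
    PSM("s1", 1e-8, True),   # dropped: s1 already has a better PSM
    PSM("s2", 1e-11, False),
    PSM("s3", 1e-10, True),
    PSM("s4", 1e-9, False),
    PSM("s5", 1e-8, False),
]

for psm, fdr, q in compute_q_values(psms):
    print(psm.spectrum_id, psm.is_decoy, round(fdr, 3), round(q, 3))
```

In this toy run the FDR goes 0, 0, 0.5, 0.333, 0.25 down the ranked list, while the QValue column stays at 0.5 from the decoy hit onward, so it never decreases as the SpecEValue gets worse.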

If you want to try to understand the calculations in the code yourself, the following files/lines are good places to start:
https://github.com/MSGFPlus/msgfplus/blob/master/src/main/java/edu/ucsd/msjava/fdr/ComputeFDR.java#L272
https://github.com/MSGFPlus/msgfplus/blob/master/src/main/java/edu/ucsd/msjava/fdr/TargetDecoyAnalysis.java#L160

@alchemistmatt
Collaborator

I would treat Q Value as an estimate of FDR. There is no such thing as absolute FDR in proteomics. To see how Q Value is calculated, please download these Excel files:

The formulas in those files show how Q Value is computed.
