-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect PPV estimate from riskProfile function #6
Comments
I spent some time today investigating this a bit and just want to check my understanding before diving much further. In your Let me know if this is intentional and if you still expect the results to match up even with the slightly different ways of setting the score cutoff. |
I think that the Line 32 in e868e23
The PPV that function produces all look correct but the data that is returned is shifted off by one for PPV (but not NPV) |
Ah, I see - I was looking at this part in your code: s4p_perf <- p1cn$data %>%
filter(method == "non-parametric") %>%
filter(score == 0.41325) Would you need to do |
My underlying assumption is that for a given entry in the data returned by the riskProfile function it returns the PPV or NPV for the scenario of score > cutoff. So in the text you quoted in this step |
Thanks for confirming! I'll see what I can do to make the values align. |
I think I've nailed down the culprit, but I want your advice on what you want the behavior of the package to be. You were right to hone in on the lines where the prevalence is injected as a ppv/npv. To manage this, the calculated thresholds were being truncated. The assumption that the last threshold should be equal to the prevalence seemed to be the issue. I think the original goal here was to ensure that there is always a PPV & NPV calculated at There are two paths I see here, and it depends on what you're looking for: We stop forcing the inclusion of ppv/npv where
|
Hey guys! Sorry for completely missing on this issue, I have now updated my notifications in this repo ... Did you manage to solve this? I have created a PR that I think should address this issue, see #7. It is a mix of Doug's suggestions actually, because I now think that the following line Line 35 in e868e23
nonParametricPV function wasn't exactly correct.
|
Thanks for the ping @slamao - this definitely fell off my radar through the holiday season. I'd be happy for you to pick it up and make a judgement call on what is the appropriate solution here since I'm really having to jog my memory on the methods trying to jump back into the codebase. Please let me know if there's anything else you might need from me. |
Thanks @dgkf-roche and @slamao ! I was out for two months on sabbatical so this fell off my radar as well. @slamao I am testing out your proposed changes in #7 now and will come back to you |
Address #6. Since it was chosen by design to assess `score > x` for each `x in score`, I don't think we can naturally end up with a PPV at prevalence. Other option would be to inflate the resulting dataset manually for this boundary value, but than we would need to artificially add one more score and one more percentile to match the number of rows in the resulting data frame.
Thanks for the trust 😄 and also thanks for your analysis above, it did help me to prepare the PR, which is a mix of your suggestions. And sorry to both of you for reacting so late. It is nice to see at least some activity here :) hope you enjoyed your days off Derrek! |
I noticed that the PPV estimate for a given cutoff in the riskProfile() function output is incorrect while NPV (and separately sensitivity and specificity) all look to be correct. Below is a working toy example to illustrate the problem.
Our NPV and PPV for this example are
0.9189189
and0.4615385
as expected.With the riskProfile function we see a different estimate for PPV:
Our NPV and PPV for this riskProfile example are
0.9189189
and0.4285714
i.e. not the correct PPV!If you take a deeper dive into the data output from the riskProfile function it looks like the PPV estimate at the next cutoff is the correct value for the previous cutoff so could be a merging or off-by-one type error (remember that the NPV is still correct for a given cutoff though).
Here's my session info as well JIC:
The text was updated successfully, but these errors were encountered: