R2 values of -inf when no neutralization is observed (i.e. a flat line at 1) #55
Comments
This really reflects a conceptual as much as a technical issue. The R2 is the fraction of the variation in the data that is explained by the fit. The total variation in the data is just the summed squared difference of all of the data points (fraction infectivity at each concentration) from a straight horizontal line drawn at the mean fraction infectivity. The variation not explained by the fit is the sum of squared residuals, so the R2 is one minus the ratio of that sum to the total variation.

So when all the data points are on a perfectly straight line, both the variation in the data and the residuals are zero. So arguably, yes, this should give a value of 1 for the R2. But more generally, when the data fall along a straight line with just a tiny bit of noise (jitter), then the R2 can be very poor, because the total variation is itself tiny and the fit explains almost none of it.

But really, if we are looking at data with no neutralization, we don't want to classify this as a bad fit. So I think QC really needs to include two quantities:
For non-neutralized data, we could have a good fit with a very poor R2.
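To make the argument above concrete, here is a minimal sketch of the two quantities in plain Python. It is not the neutcurve implementation: the function name `r2_and_rmsd` and the handling of the zero-variation case are assumptions chosen to match the behavior described in this thread.

```python
import math


def r2_and_rmsd(observed, predicted):
    """Return (R2, RMSD) for observed values and the fit's predictions.

    Hypothetical helper for illustration; not part of neutcurve.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    # Total variation: squared distance of the data from a flat line at the mean.
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    # Unexplained variation: sum of squared residuals from the fit.
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    rmsd = math.sqrt(ss_resid / n)
    if ss_total == 0:
        # Assumed convention: a perfectly fit flat line gets R2 = 1.
        # If the data are flat but the fit misses them, return -inf.
        r2 = 1.0 if ss_resid == 0 else float("-inf")
    else:
        r2 = 1 - ss_resid / ss_total
    return r2, rmsd


# Perfectly flat data at 1.0, fit by a flat line at 1.0.
flat = r2_and_rmsd([1.0] * 6, [1.0] * 6)

# Flat data with slight jitter, fit by the same flat line at 1.0.
jitter = r2_and_rmsd([1.01, 0.99, 1.01, 0.99], [1.0] * 4)
```

In the jitter case the R2 is close to zero even though every point lies within 0.01 of the fit, while the RMSD stays small. That is the situation in which an R2 cutoff alone would discard a reasonable non-neutralization result.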
Improvements to metrics for assessing curve fit (see [here](#55 (comment))):
- The coefficient of determination (``r2``) is now one if all points are fit by a straight line, rather than negative infinity.
- A root-mean-square deviation (square root of the mean squared residual) is now calculated as the ``rmsd`` attribute of ``HillCurve`` objects and reported in fit parameter summaries from ``CurveFits``.
- In `process_plate_curvefit_qc` in the YAML configuration, there is a new key called `goodness_of_fit`, and both `min_R2` (the minimum coefficient of determination) and `max_RMSD` (the maximum root-mean-square deviation) for each curve fit are now specified as keys under it. The curves are then filtered to retain those that meet *either* of these criteria (so a curve must fail both to be dropped). Addresses [this issue](#33) and [this issue](jbloomlab/neutcurve#55 (comment)). Alongside this change, the `rmsd` is now reported in key output files. Also, in the tabulation of failures, `fails_min_R2` now becomes `fails_goodness_of_fit`.
- This is a **backward-incompatible change** in the configuration YAML. Previously `min_R2` was a standalone key under `process_plate_curvefit_qc`; now `goodness_of_fit` is the required key, and `min_R2` and `max_RMSD` are required keys under it.
- Added another plate (of H3N2 rather than H1N1) to the `test_example` to test some of the changes introduced in this version.
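The either-or filtering described in this changelog entry can be sketched in a few lines of Python. This is an illustration only, not the pipeline's code: the function `passes_goodness_of_fit`, the threshold values, the example curves, and the choice of inclusive comparisons are all assumptions. Only the key names `min_R2` and `max_RMSD` and the flag name `fails_goodness_of_fit` come from the text above.

```python
def passes_goodness_of_fit(r2, rmsd, min_R2, max_RMSD):
    """Keep a curve if it meets EITHER criterion (hypothetical helper).

    Whether the real thresholds are inclusive is not stated in the
    changelog; inclusive comparisons are assumed here.
    """
    return r2 >= min_R2 or rmsd <= max_RMSD


# Hypothetical thresholds standing in for the `goodness_of_fit` keys.
goodness_of_fit = {"min_R2": 0.8, "max_RMSD": 0.05}

# Made-up fit summaries for illustration.
curves = [
    {"name": "A", "r2": 0.95, "rmsd": 0.03},  # passes both criteria
    {"name": "B", "r2": -0.2, "rmsd": 0.01},  # flat, non-neutralized: passes on RMSD
    {"name": "C", "r2": 0.30, "rmsd": 0.20},  # fails both, so it is dropped
    {"name": "D", "r2": 0.90, "rmsd": 0.15},  # passes on R2
]

for curve in curves:
    curve["fails_goodness_of_fit"] = not passes_goodness_of_fit(
        curve["r2"], curve["rmsd"], **goodness_of_fit
    )

retained = [c["name"] for c in curves if not c["fails_goodness_of_fit"]]
```

Combining the criteria with `or` rather than `and` is what lets curve B survive: its R2 is poor because there is almost no variation to explain, but its residuals are small in absolute terms.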
When a virus is not neutralized, the resulting fraction infectivity values can be fit by a flat line at 1.0. In the current iteration of neutcurve, it appears that this type of data can result in a fit with an r2 value of negative infinity. This means that filtering on a minimum r2 value would remove these curves from analysis. This is not ideal, as non-neutralization of a given strain is a reasonable and expected result in many cases.