R2 values of -inf when no neutralization is observed (i.e. a flat line at 1) #55

anloes · 2024-03-22T16:49:43Z

When a virus is not neutralized, the resulting fraction infectivity values can be fit by a flat line at 1.0. In the current version of neutcurve, it appears that this type of data can result in a fit with an r2 value of negative infinity. This means that filtering on a minimum r2 value would remove these curves from analysis. This is not ideal, as non-neutralization of a given strain is a reasonable and expected result in many cases.

jbloom · 2024-03-24T22:52:53Z

This really reflects a conceptual as much as technical issue.

The r2 value is the coefficient of determination, which reflects how much of the total variation in the data (the total sum of squares of the data, $\sum_i \left(y_i - \langle y \rangle\right)^2$) is explained by the fitted sigmoid.

The total variation in the data is just the summed squared difference of all of the data points (fraction infectivity at each concentration) from a straight horizontal line drawn at the mean fraction infectivity.

The variation not explained by the fit is the sum of squared residuals, and the r2 is one minus the ratio of that quantity to the total variation.

So when all the data points are on a perfectly flat horizontal line, both the variation in the data and the residuals are zero, and the ratio is undefined. So arguably, yes, this should give a value of 1 for the r2 rather than negative infinity, and I will fix this.

But more generally, when the data fall along a flat line with just a tiny bit of noise (jitter), then the r2 will be zero (or close to zero) since a sigmoid cannot fit the data better than a horizontal line at the mean.
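To make the two cases above concrete, here is a minimal sketch of the coefficient of determination with a guard for zero total variation. It is not neutcurve's implementation: the function name `r2` and the choice to return negative infinity when the data are constant but the fit is not exact are assumptions made for illustration.

```python
import math


def r2(measured, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot (illustrative sketch)."""
    mean = sum(measured) / len(measured)
    ss_tot = sum((y - mean) ** 2 for y in measured)  # total variation in the data
    ss_res = sum((y - f) ** 2 for y, f in zip(measured, fitted))  # unexplained variation
    if ss_tot == 0:
        # Data have no variation at all. Assumption for this sketch:
        # call an exact fit perfect (1.0), anything else -inf.
        return 1.0 if ss_res == 0 else -math.inf
    return 1 - ss_res / ss_tot


flat_fit = [1.0] * 6  # a fit that is a flat line at 1

# Perfectly flat data: both sums of squares are zero, so the guard returns 1.0.
r2_flat = r2([1.0] * 6, flat_fit)

# Flat data with a little jitter: the flat fit explains none of the
# (tiny) variation, so r2 comes out essentially zero.
jitter = [1.02, 0.98, 1.01, 0.99, 1.02, 0.98]
r2_jitter = r2(jitter, flat_fit)

print(r2_flat, r2_jitter)
```

The jittered case shows why a low r2 alone cannot distinguish a poor fit from a good fit to non-neutralized data.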

But really, if we are looking at data with no neutralization, we don't want to classify this as a bad fit.

So I think really we need QC to include two quantities:

- the coefficient of determination (r2)
- the root mean squared deviation (rmsd), which quantifies the absolute amount that the points vary from the fitted curve

For non-neutralized data, we could have a good fit with a very poor r2, but then the rmsd will be very small. So I will add computation of that from the curve fits, and then curves can be QC-ed by looking at both the r2 and rmsd.
A fit should be considered good if either r2 is close to one or rmsd is close to zero.
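The either/or criterion described above can be sketched as follows. The function names and the threshold values (`min_r2=0.9`, `max_rmsd=0.05`) are hypothetical, chosen only for illustration; they are not defaults from neutcurve.

```python
import math


def rmsd(measured, fitted):
    """Root mean squared deviation of the points from the fitted curve."""
    n = len(measured)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(measured, fitted)) / n)


def is_good_fit(r2_value, rmsd_value, min_r2=0.9, max_rmsd=0.05):
    """Good if EITHER r2 is high OR rmsd is low (thresholds are hypothetical)."""
    return r2_value >= min_r2 or rmsd_value <= max_rmsd


# Non-neutralized data with slight jitter, fit by a flat line at 1.
jitter = [1.02, 0.98, 1.01, 0.99, 1.02, 0.98]
flat_fit = [1.0] * 6
jitter_rmsd = rmsd(jitter, flat_fit)

# r2 is roughly zero for these data, but the small rmsd rescues the fit.
print(jitter_rmsd, is_good_fit(0.0, jitter_rmsd))

# A curve with both a poor r2 and a large rmsd is rejected.
print(is_good_fit(0.2, 0.3))
```

Using a logical "or" means a curve is only dropped when it fails both checks, which keeps well-fit flat curves in the analysis.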

Improvements to metrics for assessing curve fit (see [here](#55 (comment))):

- The coefficient of determination (``r2``) is now one if all points are fit by a straight line, rather than negative infinity.
- A root-mean-square deviation (square root of the mean squared residual) is now calculated as the ``rmsd`` attribute of ``HillCurve`` objects and reported in fit parameter summaries from ``CurveFits``.

- In `process_plate_curvefit_qc` in the YAML configuration, there is a new key called `goodness_of_fit`, and both `min_R2` (the minimum coefficient of determination) and `max_RMSD` (the maximum root-mean-square deviation) for each curve fit are now specified as keys under it. The curves are then filtered to retain only those that meet *either* of these criteria (so a curve must fail both to be dropped). Addresses [this issue](#33) and [this issue](jbloomlab/neutcurve#55 (comment)). Alongside this change, the `rmsd` is now reported in key output files. Also, in the tabulation of failures, `fails_min_R2` now becomes `fails_goodness_of_fit`.
- This is a **backward-incompatible change** in the configuration YAML. Previously `min_R2` was a standalone key under `process_plate_curvefit_qc`; now `goodness_of_fit` is the required key and `min_R2` and `max_RMSD` are required keys under it.
- Added another plate (of H3N2 rather than H1N1) to the `test_example` to test some of the changes introduced in this version.
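As a rough illustration of the new configuration layout, the sketch below mirrors the nested keys as a Python dictionary and applies the "must fail both" rule. Only the key names `process_plate_curvefit_qc`, `goodness_of_fit`, `min_R2`, and `max_RMSD` and the label `fails_goodness_of_fit` come from the changelog above; the threshold values, the example fits, and the helper function are hypothetical, and the real configuration may contain other keys not shown here.

```python
# Python mirror of the nested YAML keys; the numeric values are hypothetical.
config = {
    "process_plate_curvefit_qc": {
        "goodness_of_fit": {
            "min_R2": 0.75,
            "max_RMSD": 0.05,
        },
    },
}


def fails_goodness_of_fit(r2, rmsd, qc):
    """A curve fails only if it misses BOTH thresholds (hypothetical helper)."""
    gof = qc["goodness_of_fit"]
    return r2 < gof["min_R2"] and rmsd > gof["max_RMSD"]


# Made-up fit summaries for three curves.
fits = [
    {"virus": "A", "r2": 0.98, "rmsd": 0.04},  # passes both criteria
    {"virus": "B", "r2": 0.05, "rmsd": 0.02},  # flat curve: poor r2, small rmsd
    {"virus": "C", "r2": 0.30, "rmsd": 0.25},  # fails both, so it is dropped
]

qc = config["process_plate_curvefit_qc"]
retained = [
    f["virus"] for f in fits if not fails_goodness_of_fit(f["r2"], f["rmsd"], qc)
]
print(retained)
```

Curve B is the non-neutralized case from this issue: it would have been dropped under a standalone `min_R2` filter but is retained under the combined rule.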

jbloom mentioned this issue Mar 25, 2024

improvements to assessing curve fit (r2 and rmsd) #57

Merged

jbloom linked a pull request Mar 25, 2024 that will close this issue

improvements to assessing curve fit (r2 and rmsd) #57

Merged

jbloom closed this as completed in #57 Mar 25, 2024

This was referenced Mar 25, 2024

improve goodness of fit assessments jbloomlab/seqneut-pipeline#33

Closed

goodness of fit looks at both RMSD and R2 jbloomlab/seqneut-pipeline#37

Merged
