`elem_mul` in variance calculation should use float64 casting #3127
What happened?
When we calculate `X*X` for the variance, we preserve the data type of the incoming `X`, but this can cause downstream inaccuracies from overflow differences. This has been the case for many years.

Really we should do something like `np.multiply(X, X, dtype="float64")`. This would be more accurate/sensible.

This came up in the context of https://github.com/scverse/scanpy/pull/3099/files#diff-afb2fb35cbde7ff5e7d9b79874ede22605918cdba923250dd554f23353702e45R65-R67 where @Intron7 was casting first and then multiplying (because it should be more accurate). That revealed that we are not doing this at the moment, despite the fact that it is more accurate, and that downstream analyses can change quite a bit once we do.

Thus we should remedy this for the next minor release, as it is a breaking change; see e.g. https://dev.azure.com/scverse/scanpy/_build/results?buildId=7094&view=logs&j=5ea502cf-d418-510c-3b5f-c4ba606ae534&t=534778bb-2f86-5739-7d3c-59518f7b5a2b&l=2171
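To make the difference concrete, here is a minimal sketch of the two multiplication strategies. It is not scanpy's actual implementation: the helper `mean_var` and its `upcast` flag are hypothetical names used only for illustration, and the input values are deliberately extreme so that the float32 overflow is visible.

```python
import numpy as np


def mean_var(X, *, upcast):
    """Per-column mean and population variance via E[X^2] - E[X]^2.

    Hypothetical helper for illustration, not scanpy's real code.
    """
    mean = X.mean(axis=0, dtype=np.float64)
    if upcast:
        # Proposed behaviour: compute the element-wise square in float64.
        sq = np.multiply(X, X, dtype="float64")
    else:
        # Current behaviour as described above: the product keeps X's dtype.
        with np.errstate(over="ignore"):
            sq = X * X
    mean_sq = sq.mean(axis=0, dtype=np.float64)
    return mean, mean_sq - mean**2


# float32 input whose squares exceed the float32 range (about 3.4e38).
X = np.array([[1e20], [2e20], [3e20]], dtype=np.float32)

_, var_native = mean_var(X, upcast=False)  # squares overflow to inf in float32
_, var_upcast = mean_var(X, upcast=True)   # squares stay finite in float64
reference = X.astype(np.float64).var(axis=0)

print(var_native, var_upcast, reference)
```

With realistic count data the effect is usually a loss of precision rather than an outright overflow, but the mechanism is the same: the product is rounded (or overflows) in the narrower dtype before it is ever averaged.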