What's wrong?
Maybe I'm just a novice at statistics and this is how it's supposed to work, but it seems like a bug. When I connect a Correlations widget to data that has a missing value in one of the features, the correlations differ from those I get when I provide the same data with the incomplete row removed. If I compute a correlation between two features over six rows, and one of the rows is missing the second feature, I would expect only the five complete rows to factor into the correlation. But the value is not the same as for a file containing only those five rows.
I'll try to illustrate. Here is the test CSV file, missing one value for score:
Here is Orange:
The Scatter Plot shows the correct r value for the regression line of age and score (0.09). The upper Correlations widget shows an incorrect Pearson correlation (0.050). The lower Select Rows excludes the undefined "Lucy" row, then connects to another Correlations widget, which shows the correct value (0.090).
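To spell out the behavior I would expect, here is a minimal sketch in plain Python. It is not Orange's implementation, and the `age`/`score` values are made up for illustration (they are not the values from the attached CSV). The helper `pearson_complete_rows` is a hypothetical name; it simply ignores any row where either value is missing, so the result should match what you get after deleting that row by hand.

```python
import math


def pearson_complete_rows(xs, ys):
    """Pearson r computed only over rows where both values are present."""
    # Keep only the rows in which neither value is missing (None).
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete rows")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of centered products and squares; the 1/n factors cancel in r.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical six-row data set with one missing score (illustration only).
age = [23, 35, 41, 29, 52, 47]
score = [88, 72, None, 91, 79, 85]

# Correlation with the incomplete row left in the input.
r_with_missing = pearson_complete_rows(age, score)

# Correlation after removing the incomplete row by hand.
age_5 = [a for a, s in zip(age, score) if s is not None]
score_5 = [s for s in score if s is not None]
r_row_removed = pearson_complete_rows(age_5, score_5)

print(r_with_missing, r_row_removed)
```

Under this approach the two printed values are identical, which is the result I expected from the two Correlations widgets.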
How can we reproduce the problem?
missing_row_test.ows.zip
Look at the correlations in the two Correlations widgets.
What's your environment?