Inconsistency in case of dataframe and distance matrix input #46
Comments
@jnpsk thank you for identifying and noting this behavior in this issue! While it's not a project issue as you stated, I think it would be best to align the behavior of using your own distance matrix with that of the original Local Outlier Probabilities method, for consistency. Perhaps this can be resolved by adding one additional neighbor when using This fix should be included in the next version pushed out. |
@jnpsk can you please comment on the version of SciPy you have installed? After running the test case as is with 120 observations, I receive a difference of |
Having defined
What I mean is that the |
Resolved this ticket by updating the documentation in the |
This is not a project issue, but a suggestion to put some kind of warning in the distance matrix example in the README.
The README has an example of using a distance matrix as input for LoOP. It shows how `sklearn.neighbors.NearestNeighbors` could be used to obtain the distance matrix together with the index matrix. It seems that matrices obtained this way also contain the distance from each point to itself, resulting in a zero distance for the first nearest neighbor of every point.
On the other hand, the internal method `_compute_distance_and_neighbor_matrix`, used when the data argument is specified, excludes the distance from a point to itself and so gives different scores on the same data.
I took a look at the test case, which allows a difference of 0.15 in the scores vector, and thus the difference between 0.45 and 0.6 is considered negligible.
I think the output matrices of `sklearn.neighbors.NearestNeighbors` should first be transformed to be consistent with the internal algorithm.
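For illustration, here is a rough sketch of the kind of transformation I have in mind. It is not the project's code: the helper name `neighbors_excluding_self`, the random data, and `n_neighbors=10` are made up for the example, and it assumes the data has no duplicate rows, so that each point's own zero distance lands in the first column. It asks scikit-learn for one extra neighbor and then drops that first column.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def neighbors_excluding_self(data, n_neighbors):
    """Distances and indices of the n_neighbors nearest points, self excluded.

    Hypothetical helper. Assumes no duplicate rows, so each point's own
    zero distance sorts into column 0 of the (n_neighbors + 1) query.
    """
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(data)
    distances, indices = nn.kneighbors(data)
    # Column 0 is the point itself (distance 0), so drop it.
    return distances[:, 1:], indices[:, 1:]


# Made-up data, only to show the shapes and the self-distance column.
rng = np.random.default_rng(0)
data = rng.normal(size=(120, 3))

# Plain query on the fitted data: the first neighbor of every point is itself.
d_raw, idx_raw = NearestNeighbors(n_neighbors=10).fit(data).kneighbors(data)

# Adjusted query: 10 genuine neighbors per point, no zero self-distance.
d, idx = neighbors_excluding_self(data, n_neighbors=10)
```

The resulting `d` and `idx` could then be passed as the distance and neighbor matrices. As an alternative, calling `kneighbors()` with no query argument on the fitted estimator also leaves each point out of its own neighbor list, which avoids the no-duplicates assumption.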