Changed the default value of xi used in sklearn's OPTICS algorithm from 0.05 to 0.15. The value of xi approximately controls the size of the clusters: a smaller xi leads to larger clusters and a larger xi leads to smaller clusters. While 0.05 is the standard value recommended in the original OPTICS paper, it can incorrectly include obvious outliers when each cluster is very small, as often occurs in Zooniverse projects.
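As a rough illustration of why xi matters, the sketch below applies the xi-steepness test from the OPTICS paper to a made-up reachability sequence: a drop from one point to its successor counts as steep only if the successor's reachability is at most (1 - xi) times the current one. The function name and the reachability values are hypothetical and only meant to show that a higher xi demands a sharper change before a cluster boundary is recognised; this is not sklearn's internal code.

```python
def steep_down_indices(reachability, xi):
    """Return indices i where the drop from reachability[i] to
    reachability[i + 1] is xi-steep (hypothetical helper for illustration)."""
    return [
        i
        for i in range(len(reachability) - 1)
        if reachability[i] * (1 - xi) >= reachability[i + 1]
    ]


# Made-up reachability values: two gentle drops, one sharp drop, then flat.
reachability = [1.00, 0.92, 0.80, 0.40, 0.39]

steep_at_005 = steep_down_indices(reachability, xi=0.05)
steep_at_015 = steep_down_indices(reachability, xi=0.15)

print(steep_at_005)  # [0, 1, 2]: the gentle drops also count as steep
print(steep_at_015)  # [2]: only the sharp drop counts as steep
```

With xi=0.05 even the gentle drops qualify as boundaries, whereas xi=0.15 keeps only the sharp one.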
The value of 0.15 was chosen after tests with real data from the PRINT project showed that obvious outliers (by visual inspection) were identified, while minimising the differences from the results obtained with the previous value of 0.05.
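A comparison of this kind can be sketched as follows. The points here are synthetic (one tight group plus one distant point), not PRINT data, and `min_samples=3` and the helper name are assumptions chosen for the example. Which points end up labelled `-1` (noise) depends on the data and on the installed sklearn version, so no particular result is claimed.

```python
import numpy as np
from sklearn.cluster import OPTICS

# Synthetic stand-in data: a small tight group plus one distant point.
rng = np.random.default_rng(0)
group = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(8, 2))
points = np.vstack([group, [[10.0, 10.0]]])


def optics_labels(data, xi, min_samples=3):
    """Fit OPTICS with the xi cluster method and return the labels
    (hypothetical helper; -1 marks points treated as noise)."""
    model = OPTICS(min_samples=min_samples, xi=xi, cluster_method="xi")
    return model.fit(data).labels_


labels_old = optics_labels(points, xi=0.05)  # previous default
labels_new = optics_labels(points, xi=0.15)  # new default

# Points whose label differs between the two settings.
changed = np.flatnonzero(labels_old != labels_new)
print("xi=0.05:", labels_old)
print("xi=0.15:", labels_new)
print("indices that changed:", changed)
```

Listing the indices that change between the two settings is one way to check that the new default affects only the points that look like outliers on inspection.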
Note that this branch uses the updated OPTICS algorithm, in which a bug in the `_predecessor_correction` function has been fixed. That bug inadvertently helped remove outliers, including in the unit tests used here, which is how the problem of a too-low xi value for this use case was first identified.