Hi, when running the minhash dedup with an index, I find that the cluster results produced by MinhashDedupCluster look a bit strange.
-rw-r--r-- 1 root root 108K Jul 12 12:40 001194.clusters
-rw-r--r-- 1 root root  54K Jul 12 12:40 001194.remove
-rw-r--r-- 1 root root 108K Jul 12 12:40 001195.clusters
-rw-r--r-- 1 root root  54K Jul 12 12:40 001195.remove
-rw-r--r-- 1 root root 107K Jul 12 12:40 001196.clusters
-rw-r--r-- 1 root root  54K Jul 12 12:40 001196.remove
-rw-r--r-- 1 root root 107K Jul 12 12:40 001197.clusters
-rw-r--r-- 1 root root  54K Jul 12 12:40 001197.remove
-rw-r--r-- 1 root root 106K Jul 12 12:40 001198.clusters
-rw-r--r-- 1 root root  53K Jul 12 12:40 001198.remove
-rw-r--r-- 1 root root 107K Jul 12 12:40 001199.clusters
-rw-r--r-- 1 root root  54K Jul 12 12:40 001199.remove
-rw-r--r-- 1 root root    8 Jul 12 12:40 4294967295.clusters
-rw-r--r-- 1 root root    4 Jul 12 12:40 4294967295.remove
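The last pair of files, 4294967295.clusters (8 bytes) and 4294967295.remove (4 bytes), is an outlier. 4294967295 is 2^32 - 1, so this might be the SENTINEL value being treated as a document to be removed. Could there be a logic bug in the code?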
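To make the suspicion easier to check, here is a minimal sketch for spotting and decoding such an outlier. It rests on assumptions I have not confirmed against the library source: that SENTINEL equals the largest unsigned 32-bit integer (which matches the file name above), and that a `.remove` file is a plain sequence of little-endian uint32 document ids. The helper names `is_sentinel_file` and `read_u32_ids` are made up for illustration.

```python
import struct

# Assumption: the sentinel is the largest unsigned 32-bit value,
# which is what the outlier file name 4294967295 suggests.
SENTINEL = (1 << 32) - 1


def is_sentinel_file(name: str) -> bool:
    """Return True if the file stem (the part before the first dot) equals SENTINEL."""
    stem = name.split(".", 1)[0]
    return stem.isdigit() and int(stem) == SENTINEL


def read_u32_ids(data: bytes) -> list[int]:
    """Decode bytes as consecutive little-endian uint32 values (assumed layout)."""
    if len(data) % 4 != 0:
        raise ValueError("length is not a multiple of 4 bytes")
    count = len(data) // 4
    return list(struct.unpack(f"<{count}I", data))


# File names taken from the listing above.
names = ["001199.clusters", "001199.remove", "4294967295.clusters", "4294967295.remove"]
outliers = [n for n in names if is_sentinel_file(n)]
```

If the layout assumption holds, calling `read_u32_ids(open("4294967295.remove", "rb").read())` on the 4-byte file would yield a single id, and comparing it with `SENTINEL` would show whether the sentinel was written out as if it were a real document.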