Results differ from scikit-learn implementation #8
Comments
Did you try
Yes, unfortunately sk-learn's t-SNE is currently unusable except on toy datasets like this. And yes, it's strange: the output shows that the algorithm quickly converged to a low error and then made no further progress. By comparison, the MNIST test example progressed slowly but steadily until the last iteration. No, I haven't tried other implementations yet.
Well, it actually misreports the loss before iteration 200 (check the code), so I would not trust this log.
Did you try this one: https://github.com/cemoody/topicsne? It would be interesting to compare it to this repo and to sk-learn, both in quality and speed.
On Thu, Mar 9, 2017 at 10:44 PM, areshytko wrote:
Learning embedding...
Iteration 50: error is 43.405481 (50 iterations in 0.00 seconds)
Iteration 100: error is 44.709520 (50 iterations in 0.00 seconds)
Iteration 150: error is 43.567784 (50 iterations in 0.00 seconds)
Iteration 200: error is 42.564679 (50 iterations in 0.00 seconds)
Iteration 250: error is 1.118502 (50 iterations in 0.00 seconds)
Iteration 300: error is 0.238091 (50 iterations in 0.00 seconds)
Iteration 350: error is 0.117268 (50 iterations in 0.00 seconds)
Iteration 400: error is 0.120770 (50 iterations in 0.00 seconds)
Iteration 450: error is 0.121062 (50 iterations in 0.00 seconds)
Iteration 500: error is 0.121366 (50 iterations in 0.00 seconds)
Iteration 550: error is 0.121098 (50 iterations in 0.00 seconds)
Iteration 600: error is 0.121540 (50 iterations in 0.00 seconds)
Iteration 650: error is 0.121057 (50 iterations in 0.00 seconds)
Iteration 700: error is 0.120856 (50 iterations in 0.00 seconds)
Iteration 750: error is 0.121666 (50 iterations in 0.00 seconds)
Iteration 800: error is 0.121161 (50 iterations in 0.00 seconds)
Iteration 850: error is 0.121708 (50 iterations in 0.00 seconds)
Iteration 900: error is 0.121865 (50 iterations in 0.00 seconds)
Iteration 950: error is 0.122631 (50 iterations in 0.00 seconds)
Iteration 999: error is 0.121577 (50 iterations in 0.00 seconds)
Fitting performed in 0.00 seconds.
Hi, the picture in the README file is a t-SNE visualization of the MNIST dataset, made with the code from this repository. Here is the code: https://github.com/DmitryUlyanov/Multicore-TSNE/blob/master/python/tests/test.py
Hey, I loaded the dataset from sklearn and ran multicore_tsne on it. Would that make a difference?
I don't know for sure, but the format the digits are stored in can differ, e.g. [0, 1] or 0...255. t-SNE performs gradient descent, which may fail if the scaling and learning rate are wrong. Try the example.
Yes, it works with your example. It turns out the scalings of the datasets are different: the dataset from sklearn is 0...16, but the one in your example is [-1, 1]. So does this version only work with normalized datasets?
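To illustrate the scaling difference discussed above, here is a minimal sketch that rescales sklearn's digits data (stored as 0...16) to [-1, 1], on the assumption that this is the range used in the repository's example. Whether this rescaling alone resolves the discrepancy is not guaranteed.

```python
from sklearn.datasets import load_digits

X = load_digits().data           # pixel intensities stored as 0..16
X_scaled = 2 * (X / 16.0) - 1    # linearly rescale to [-1, 1]

print(X.min(), X.max())                 # 0.0 16.0
print(X_scaled.min(), X_scaled.max())   # -1.0 1.0
```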
Thank you for putting this together; it is the only multicore t-SNE implementation I can get to complete successfully. However, my results are identical to shaidams64's. I have an arcsinh-transformed dataset; an R (single-core) implementation of this method gives good results on it, and the sklearn implementation (Python) on the same dataset returns a very similar result. This multicore implementation runs quickly but produces an indiscernible cloud of points. I have carefully aligned all of the arguments I can, and the result is the same, even when I set MulticoreTSNE to use only one core. Any recommendations on how to fix this?
EDIT: The discussion thread lvdmaaten/bhtsne#18 ends with a multicore t-SNE implementation that does reproduce my results from sklearn and Rtsne.
Is this problem solved with this multi-core tsne? |
I used this recently and didn't see a noticeable speed-up 🤷. This was on an AWS instance with 32 cores. I was hopeful.
Hi, I'm facing the same problem now: the results of sklearn's t-SNE and yours differ with the same parameters.
So, if I'm understanding correctly, normalizing the data should help make the results roughly the same?
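One common form of normalization worth trying here is per-feature standardization (zero mean, unit variance). This is a sketch of one option, not a confirmed fix for the discrepancy described in this thread.

```python
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler

# Standardize each feature before feeding the data to t-SNE.
# Constant (all-zero) pixel columns are handled by sklearn: their
# scale is left at 1, so they are simply centered to 0.
X = load_digits().data
X_std = StandardScaler().fit_transform(X)

print(X_std.shape)   # (1797, 64)
```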
t-SNE is inherently randomized, but not to that extent. This implementation consistently produces different (much worse) results compared to scikit-learn's Barnes-Hut implementation.
Example on IRIS dataset:
[Figure: scikit-learn with default parameters and learning rate 100]
[Figure: Multicore t-SNE with default parameters and learning rate 100]
The greater distance of the setosa cluster is also supported by general statistical properties of the dataset (and by other embedding algorithms), so the results of the scikit-learn library are more consistent with the original manifold structure.