PCA and ICA return ValueError #4

Closed
bainzo opened this issue Jun 14, 2016 · 4 comments

@bainzo

bainzo commented Jun 14, 2016

Running a simple test,

import thunder as td
from factorization import PCA,ICA
ts = td.series.fromexample('fish')
algo = PCA()
w,t = algo.fit(ts)

fails with,

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-350246bb024c> in <module>()
----> 1 w,t = algopca.fit(ts)

/home/ec2-user/anaconda2/lib/python2.7/site-packages/factorization/base.pyc in fit(self, X)
     15 
     16         if data.mode == "local":
---> 17             results = self._fit_local(data)
     18 
     19         if data.mode == "spark":

/home/ec2-user/anaconda2/lib/python2.7/site-packages/factorization/algorithms/PCA.pyc in _fit_local(self, X)
     14 
     15     def _fit_local(self, X):
---> 16         t, v = self._fit_spark(X)
     17         return t.toarray(), v.toarray()
     18 

/home/ec2-user/anaconda2/lib/python2.7/site-packages/factorization/algorithms/PCA.pyc in _fit_spark(self, X)
     23         from thunder.series import fromarray
     24 
---> 25         X = toseries(X).center(1)
     26 
     27         svd = SVD(k=self.k, method=self.svd_method, max_iter=self.max_iter, tol=self.tol, seed=self.seed)

/home/ec2-user/anaconda2/lib/python2.7/site-packages/factorization/utils.pyc in toseries(y)
     11 
     12     if len(y.shape) != 2:
---> 13         raise ValueError("factorization on for 2-dimensional arrays")
     14 
     15     return y

ValueError: factorization on for 2-dimensional arrays

This happens in both PySpark and regular Python.

I've also had similar errors with ICA, both with this type of test and with my own TIFF data.

I'm running Thunder 1.0.0 and Spark 1.6.0.

bainzo closed this as completed Jun 14, 2016
bainzo reopened this Jun 14, 2016
@jwittenbach
Contributor

@bainzo This is one of the aspects that we want to address before we publish the package. With the original design, all of the factorization algorithms only work with two-dimensional objects. It's easy enough for the user to ravel/unravel higher-dimensional data when passing it into/out of the algorithms, but we realized that we should probably just do that for them automatically.
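
For anyone hitting this in the meantime, here is a rough sketch of the manual ravel/unravel workaround, written against plain NumPy arrays rather than thunder objects. The helper names (`ravel_records`, `unravel_records`, `pca_2d`) are made up for illustration and are not part of thunder or factorization, and the SVD-based PCA is only a stand-in for whatever 2-D algorithm you actually call. It assumes the last axis holds the values of each record and all leading axes index the records.

```python
import numpy as np


def ravel_records(data):
    """Collapse all leading axes into one, giving a (n_records, n_values) array."""
    leading_shape = data.shape[:-1]
    flat = data.reshape(-1, data.shape[-1])
    return flat, leading_shape


def unravel_records(flat, leading_shape):
    """Restore the leading axes that ravel_records collapsed."""
    return flat.reshape(leading_shape + flat.shape[1:])


def pca_2d(X, k=3):
    """Stand-in 2-D factorization (PCA via SVD); hypothetical, not the package's code."""
    if X.ndim != 2:
        raise ValueError("expected a 2-dimensional array")
    # Subtract each record's own mean (an assumption about what centering should do here).
    Xc = X - X.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Per-record scores and the k component time courses.
    return U[:, :k] * s[:k], Vt[:k]


# Synthetic 4-D data standing in for a volume of time series: (x, y, z, time).
rng = np.random.RandomState(0)
volume = rng.randn(4, 5, 2, 20)

flat, leading_shape = ravel_records(volume)          # shape (40, 20)
scores_flat, components = pca_2d(flat, k=3)          # shapes (40, 3) and (3, 20)
scores = unravel_records(scores_flat, leading_shape) # shape (4, 5, 2, 3)
```

The point is only that the reshape is lossless: the factorization sees a 2-D array, and the per-record output can be reshaped back onto the original leading axes afterwards.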

@bainzo
Author

bainzo commented Jun 16, 2016

@jwittenbach Ahh, that makes sense.

Is there a method like flatten that can reshape the Series object?

@bainzo
Author

bainzo commented Jun 16, 2016

I've opened this as issue #335 on the main thunder repo, as suggested by @jwittenbach.

@jwittenbach
Contributor

jwittenbach commented Aug 19, 2016

Just merged a big update that addresses this (#5). Now factorization can be done directly on Images and Series objects from thunder, and shapes will be conserved. I'm going to close this, but definitely open a new issue if there are any problems with the new functionality!
