Error raised with pandas data frame #34
Comments
It doesn't appear that this comes simply from passing a dataframe. E.g., the following works:
import pandas as pd
import numpy as np
import scikits.bootstrap as boot
boot.ci(pd.DataFrame(np.random.randn(100)))
I am trying to bootstrap the eta squared effect size, calculated with these anova libraries: If I use your bootstrap with pandas data frames as input and these anova libraries as the statistic function, the error I shared is raised.
I solved the problem this way. I wrote a function that takes the raw data sets as input, rather than a pandas data frame. Inside the function, the inputs are assembled into a pandas data frame, which is passed to the anova library to calculate eta squared, and the function returns that value.
The important point is that, with your bootstrap, "multi" needs to be set to "paired". With multi="paired", the input arrays are resampled together, so the correspondence among the values at a given index is maintained. This is necessary to rebuild a correct pandas data frame inside my function, since the data sets have to be related index to index (participant number (subject), measured value (dependent variable), between/within factor). That link is not maintained with multi="independent", where the arrays are sampled separately and have unequal lengths; a correct data frame then cannot be rebuilt, and an error is also raised because of the unequal-size arrays fed to the anova library.
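The workaround described above could look roughly like the sketch below. The anova libraries are not named in the thread, so `eta_squared` here is a hypothetical stand-in that computes a one-way eta squared (SS_between / SS_total) with pandas; the variable names and the sample data are also assumptions. The resampling step is written by hand with NumPy only to show what paired resampling means: a single set of indices is applied to every array.

```python
import numpy as np
import pandas as pd


def eta_squared(group, value):
    """One-way eta squared; a stand-in for the (unnamed) anova library call."""
    # Rebuild the data frame inside the statistic function from raw arrays.
    df = pd.DataFrame({"group": group, "value": value})
    grand_mean = df["value"].mean()
    ss_total = ((df["value"] - grand_mean) ** 2).sum()
    stats = df.groupby("group")["value"].agg(["mean", "count"])
    ss_between = (stats["count"] * (stats["mean"] - grand_mean) ** 2).sum()
    return ss_between / ss_total


# Hypothetical data: three groups of 20 participants each.
rng = np.random.default_rng(0)
group = np.repeat(["a", "b", "c"], 20)
value = rng.normal(size=60) + np.repeat([0.0, 0.5, 1.0], 20)

# Paired resampling: the same indices are used for both arrays, so each
# resampled value keeps the group label it was measured with.
idx = rng.integers(0, len(value), size=len(value))
resampled = eta_squared(group[idx], value[idx])
```

With scikits.bootstrap, the equivalent call would presumably be `boot.ci((group, value), statfunction=eta_squared, multi="paired")`, where the statistic function receives the resampled arrays as separate arguments.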
Hello,
When the input data is a pandas data frame, an error is raised:
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scikits/bootstrap/bootstrap.py", line 179, in ci
    lengths = [x.shape[0] for x in tdata]
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/scikits/bootstrap/bootstrap.py", line 179, in <listcomp>
    lengths = [x.shape[0] for x in tdata]
IndexError: tuple index out of range
In the code, it is explained why:
# Ensure that the data is actually an array. This isn't nice to pandas,
# but pandas seems much much slower and the indices become a problem.
if multi and isinstance(data, Iterable):
    tdata: "Tuple[NDArrayAny, ...]" = tuple(np.array(x) for x in data)
    lengths = [x.shape[0] for x in tdata]
Any suggestions?
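A small sketch of why the quoted lines would fail, assuming the data frame itself was passed as `data` with `multi` set: iterating over a pandas DataFrame yields its column labels, not its columns, so each `np.array(x)` is a zero-dimensional array whose `shape` is an empty tuple. The data frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column data frame.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# Same expression as in the quoted library code: the loop variable is a
# column label ("a", "b"), so each array is 0-dimensional.
tdata = tuple(np.array(x) for x in df)
shapes = [x.shape for x in tdata]

try:
    lengths = [x.shape[0] for x in tdata]
    error = None
except IndexError as exc:
    # Indexing an empty shape tuple raises the error from the traceback.
    error = exc
```

Passing the columns as separate arrays, for example `tuple(df[c].to_numpy() for c in df.columns)`, avoids this because each element then has a first dimension.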