TerminatedWorkerError when using GridSearchCV #177
Comments
OK, I haven't managed to replicate this exactly on my (Windows) laptop or on Colab. Will try Linux later. In the meantime, a small change to your code is to go with:
Instead of
I'm not sure if the code previously supported numpy arrays and I lost this support in a refactor or if it's always been this way. I think I should be able to add the support back relatively easily. It's possible that this alone would fix your example because the multiprocessing can give confusing error codes whenever there is a bug |
True, now I remember that we had this issue before. Unfortunately I still get the error, even when using P.S.: Maybe it would make sense to open a separate issue for the data types in param_grid? I think it would make sense if lists, numpy arrays, and other iterables were all valid inputs. |
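A minimal sketch of the workaround discussed above: coerce every per-view collection of candidates in `param_grid` to a plain list before handing it to the search. `to_plain_lists` is a hypothetical helper, not part of cca-zoo; it uses `.tolist()` when the object provides it (as numpy arrays do) and falls back to `list()` otherwise.

```python
def to_plain_lists(param_grid):
    """Return a copy of param_grid whose per-view candidates are plain lists.

    Hypothetical helper (not part of cca-zoo). Each value is assumed to be
    a sequence holding one iterable of candidate values per view.
    """
    def as_list(candidates):
        # numpy arrays expose .tolist(), which also yields plain Python floats
        if hasattr(candidates, "tolist"):
            return candidates.tolist()
        # tuples, ranges, generators and other iterables
        return list(candidates)

    return {name: [as_list(c) for c in per_view]
            for name, per_view in param_grid.items()}


grid = to_plain_lists({"cca__c": [(0.1, 0.5, 1.0), range(1, 4)]})
print(grid)
```

Passing the result instead of the raw dictionary keeps the search space identical while removing the array-versus-list question from the debugging.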
I then tried to use simulated data and it seems to work with:
Here I get |
Yes, agree |
I guess I have to take a look at my dataset and check if the error stems from there. Maybe it has something to do with #175, because that should be the only difference here. Right now, my X, y are already normalized before passing them to GridSearchCV. |
I will implement StandardScaler to mimic the old |
Okay, I tested the following code on a Windows machine and it works fine. On both the Windows and the Ubuntu machine I have scikit-learn 1.3.0 and cca-zoo 2.1.0 installed.
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit
from cca_zoo.model_selection import GridSearchCV
from cca_zoo.linear import rCCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from mvlearn.utils import check_Xs
from sklearn.base import TransformerMixin
from sklearn.utils.validation import check_is_fitted
###############################################################################
## Prepare Analysis ###########################################################
###############################################################################
rng = np.random.RandomState(42)
brain_df = np.loadtxt('brain_df.txt')
behavior_df = np.loadtxt('behavior_df.txt')
groups = np.loadtxt('groups.txt')
###############################################################################
## Analysis settings ##########################################################
###############################################################################
# define latent dimensions
latent_dimensions = 1
# define cross validation strategy
cv = GroupShuffleSplit(n_splits=10,train_size=0.7,random_state=rng)
# define a search space (optimize left and right penalty parameters)
param_grid = {'cca__c':[list(np.arange(0.1,1.1,0.1)),list(np.arange(0.1,1.1,0.1))]}
class MultiViewPreprocessing(TransformerMixin):
    """
    Class which allows for the different (or the same) processing of multiple views of data.
    """
    def __init__(self, preprocessing_list):
        self.preprocessing_list = preprocessing_list
    def fit(self, views, y=None):
        """
        Fits the associated preprocessing steps to each view.
        Parameters
        ----------
        views
        y
        Returns
        -------
        """
        if len(self.preprocessing_list) == 1:
            self.preprocessing_list = self.preprocessing_list * len(views)
        elif len(self.preprocessing_list) != len(views):
            raise ValueError("Length of preprocessing_list must be 1 (apply the same preprocessing to each view) or equal to the number of views")
        check_Xs(views, enforce_views=range(len(self.preprocessing_list)))
        for view, preprocessing in zip(views, self.preprocessing_list):
            preprocessing.fit(view, y)
        return self
    def transform(self, X, y=None):
        """
        Transforms each view using the associated preprocessing steps.
        Parameters
        ----------
        X
        y
        Returns
        -------
        """
        # make sure every per-view preprocessing step has been fitted
        for preprocessing in self.preprocessing_list:
            check_is_fitted(preprocessing)
        check_Xs(X, enforce_views=range(len(self.preprocessing_list)))
        return [preprocessing.transform(view) for view, preprocessing in zip(X, self.preprocessing_list)]
# define an estimator
estimator = Pipeline([
    ('preprocessing', MultiViewPreprocessing((StandardScaler(),StandardScaler()))),
    ('cca',rCCA(latent_dimensions=latent_dimensions,random_state=rng))
])
###############################################################################
## Run GridSearch #############################################################
###############################################################################
def scorer(estimator, views):
    scores = estimator.score(views)
    return np.mean(scores)
grid = GridSearchCV(estimator,param_grid,scoring=scorer,n_jobs=5,cv=cv)
grid.fit([brain_df,behavior_df],groups=groups)
best_params = grid.best_params_
estimator_best = grid.best_estimator_
X_weights,y_weights = estimator_best.weights
print(f"Best parameters are: {best_params}\n")
Data: behavior_df.txt
Could you perhaps also re-check on a Linux machine if this is an OS issue? |
Works fine or doesn't work fine? |
Ah sorry. Works fine on Windows but not on Ubuntu. |
ok - got an error message I can see? I can try and get one myself otherwise XD |
Here's the complete traceback:
Traceback (most recent call last):
File ~/micromamba/envs/csp_wiesner_johannes/lib/python3.8/site-packages/spyder_kernels/py3compat.py:356 in compat_exec
exec(code, globals, locals)
File ~/work/projects/project_hcp/testing/test_cca.py:108
grid.fit([brain_df,behavior_df],groups=groups)
File ~/micromamba/envs/csp_wiesner_johannes/lib/python3.8/site-packages/cca_zoo/model_selection/_search.py:208 in fit
self = BaseSearchCV.fit(self, np.hstack(X), y=y, groups=groups, **fit_params)
File ~/micromamba/envs/csp_wiesner_johannes/lib/python3.8/site-packages/sklearn/base.py:1151 in wrapper
return fit_method(estimator, *args, **kwargs)
File ~/micromamba/envs/csp_wiesner_johannes/lib/python3.8/site-packages/sklearn/model_selection/_search.py:898 in fit
self._run_search(evaluate_candidates)
File ~/micromamba/envs/csp_wiesner_johannes/lib/python3.8/site-packages/cca_zoo/model_selection/_search.py:199 in _run_search
evaluate_candidates(param_grid)
File ~/micromamba/envs/csp_wiesner_johannes/lib/python3.8/site-packages/sklearn/model_selection/_search.py:845 in evaluate_candidates
out = parallel(
File ~/micromamba/envs/csp_wiesner_johannes/lib/python3.8/site-packages/sklearn/utils/parallel.py:65 in __call__
return super().__call__(iterable_with_config)
File ~/micromamba/envs/csp_wiesner_johannes/lib/python3.8/site-packages/joblib/parallel.py:1944 in __call__
return output if self.return_generator else list(output)
File ~/micromamba/envs/csp_wiesner_johannes/lib/python3.8/site-packages/joblib/parallel.py:1587 in _get_outputs
yield from self._retrieve()
File ~/micromamba/envs/csp_wiesner_johannes/lib/python3.8/site-packages/joblib/parallel.py:1691 in _retrieve
self._raise_error_fast()
File ~/micromamba/envs/csp_wiesner_johannes/lib/python3.8/site-packages/joblib/parallel.py:1726 in _raise_error_fast
error_job.get_result(self.timeout)
File ~/micromamba/envs/csp_wiesner_johannes/lib/python3.8/site-packages/joblib/parallel.py:735 in get_result
return self._return_or_raise()
File ~/micromamba/envs/csp_wiesner_johannes/lib/python3.8/site-packages/joblib/parallel.py:753 in _return_or_raise
raise self._result
TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker.
The exit codes of the workers are {SIGSEGV(-11), SIGSEGV(-11)} |
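The `-11` in the message above is the negated signal number of SIGSEGV. A minimal standard-library sketch of how to run a snippet in a fresh interpreter and decode its return code the same way; `describe_exit_code` and `run_isolated` are hypothetical helper names, and the child code here is a harmless placeholder rather than the failing grid search.

```python
import signal
import subprocess
import sys


def describe_exit_code(code):
    """Translate a process exit code into a readable label."""
    if code is None:
        return "still running"
    if code < 0:
        # On POSIX, a negative code means the process died from signal -code.
        try:
            return f"{signal.Signals(-code).name}({code})"
        except ValueError:
            return f"unknown signal({code})"
    return "ok" if code == 0 else f"exit({code})"


def run_isolated(source):
    """Run Python source in a fresh interpreter and describe how it ended."""
    proc = subprocess.run([sys.executable, "-c", source])
    return describe_exit_code(proc.returncode)


print(describe_exit_code(-11))
print(run_isolated("print('worker finished')"))
```

Running a single fit this way, outside joblib, is one way to check whether the crash comes from the numerical code itself or only appears under multiprocessing.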
Okay, tested it on our Linux server. Same error here. Seems to be an OS-issue! |
OK, I think it's also possible that it's consuming more memory than expected. Will investigate - apologies and thanks for bringing this to my attention! |
I'm thinking trying two things will help diagnose this:
1 will help work out if it is multiprocessing causing the problem, 2 will help work out if it is in the pipeline |
Yup, with |
Thanks for this. Will have a dig around. |
Hi @JohannesWiesner. From some reading I'm thinking this is versions of scipy/numpy e.g.
because there's nothing substantive that has changed to rCCA which could have caused this (it essentially does a lossless PCA [keeping all components] for efficiency and then sets up an eigenvalue problem which it sends to scipy). So I think if you give updating scipy/numpy a go that might work? |
Hm, are you sure? Also tried it with |
Ah true, all of those will use the same underlying numpy/scipy functions I guess. You'll get an update tomorrow! |
Worked: On our Windows machine, these versions are installed:
Worked: We also tested in a Docker Container (with Debian-Bookworm as the base image) running on Windows:
Worked: Then I set up a completely fresh conda environment with cca-zoo only:
Did not work: And in my default conda environment I got these versions:
|
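To compare a working and a failing environment like the ones listed above, a small sketch that reports installed versions through the standard library's `importlib.metadata` (Python 3.8+). The package list is an assumption about which dependencies matter here; `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed list of relevant packages; adjust to the environment being compared.
report = installed_versions(["numpy", "scipy", "scikit-learn", "joblib", "cca-zoo"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Running this in each environment and diffing the output narrows down which version change coincides with the crash.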
Hard to say what's causing the issue. Can't really see, how |
Ergh! And this worked in a previous version? |
It does seem to be a problem elsewhere: https://stackoverflow.com/questions/53757856/segmentation-fault-when-creating-multiprocessing-array |
Geez, that doesn't sound trivial. For now, I will just use the working conda environment for the analysis. Let me know if I should test something out for you. Probably a good idea to implement a testing workflow with different OS runners in the long term. |
Agree about testing with different OSes. My ‘hack’ has been that I develop on Windows and the automatic tests here use Ubuntu, although weirdly that suggests the package does work on Ubuntu! So that suggests I need to make the numpy/scipy versions explicit (I’ve tended towards laziness/relying on scikit-learn dependencies to be about right) |
So this passes all the tests on Ubuntu:
Installing numpy (1.24.4)
Installing scipy (1.9.3)
Installing scikit-learn (1.3.0)
If that works then I’ll pin the dependencies to avoid your issue in the future. Thanks and apologies! I’m always learning 🙏 |
Ah no, because I haven’t been testing the n_jobs>1 behaviour. Will add to the tests |
Should be feasible to implement a CI workflow with different OS runners and then run pytest on each of them. |
Could send a PR if I have some time |
Hi James, with the latest version of cca_zoo I get this error:
Didn't happen in older versions (although I am using the exact same script). Can you reproduce this? Here's my full code + attached my X,y and groups as txt files.
Data:
groups.txt
X.txt
y.txt
Note that X and y have been normalized prior to GridSearch, so each fold "sees" different batches of the normalized dataset. Not sure if this is related to #175