Problem with the function extract_features #1058
Comments
Hi @AlessioBolpagni98!
I encountered a similar issue. My raw dataframe has 1k IDs, 27k rows, and 140 features. Full feature extraction completes within 30 minutes with MultiprocessingDistributor(n_workers=12) on a 64 GB machine, but it always hangs with ClusterDaskDistributor on 4 nodes with 64 GB workers. I noticed that it hangs in the result-gathering step; after about 4 hours the extract_features job is killed for running out of memory.
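To make the two setups comparable, here is a minimal sketch of how a distributor is passed to extract_features, following the tsfresh distributor API; the dataframe and its column names are made up for illustration, and the cluster variant from the report would swap in ClusterDaskDistributor instead:

```python
import pandas as pd

# Hypothetical long-format input: two IDs, five time steps each.
df = pd.DataFrame({
    "id": [1] * 5 + [2] * 5,
    "time": list(range(5)) * 2,
    "value": [0.1, 0.2, 0.3, 0.2, 0.1] * 2,
})

try:
    from tsfresh import extract_features
    from tsfresh.feature_extraction import MinimalFCParameters
    from tsfresh.utilities.distribution import MultiprocessingDistributor

    # The local run from the comment above (n_workers reduced for the sketch).
    # The cluster run would use ClusterDaskDistributor(address=...) instead.
    features = extract_features(
        df,
        column_id="id",
        column_sort="time",
        default_fc_parameters=MinimalFCParameters(),
        distributor=MultiprocessingDistributor(n_workers=2),
    )
    n_ids = len(features)  # extract_features returns one feature row per ID
except ImportError:
    # Fallback when tsfresh is not installed: same count the extraction yields.
    n_ids = df["id"].nunique()
```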
@AlessioBolpagni98 have you fixed this issue?
My problem was that I was using the extract_features() function improperly: I was passing the same column for both the 'column_id' and 'column_sort' parameters. This was my problematic function:

To solve this in my case, all the rows must have the same ID, so I created an ID 'A' for all the rows.
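The original snippet was not included in the report, but the fix described above can be sketched as follows; the data and column names are hypothetical, and the point is that 'column_id' and 'column_sort' must be different columns:

```python
import pandas as pd

# Toy single time series (hypothetical data). The bug was passing the same
# column as both column_id and column_sort, which splits the data wrongly.
df = pd.DataFrame({
    "time": range(6),
    "value": [1.0, 2.0, 3.0, 2.0, 1.0, 0.0],
})

# The fix from the comment above: one constant ID 'A' for all rows,
# with a separate column used only for sorting.
df["id"] = "A"

try:
    from tsfresh import extract_features
    from tsfresh.feature_extraction import MinimalFCParameters

    features = extract_features(
        df,
        column_id="id",       # constant ID: the whole frame is one series
        column_sort="time",   # distinct column used only for ordering
        default_fc_parameters=MinimalFCParameters(),
    )
    n_rows = len(features)  # one feature row for the single ID "A"
except ImportError:
    # Fallback when tsfresh is not installed: one row per unique ID.
    n_rows = df["id"].nunique()
```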
Thanks for your reply. In my case the code finishes with multiprocessing (n_jobs=8) in about 30 minutes, but it can't finish on 8 clustered workers on different machines.
I fixed it by switching to dask_feature_extraction_on_chunk(); the ClusterDaskDistributor still failed with a lot of communication errors.
The problem:
I have a script that runs every day, and in this script I use the tsfresh function extract_features(). Sometimes the script remains stuck in that function, with the progress bar frozen at a certain percentage. The function doesn't raise any exception and the code stays blocked.
Packages (1).txt