Merge Alistair's fixes with Isaac's #80
base: corpus-3-release-fixes
@alistairewj and @ibevers - why are we going away from the parallel execution? also pydra has a serial executor. if that didn't work, something in the usage or assumptions is incorrect. let's fix things if they are broken now instead of the rework i'm seeing here.
@satra we are adding serial execution as a plan B for redundancy, since parallel execution currently does not work with SenseLab's speech_to_text function. We are working on debugging parallel execution. I believe the core challenge may be the reliability of Whisper or of SenseLab's speech_to_text function. Fabio told me that it does not work on shorter audio files. @alistair found that it did not initially work on every file, although I believe it worked when he retried, since he was able to get a complete set of transcripts. @alistairewj told me that parallel execution is currently not working after adding error handling and retry logic for transcription.
this does not seem to be an issue of serial vs parallel (both of which pydra can handle without change of code). it's more a question of what happens when some file is not processed appropriately (resilience, something pydra handles internally as well by creating an exception/runtime error that is tracked at the task level). and perhaps also constructing the task and workflow properly. for example, having retries in a task.
@ibevers also, can you please open an issue on senselab describing the issue you are facing and reporting all steps to reproduce it? thanks!
Yes, I suppose talking about it as a serial/parallel issue does not really focus on the key problem. @fabiocat93 Yes, I will do that. Thank you!
Yeah, so I ran everything in serial because debugging the parallel execution was challenging. I fixed the issues with the pipeline and its use with senselab. They were:
For the parallel vs. serial processing, this was really my choice, not Isaac's. I've been finding pydra very hard to use. The documentation is sparse, and so it wasn't easy for me to (quickly) get the debugging to work. Specifically I can't find where the
The code to do the parallel processing is here: b2aiprep/src/b2aiprep/prepare/prepare.py, lines 225 to 250 in d5db59b
This fails with this error:
Exception has occurred: AttributeError
'Workflow' object has no attribute 'ef_wf'
  File "/Users/alistairewj/git/b2aiprep/src/b2aiprep/prepare/prepare.py", line 243, in extract_features_workflow
    ef_wf.set_output(
  File "/Users/alistairewj/git/b2aiprep/src/b2aiprep/prepare/prepare.py", line 315, in prepare_bids_like_data
    extract_features_workflow(
  File "/Users/alistairewj/git/b2aiprep/src/b2aiprep/cli.py", line 111, in prepbidslikedata
    prepare_bids_like_data(
  File "/Users/alistairewj/git/b2aiprep/src/b2aiprep/cli.py", line 413, in <module>
    main() # pylint: disable=no-value-for-parameter
    ^^^^^^
AttributeError: 'Workflow' object has no attribute 'ef_wf'
It seems like the name of the workflow (
No description provided.