Merge Alistair's fixes with Isaac's #80

ibevers · 2024-10-02T18:12:14Z

No description provided.

satra · 2024-10-02T18:16:03Z

@alistairewj and @ibevers - why are we going away from the parallel execution? also pydra has a serial executor. if that didn't work something in the usage or assumptions is incorrect. let's fix things if they are broken now instead of the rework i'm seeing here.

ibevers · 2024-10-02T18:47:15Z

@satra we are adding serial execution as a plan B for redundancy since parallel execution is not working with SenseLab's speech_to_text function currently.

We are working on debugging parallel execution. I believe the core challenge with parallel execution may be with the reliability of Whisper or SenseLab's speech_to_text function. Fabio told me that it does not work on shorter audio files. @alistair found that it did not work initially on every file, although I believe it worked when he retried it since he was able to get a complete set of transcripts. @alistairewj told me that the parallel execution is currently not working after adding error handling and retry logic for transcription.

satra · 2024-10-02T18:56:38Z

this does not seem to be an issue of serial vs parallel (both of which pydra can handle without change of code). it's more a question of what happens when some file is not processed appropriately (resilience, something pydra handles internally as well by creating an exception/runtime error that is tracked at the task level). and perhaps also constructing the task and workflow properly. for example, having retries in a task.

fabiocat93 · 2024-10-02T19:01:02Z

@ibevers also, can you please open an issue on senselab describing the issue you are facing and reporting all steps to reproduce it? thanks!

ibevers · 2024-10-02T19:02:36Z

Yes, I suppose talking about it as a serial/parallel issue does not really focus on the key problem. @fabiocat93 Yes, I will do that. Thank you!

alistairewj · 2024-10-03T00:05:21Z

Yeah so I ran everything in serial because debugging the parallel execution was challenging. I fixed the issues with pipeline and its use with senselab. They were:

If the language isn't specified, the transcription fails. I fixed this by specifying the language as English.
Randomly HFModel fails to validate the data. I couldn't find any consistency as to why this would happen - just on a random record. I don't actually think this is a senselab issue. a try/except around the validation error handles this fine since it doesn't repeatedly happen. I wish I had a good explanation for it, but HF does some pretty elaborate things when it tries to load a model; for all I know it's simply an API call which fails and isn't propagated properly through the traceback.

For the parallel vs. serial processing, this was my choice really not Isaac's. I've been finding pydra very hard to use. The documentation is sparse, and so it wasn't easy for me to (quickly) get the debugging to work. Specifically I can't find where the Task object is documented with its methods - the docs only describe ShellCommandTask. There are not really any examples to learn from either. :(

The code to do the parallel processing is here:

b2aiprep/src/b2aiprep/prepare/prepare.py

Lines 225 to 250 in d5db59b

    
           # Get paths to every audio file. 
        
           audio_paths = get_audio_paths(bids_dir_path=bids_dir_path) 
        
           # Initialize the Pydra workflow. 
        
           ef_wf = pydra.Workflow( 
        
               name="ef_wf", input_spec=["audio_paths"], audio_paths=audio_paths, cache_dir=None 
        
           ) 
        
           # Run wav_to_features for each audio file. 
        
           ef_wf.add( 
        
               wav_to_features( 
        
                   name="features", 
        
                   wav_path=ef_wf.lzin.audio_paths, 
        
                   transcription_model_size=transcription_model_size, 
        
                   with_sensitive=with_sensitive, 
        
               ).split("wav_path", wav_path=ef_wf.lzin.audio_paths) 
        
           ) 
        
           ef_wf.set_output( 
        
               {"audio_paths": ef_wf.lzin.audio_paths} 
        
           ) 
        
           with pydra.Submitter(plugin="cf") as run: 
        
               run(ef_wf) 
        
           return ef_wf

This fails with this error:

Exception has occurred: AttributeError
'Workflow' object has no attribute 'ef_wf'
  File "/Users/alistairewj/git/b2aiprep/src/b2aiprep/prepare/prepare.py", line 243, in extract_features_workflow
    ef_wf.set_output(
  File "/Users/alistairewj/git/b2aiprep/src/b2aiprep/prepare/prepare.py", line 315, in prepare_bids_like_data
    extract_features_workflow(
  File "/Users/alistairewj/git/b2aiprep/src/b2aiprep/cli.py", line 111, in prepbidslikedata
    prepare_bids_like_data(
  File "/Users/alistairewj/git/b2aiprep/src/b2aiprep/cli.py", line 413, in <module>
    main()  # pylint: disable=no-value-for-parameter
    ^^^^^^
AttributeError: 'Workflow' object has no attribute 'ef_wf'

It seems like the name of the workflow (ef_wf) is being used as an attribute.. but I don't know why. I suspect this is an easy error to fix, but I'm not sure how!

alistairewj added 3 commits September 20, 2024 18:38

fixes while running on corpus 3

4cf8333

rework so non-parallel extraction works and verifies content

d07d229

add command to validate a BIDS folder has all the extracted features

d5db59b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge Alistair's fixes with Isaac's #80

Merge Alistair's fixes with Isaac's #80

ibevers commented Oct 2, 2024

satra commented Oct 2, 2024

ibevers commented Oct 2, 2024

satra commented Oct 2, 2024

fabiocat93 commented Oct 2, 2024

ibevers commented Oct 2, 2024 •

edited

Loading

alistairewj commented Oct 3, 2024 •

edited

Loading

Merge Alistair's fixes with Isaac's #80

Are you sure you want to change the base?

Merge Alistair's fixes with Isaac's #80

Conversation

ibevers commented Oct 2, 2024

satra commented Oct 2, 2024

ibevers commented Oct 2, 2024

satra commented Oct 2, 2024

fabiocat93 commented Oct 2, 2024

ibevers commented Oct 2, 2024 • edited Loading

alistairewj commented Oct 3, 2024 • edited Loading

ibevers commented Oct 2, 2024 •

edited

Loading

alistairewj commented Oct 3, 2024 •

edited

Loading