Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding utility functions #48

Merged
merged 10 commits into from
Jun 4, 2024
Merged

Conversation

fabiocat93
Copy link
Collaborator

I have added some utility functions for computing cosine similarity, eer, wer, cca, cka, cross correlation

@fabiocat93 fabiocat93 requested a review from wilke0818 June 3, 2024 17:28
@codecov-commenter
Copy link

codecov-commenter commented Jun 3, 2024

Codecov Report

Attention: Patch coverage is 94.76744% with 18 lines in your changes missing coverage. Please review.

Project coverage is 59.38%. Comparing base (3e2e6b6) to head (66ff00b).

Files Patch % Lines
...lab/audio/tasks/speech_to_text_evaluation_pydra.py 0.00% 7 Missing ⚠️
src/senselab/audio/tasks/preprocessing_pydra.py 0.00% 4 Missing ⚠️
src/senselab/audio/tasks/preprocessing.py 93.18% 3 Missing ⚠️
src/senselab/utils/tasks/cca_cka.py 95.55% 2 Missing ⚠️
src/senselab/utils/tasks/cosine_similarity.py 90.90% 1 Missing ⚠️
src/tests/audio/tasks/preprocessing_test.py 98.11% 1 Missing ⚠️
Additional details and impacted files
@@                    Coverage Diff                    @@
##           audio_abstract_dtype      #48       +/-   ##
=========================================================
+ Coverage                 41.62%   59.38%   +17.75%     
=========================================================
  Files                        25       36       +11     
  Lines                       627      943      +316     
=========================================================
+ Hits                        261      560      +299     
- Misses                      366      383       +17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@wilke0818 wilke0818 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good to me. I have two suggestions that might be helpful in the future that can be minor tasks for others later:

  • In speech_to_text_evaluation I think it would be good to give more concrete definitions on what some of these mean or reference say the jiwer docs since I haven't heard of/am not sure the definitions of say WIP or WIL.
  • In cca_cka we should create an enum for the types of kernels that are allowed as a better software practice

@fabiocat93
Copy link
Collaborator Author

thank you, @wilke0818, for your feedback. For now, I have created an enum for the kernel types. I will address documentation issues more calmly if that sounds reasonable.
Also, I have added some more functionalities for audio pre-processing. Please feel free to review the code whenever you have time

"""
down_mixed_audios = []
for audio in audios:
if audio.waveform.dim() != 2 or audio.waveform.size(0) < 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is needed since Pydantic should be handling this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair. i have removed this

if audio.waveform.dim() != 2 or audio.waveform.size(0) < 1:
raise ValueError("waveform should have shape (num_channels, num_samples)")

down_mixed_audio = audio.copy()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think we need to change this to audio.model_copy(deep=True) but curious if what's better is this or just creating a new Audio instance?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure there is a big difference here. But can easily switch to creating a new Audio object so that we are consistent in coding style plus it becomes clearer how we create a new entity/item

raise ValueError("waveform should have shape (num_channels, num_samples)")

down_mixed_audio = audio.copy()
down_mixed_audio.waveform = audio.waveform.mean(dim=0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry slightly about this but maybe your test cases handle this, but does this guarantee that waveform is shape (1, num_samples)? I'm not 100% sure that the field_validator runs in this case but if it does then this is fine. If it doesn't, then use audio.waveform.mean(dim=0, keepdim=True)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was managed anyway, but i have now added keepdim=True to make it clearer

Shape: (num_channels, num_samples).

Returns:
List[Audio]: The list of audio objects with a mono waveform averaged from all channels. Shape: (num_samples).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this gets into a later comment I made, but at least in the Audio class as of right now, we standardize the waveform field to be (num_channels, num_samples) and perhaps to keep consistency we keep it as (1, num_samples)?

return down_mixed_audios


def select_channel_from_audios(audios: List[Audio], channel_index: int) -> List[Audio]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically see all of my comments for the above function

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we've broken up a lot of the tasks into individual files (e.g. individual modules) and I'm wondering if we should consolidate a bit to make it easier to use. Not super opinionated on this though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good point. I think we can proceed like this for this first release and next week we do some restructure

This was linked to issues Jun 4, 2024
@fabiocat93 fabiocat93 added the enhancement New feature or request label Jun 4, 2024
@fabiocat93 fabiocat93 self-assigned this Jun 4, 2024
@fabiocat93 fabiocat93 merged commit 66c646b into audio_abstract_dtype Jun 4, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Task: Utilities Task: Preprocessing
3 participants