How can I make one audio always produce exactly the same result? #40
-
Hello author, thank you for open-sourcing the UTMOSv2 model code. The integration of the visual model and SSL features in this work is an excellent approach. However, I have a question. Unlike UTMOSv1, when I tested UTMOSv2 (using the Hugging Face demo with the default parameter settings, link to Hugging Face demo), I noticed that the model's outputs vary: even with the same audio and the same domain, it produces different predictions. I am aware of the random slicing in the dataset pipeline, so I specifically limited the audio to exactly 3 seconds (without changing any parameter settings), yet the model still predicts different scores. Why is that? PS: I haven't gone through all the internal code. Also, as the title says, how can I get a stable score prediction?
-
Problem solved. There is no randomness inside the model itself; the nondeterminism comes from the random slicing. I found that modifying the `utmosv2.dataset._utils.select_random_start` function achieves non-random slicing: simply change `return y[start : start + length]` to `return y[:length]`.
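For reference, here is a minimal sketch of that change. Only the module path, function name, and the two return statements come from this thread; the exact signature and the commented-out original logic are assumptions, so treat this as illustrative rather than the actual repository code:

```python
import numpy as np

# Hypothetical sketch of utmosv2/dataset/_utils.py::select_random_start.
# The real signature may differ; only the slicing change below is what the
# reply above describes.
def select_random_start(y: np.ndarray, length: int) -> np.ndarray:
    # Original (non-deterministic) behaviour: pick a random start offset,
    # so the same audio can yield different crops and different scores.
    # start = np.random.randint(0, max(len(y) - length, 1))
    # return y[start : start + length]

    # Deterministic variant: always slice from the beginning of the waveform,
    # so the same input audio always produces the same crop.
    return y[:length]
```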