How to understand and use the audio embedding? #148

arthur19312 · 2024-04-29T18:22:48Z

I'm new here. I ran the method get_audio_embedding_from_filelist with the model music_audioset_epoch_15_esc_90.14.pt and got audio embeddings like this:

[[-4.639852792024612427e-02, -9.935184381902217865e-03, ...]]

I roughly understand that it represents the features of the input audio, but I don't know how to use it.
Could someone tell me what this audio embedding, which I get as a list of floats, actually is? Is it compatible with other models? And how should I use it?

(PS: I'm really interested in this work, but it seems I lack some necessary background knowledge, so it would be great if someone could recommend some relevant materials to get me into the field. Thank you so much ❤)


cvillela · 2024-04-30T16:52:48Z

I am having similar doubts.

When extracting text and audio embeddings, I can easily perform cosine similarity to find closely related pairs, and retrieve audio from text inputs and vice-versa.

However, I would like to know if there is a way to decode the embeddings into text. Decoding them into audio seems manageable using AudioLDM.
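For readers new to this, the cosine-similarity retrieval mentioned above can be sketched roughly as follows. This is a minimal illustration in plain Python, not code from the CLAP repository: the helper names `cosine_similarity` and `rank_by_similarity`, the file names, and the toy 3-dimensional vectors are all hypothetical stand-ins for real text and audio embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [
        (name, cosine_similarity(query, vec))
        for name, vec in candidates.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up toy vectors; real embeddings would come from the model
# (e.g. one text embedding and several audio embeddings).
text_embedding = [0.9, 0.1, 0.0]
audio_embeddings = {
    "dog_bark.wav": [0.8, 0.2, 0.1],
    "piano.wav": [0.0, 0.1, 0.9],
}

ranking = rank_by_similarity(text_embedding, audio_embeddings)
best_match = ranking[0][0]  # audio file closest to the text query
print(ranking)
```

The same ranking works in the other direction (audio query, text candidates), since both kinds of embedding are compared in one shared space.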

satvik-dixit · 2024-07-14T19:13:48Z

@cvillela is there a way to decode CLAP embeddings to Audio using AudioLDM?

arthur19312 · 2024-10-22T12:38:15Z

Once I made the analogy to CLIP, I understood how to use CLAP. My mind was stuck back then ><. Thanks for your hints!
Now we know AudioLDM can turn text into audio. Is there any tool that works like CLIP Interrogator to turn audio into text?

waldleitner · 2024-10-23T08:18:52Z

@arthur19312 The following CLAP implementation also supports a model for audio captioning (not yet tested):

https://arxiv.org/abs/2309.05767
https://github.com/microsoft/CLAP
