Hi, I tried to replicate the audio-to-text retrieval results using both the PyPI library and the Hugging Face implementation, but the numbers I obtain do not match those reported in the paper.
For the Hugging Face implementation, I use ClapTextModelWithProjection and ClapAudioModelWithProjection. I compute the cosine similarity between the audio and text embeddings and sort the retrieved texts by similarity score.
For the PyPI library implementation, I use get_audio_embedding_from_data and get_text_embedding and follow the same procedure.
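For reference, the ranking step described above can be sketched as follows. This is only a minimal illustration in plain Python, assuming the embeddings have already been extracted as lists of floats; the helper names `l2_normalize` and `rank_texts` are hypothetical and are not part of laion_clap or Transformers.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rank_texts(audio_emb, text_embs):
    """Return text indices sorted by cosine similarity to one audio embedding.

    Hypothetical helper: audio_emb is a single embedding, text_embs is a list
    of candidate text embeddings of the same dimension.
    """
    a = l2_normalize(audio_emb)
    scored = []
    for idx, text_emb in enumerate(text_embs):
        t = l2_normalize(text_emb)
        # Dot product of unit vectors equals cosine similarity.
        scored.append((sum(x * y for x, y in zip(a, t)), idx))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored]
```

Calling `rank_texts(audio_emb, text_embs)` once per audio clip gives the retrieval order from which recall@k would then be computed.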
The model is initialized as follows:
import laion_clap

model = laion_clap.CLAP_Module(enable_fusion=enable_fusion)
model.load_ckpt()
I am using the Clotho version 2.1 evaluation split from here and the AudioCaps val split from the Google Drive link in the repository.
Could you please help me understand what could be the issue?
I would recommend using this GitHub implementation to evaluate the model. Also, in the Clotho dataset each audio clip has 5 text captions, so the metric calculation is a bit different. Please refer to our evaluation implementation here: https://github.com/LAION-AI/CLAP/blob/main/src/laion_clap/training/train.py#L577
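To illustrate why several captions per audio clip change the metric, here is a rough sketch of one common convention for audio-to-text recall@k: a clip counts as a hit if any of its ground-truth captions appears in the top k. This is an assumption for illustration only and may differ from what the linked train.py does, so treat the repository code as the reference. The function name and the caption layout (captions of clip i stored in consecutive columns) are hypothetical.

```python
def audio_to_text_recall_at_k(sim, captions_per_audio, k):
    """Fraction of audio clips with at least one correct caption in the top k.

    sim[i][j] is the similarity between audio i and caption j. Assumed layout:
    the captions of audio i occupy columns
    i * captions_per_audio .. (i + 1) * captions_per_audio - 1.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Caption indices ordered from most to least similar.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        ground_truth = set(
            range(i * captions_per_audio, (i + 1) * captions_per_audio)
        )
        if any(j in ground_truth for j in order[:k]):
            hits += 1
    return hits / len(sim)
```

With a single caption per clip this reduces to the usual recall@k. With five captions, treating each caption as an independent query or target can give noticeably different numbers, which is one possible source of a mismatch with reported results.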