Matching MFCC configuration with Librosa's #54
Comments
@Yaxit It was tested against the MFCC from TensorFlow, using a Hamming window. If Librosa is a reference, then it may be worth looking at it for a future improvement. The From TensorFlow documentation:
I think
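For readers comparing window settings: the script in this thread passes sym=False to SciPy to get a periodic Hamming window, which is also what tf.signal.hamming_window produces by default. The following is a minimal pure-Python sketch of the two variants. The helper name hamming and its periodic flag are made up for illustration and are not part of any of the libraries discussed.

```python
import math

def hamming(n_points, periodic=True):
    """Hamming window (hypothetical helper, not a library function).

    The periodic form divides by N, the symmetric form by N - 1.
    """
    denom = n_points if periodic else n_points - 1
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / denom)
            for n in range(n_points)]

# Same length as the FFT size used in the thread's script.
periodic = hamming(1024, periodic=True)
symmetric = hamming(1024, periodic=False)

# The two windows are close but not identical.
max_diff = max(abs(p - s) for p, s in zip(periodic, symmetric))
print("largest sample difference:", max_diff)
```

The difference is small for a 1024-point window, so a mismatch here would explain small deviations, not a change of sign or order of magnitude.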
@Yaxit I have made some tests. Here is a Python script to compute the three versions: TensorFlow, CMSIS-DSP, and Librosa. I am using the CMSIS-DSP Python wrapper for testing ( The results I am getting:
I don't understand all the settings of Librosa. I played with some of them and I can't get the same result as TensorFlow. I was hoping that the Librosa setting So, it looks like Librosa is using a different convention and formula. For the script below, I started from the TensorFlow example in the documentation and reused the same parameters:
import tensorflow as tf
import cmsisdsp as dsp
import librosa
import numpy as np
import scipy.signal as sig
import cmsisdsp.mfcc as mfcc
from cmsisdsp.datatype import F32
import math
# https://librosa.org/doc/main/generated/librosa.feature.mfcc.html
# https://www.tensorflow.org/api_docs/python/tf/signal/mfccs_from_log_mel_spectrograms
FFTSize = 1024
numOfDctOutputs = 13
freq_min = 80.0
freq_high = 7600.0
numOfMelFilters = 80
frame_step = 256
num_samples, sample_rate = 32000, 16000.0
t = np.linspace(0,2,num_samples)
pcm = np.array(np.sin(2*math.pi * t * 1000),dtype=np.double)
# A 1024-point STFT with frames of 64 ms and 75% overlap.
stfts = tf.signal.stft(pcm, frame_length=FFTSize, frame_step=frame_step,
                       fft_length=FFTSize,
                       window_fn=tf.signal.hamming_window)
spectrograms = tf.abs(stfts)
# Warp the linear scale spectrograms into the mel-scale.
num_spectrogram_bins = stfts.shape[-1]
lower_edge_hertz, upper_edge_hertz, num_mel_bins = freq_min, freq_high, numOfMelFilters
linear_to_mel_weight_matrix = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins, num_spectrogram_bins, sample_rate, lower_edge_hertz,
    upper_edge_hertz, dtype=tf.double)
#print(spectrograms.dtype)
#print(linear_to_mel_weight_matrix.dtype)
mel_spectrograms = tf.tensordot(
    spectrograms, linear_to_mel_weight_matrix, 1)
mel_spectrograms.set_shape(spectrograms.shape[:-1].concatenate(
    linear_to_mel_weight_matrix.shape[-1:]))
# Compute a stabilized log to get log-magnitude mel-scale spectrograms.
log_mel_spectrograms = tf.math.log(mel_spectrograms + 1e-6)
# Compute MFCCs from log_mel_spectrograms and take the first 13.
mfccsT = tf.signal.mfccs_from_log_mel_spectrograms(
    log_mel_spectrograms)[..., :numOfDctOutputs]
mfccsL = librosa.feature.mfcc(y=np.array(pcm),
                              sr=sample_rate,
                              n_mfcc=numOfDctOutputs,
                              n_fft=FFTSize,
                              hop_length=frame_step,
                              # sig.hamming was removed in newer SciPy releases
                              window=sig.windows.hamming(FFTSize, sym=False),
                              center=False,  # for padding, not used
                              pad_mode='constant',  # for padding, not used
                              power=1.0,
                              n_mels=numOfMelFilters,
                              fmin=freq_min,
                              fmax=freq_high,
                              dct_type=2,
                              norm='ortho',
                              htk=True
                              )
print("TF")
print(mfccsT[0])
window = sig.windows.hamming(FFTSize, sym=False)
filtLen,filtPos,packedFilters = mfcc.melFilterMatrix(F32,freq_min, freq_high, numOfMelFilters,sample_rate,FFTSize)
dctMatrixFilters = mfcc.dctMatrix(F32,numOfDctOutputs, numOfMelFilters)
mfccf32=dsp.arm_mfcc_instance_f32()
status=dsp.arm_mfcc_init_f32(mfccf32,FFTSize,numOfMelFilters,numOfDctOutputs,dctMatrixFilters,
                             filtPos,filtLen,packedFilters,window)
tmp=np.zeros(FFTSize + 2)
res=dsp.arm_mfcc_f32(mfccf32,pcm[0:FFTSize],tmp)
print("CMSIS-DSP")
print(res)
print("Librosa")
print(np.array(mfccsL).T[0])
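One convention that may differ between the libraries is the DCT scaling. The sketch below compares an orthonormal DCT-II (what norm='ortho' requests) with the "almost orthogonal" HTK-style scaling that the TensorFlow documentation describes, assumed here to be the unnormalized DCT-II multiplied by 1/sqrt(2N). It is written in plain Python so the arithmetic is visible. All function names and the input values are made up for illustration.

```python
import math

def dct2_unnormalized(x):
    """DCT-II with the SciPy-style factor of 2 and no normalization."""
    n = len(x)
    return [2.0 * sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                      for i in range(n))
            for k in range(n)]

def dct2_ortho(x):
    """Orthonormal DCT-II: coefficient 0 gets sqrt(1/(4N)), the rest sqrt(1/(2N))."""
    n = len(x)
    return [c * math.sqrt(1.0 / (4 * n)) if k == 0 else c * math.sqrt(1.0 / (2 * n))
            for k, c in enumerate(dct2_unnormalized(x))]

def dct2_htk_like(x):
    """Assumed TensorFlow/HTK-style scaling: every coefficient times 1/sqrt(2N)."""
    n = len(x)
    return [c / math.sqrt(2.0 * n) for c in dct2_unnormalized(x)]

# Made-up log-mel values for a single frame (illustration only).
log_mel = [0.5, 1.0, -0.25, 2.0, 1.5, 0.0, -1.0, 0.75]
ortho = dct2_ortho(log_mel)
htk_like = dct2_htk_like(log_mel)

# Only the first coefficient differs between the two scalings.
ratio_c0 = htk_like[0] / ortho[0]
print("ratio of coefficient 0:", ratio_c0)
```

Under these assumptions the two scalings agree for every coefficient except the first, so DCT normalization alone would not account for a sign flip across all outputs.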
@christophe0606 Thank you for the clarification.
Hello,
I am trying to replicate the MFCC output of Librosa, which is widely used as the reference library for audio manipulation.
On my ARM microcontroller, I am using the arm_mfcc_f32 callback and the arm_mfcc_init_f32 to initialize the parameters. The parameters have been generated with the scripts from this repository.
Generally, I would expect some small deviation between the results on the two platforms, since CMSIS uses approximate calculations for logarithms, etc.; however, I found that the outputs are completely different (off by an order of magnitude, and in sign).
I also could not find a clear explanation of which parameters are used in the CMSIS implementation for power, dct_type, and norm. Am I missing some steps? Or is this expected altogether?
I believe this would be quite important to clarify, so that people can design a data pipeline that is reproducible on the microcontroller :)
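One plausible contributor to a large magnitude gap is the log compression step. The TensorFlow-based script in this thread takes a stabilized natural log of the mel energies, while Librosa's MFCC, as far as I can tell from its source, converts the mel spectrogram to decibels (10*log10) before the DCT. The sketch below compares the two on made-up values. The function names are hypothetical, and the decibel helper omits Librosa's dynamic-range clipping.

```python
import math

def natural_log_compress(mel_energy, eps=1e-6):
    """Stabilized natural log, as in the TensorFlow-based script in this thread."""
    return math.log(mel_energy + eps)

def decibel_compress(mel_energy, amin=1e-10):
    """10*log10 scaling; assumed to approximate Librosa's dB conversion, without clipping."""
    return 10.0 * math.log10(max(mel_energy, amin))

mel_energy = 100.0  # made-up mel filter output
ln_value = natural_log_compress(mel_energy)
db_value = decibel_compress(mel_energy)

# The two compressions differ by a constant factor of 10 / ln(10), about 4.34.
scale = 10.0 / math.log(10.0)
print(ln_value, db_value, db_value / ln_value, scale)

# The power setting matters too: squaring the magnitude doubles the log value.
magnitude = 10.0
print(decibel_compress(magnitude ** 2), 2.0 * decibel_compress(magnitude))
```

A constant factor in the log domain carries straight through the DCT, so a mismatch here scales every coefficient, which is worth ruling out before comparing the finer conventions.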