cqcc feature extract #40

Open
zhengzhezhe opened this issue Jul 23, 2024 · 2 comments

Comments

@zhengzhezhe

Hi,
I'd like to ask how CQCC features are extracted here, because the output is not the same as that of the MATLAB version.

@liufeigit
Member

Constant-Q transform:
$$X[k]=\frac1{N[k]}\sum_{n=0}^{N[k]-1}x[n]W_k[n]e^{\frac{-j2\pi Qn}{N[k]} }$$

In the field of music, this transform and the chroma features built on it are widely used. A standard (direct) CQT implementation needs very long analysis windows to reach the required frequency resolution, because $N[k]=Q\frac{f_s}{f_k}$ grows as the bin frequency $f_k$ decreases. Although the FFT can be used to accelerate this computation, it remains too slow for most practical, production-scale applications.
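To make the window-length problem concrete, here is a minimal NumPy sketch that evaluates the formula above directly for a single frame, with $Q = 1/(2^{1/b}-1)$ for $b$ bins per octave. The Hann window for $W_k[n]$, the geometric bin spacing starting at `fmin`, and the helper names `cqt_params`/`naive_cqt` are assumptions for illustration; this is not audioFlux's implementation.

```python
import numpy as np


def cqt_params(fmin, n_bins, bins_per_octave, fs):
    """Return Q, the geometric bin frequencies f_k, and window lengths N[k]."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    lengths = np.ceil(q * fs / freqs).astype(int)  # N[k] = Q * fs / f_k
    return q, freqs, lengths


def naive_cqt(x, fmin, n_bins, bins_per_octave, fs):
    """Evaluate X[k] directly on the first N[k] samples of x (one frame)."""
    q, freqs, lengths = cqt_params(fmin, n_bins, bins_per_octave, fs)
    out = np.zeros(n_bins, dtype=complex)
    for k, n_k in enumerate(lengths):
        if n_k > len(x):
            raise ValueError(f"bin {k} needs {n_k} samples, got {len(x)}")
        n = np.arange(n_k)
        w = np.hanning(n_k)  # assumed window W_k[n]
        # X[k] = 1/N[k] * sum_n x[n] W_k[n] exp(-j 2 pi Q n / N[k])
        out[k] = np.sum(x[:n_k] * w * np.exp(-2j * np.pi * q * n / n_k)) / n_k
    return freqs, lengths, out


fs = 16000
q, freqs, lengths = cqt_params(32.7, 84, 12, fs)
t = np.arange(lengths[0]) / fs
x = np.cos(2 * np.pi * freqs[24] * t)  # test tone on bin 24 (about 130.8 Hz)
_, _, X = naive_cqt(x, 32.7, 84, 12, fs)
print(lengths[0], lengths[-1], int(np.argmax(np.abs(X))))
```

With `fmin=32.7` Hz, 84 bins and `fs=16000`, the lowest bin needs a window more than 100 times longer than the highest bin, which is the cost that the octave-based and NSGT approaches try to avoid.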

The typical approach exploits the fact that a constant Q matches the geometric spacing of musical pitches: the transform is computed one octave at a time, with a much smaller N[k] within each octave. Because halving the sampling rate also halves $N[k]=Q\frac{f_s}{f_k}$, every octave can reuse the short windows of the top octave. If variable bandwidth ratios are not considered, the frequency-domain filter bank is identical for every octave. After each octave is computed, the data is downsampled by a factor of 2 and used for the next (lower) octave. This method is essentially a simplified, practical version of an efficient CQT algorithm proposed in the 1990s, and most libraries that implement the standard CQT are based on that paper.
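The octave-by-octave idea can be sketched as follows. This is a simplified illustration under stated assumptions: Hann-windowed kernels, a plain windowed-sinc low-pass before each factor-of-2 downsampling, and all windows aligned at the start of the frame. The helper names are hypothetical, and real libraries differ in windowing, hop handling and anti-aliasing.

```python
import numpy as np


def lowpass_decimate2(sig, taps=63):
    """Halve the sampling rate: windowed-sinc low-pass, then keep every 2nd sample."""
    n = np.arange(taps) - (taps - 1) / 2
    cutoff = 0.225  # cycles/sample, just below the new Nyquist (0.25)
    h = 2 * cutoff * np.sinc(2 * cutoff * n) * np.hamming(taps)
    h /= h.sum()  # unity gain at DC
    return np.convolve(sig, h, mode="same")[::2]


def top_octave_kernels(f_lo, bpo, fs):
    """Kernels for the highest octave [f_lo, 2*f_lo) at sampling rate fs."""
    q = 1.0 / (2.0 ** (1.0 / bpo) - 1.0)
    freqs = f_lo * 2.0 ** (np.arange(bpo) / bpo)
    kernels = []
    for f in freqs:
        n_k = int(np.ceil(q * fs / f))
        n = np.arange(n_k)
        kernels.append(np.hanning(n_k) * np.exp(-2j * np.pi * q * n / n_k) / n_k)
    return freqs, kernels


def octave_cqt(x, f_lo, bpo, n_octaves, fs):
    """One CQT frame, computed octave by octave with the same kernel set."""
    freqs, kernels = top_octave_kernels(f_lo, bpo, fs)
    sig = np.asarray(x, dtype=float)
    out_f, out_x = [], []
    for o in range(n_octaves):
        if len(sig) < len(kernels[0]):
            raise ValueError("frame too short for the requested number of octaves")
        # The same kernels applied at rate fs / 2**o analyse freqs / 2**o.
        out_f.append(freqs / 2 ** o)
        out_x.append(np.array([np.sum(sig[: len(kern)] * kern) for kern in kernels]))
        if o < n_octaves - 1:
            sig = lowpass_decimate2(sig)
    # Reverse the octave order so frequencies come out ascending.
    return np.concatenate(out_f[::-1]), np.concatenate(out_x[::-1])


fs = 16000
f_lo = 2000.0
tone = f_lo / 4 * 2 ** (5 / 12)  # bin 5 of the third-highest octave
x = np.cos(2 * np.pi * tone * np.arange(4096) / fs)
oct_freqs, oct_X = octave_cqt(x, f_lo, 12, 4, fs)
print(oct_freqs[np.argmax(np.abs(oct_X))])
```

Only 12 kernels are ever built, and none is longer than the top octave's longest window; the lower octaves get their resolution from the reduced sampling rate instead of longer kernels.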

Later, the Non-Stationary Gabor Transform (NSGT) was proposed to address these CQT issues. It offers significant improvements in efficiency, effectiveness, and invertibility.

Non-Stationary Gabor Transform:
$$X(m,k)=\frac1{N[k]} \sum_{n=0}^{L-1} x[n] W_k[n]e^{\frac{j2\pi m(n-\omega_k) }{N[k]} }$$
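In the NSGT literature the constant-Q windows are usually applied in the frequency domain (the "painless" case), so this sketch reads $x$ in the formula above as the DFT of the signal, $\omega_k$ as the centre bin of channel $k$, and $N[k]$ as the window length, which also equals the number of coefficients for that channel. The Hann windows, the bandwidth of plus or minus one bin spacing, and the function name `nsgt_cq` are assumptions; this is neither audioFlux's nor MATLAB's implementation.

```python
import numpy as np


def nsgt_cq(x, fs, fmin, n_bins, bpo):
    """Frequency-domain constant-Q NSGT analysis (painless-case sketch)."""
    L = len(x)
    spec = np.fft.fft(x)
    q = 1.0 / (2.0 ** (1.0 / bpo) - 1.0)
    freqs = fmin * 2.0 ** (np.arange(n_bins) / bpo)
    coeffs = []
    for f in freqs:
        b = int(round(f * L / fs))           # centre bin, omega_k
        half = int(np.ceil(f / q * L / fs))  # support: +/- one bin spacing
        m_k = 2 * half + 1                   # window length = number of coefficients
        j = b - half + np.arange(m_k)        # DFT bins under the window
        seg = spec[j % L] * np.hanning(m_k)  # frequency-domain window W_k
        seg = np.roll(seg, -half)            # index i now holds bin b + i (mod m_k)
        # ifft gives (1/m_k) * sum_j X[j] W_k[j] exp(+j 2 pi m (j - b) / m_k)
        coeffs.append(np.fft.ifft(seg))
    return freqs, coeffs


fs, L = 16000, 16384
b12 = int(round(220.0 * L / fs))  # DFT bin nearest to 220 Hz (channel 12)
x = np.cos(2 * np.pi * b12 * np.arange(L) / L)
freqs, coeffs = nsgt_cq(x, fs, 110.0, 48, 12)
energy = np.array([np.sum(np.abs(c) ** 2) for c in coeffs])
print(int(np.argmax(energy)), len(coeffs[0]), len(coeffs[-1]))
```

Higher channels get wider frequency windows and therefore more coefficients (finer time resolution), which is the constant-Q trade-off expressed directly in the frequency domain.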

MATLAB's `cqt` is implemented with the Non-Stationary Gabor Transform approach. AudioFlux provides implementations of both the standard CQT and the NSGT, so MATLAB's CQT is more consistent with AudioFlux's NSGT than with its standard CQT.

Finally, in numerical computation it is difficult for different frameworks to produce exactly the same values. However, the CQCC discrepancy you mentioned is primarily caused by the two implementations using different underlying algorithms. Even for the same algorithm, factors such as optimization techniques and numerical precision make identical values hard to achieve.
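Because exact equality is not a realistic target, a tolerance-based comparison is more informative when checking audioFlux output against MATLAB. A minimal sketch with a hypothetical helper `compare_features`, assuming both feature matrices have already been brought to the same shape `(n_coeffs, n_frames)`:

```python
import numpy as np


def compare_features(a, b):
    """Summarise how close two (n_coeffs, n_frames) feature matrices are."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    diff = np.abs(a - b)
    # relative L2 error, measured against b as the reference
    rel = np.linalg.norm(a - b) / max(np.linalg.norm(b), np.finfo(float).tiny)
    # per-coefficient Pearson correlation across frames (NaN for constant rows)
    corr = np.array([np.corrcoef(ra, rb)[0, 1] for ra, rb in zip(a, b)])
    return {"max_abs": float(diff.max()), "rel_l2": float(rel), "min_corr": float(corr.min())}


rng = np.random.default_rng(0)
a = rng.standard_normal((20, 50))
print(compare_features(a, a.astype(np.float32)))  # float32 rounding only
```

Because correlation ignores per-coefficient scale and offset, a large `rel_l2` together with a `min_corr` close to 1 suggests the two outputs differ mainly by scaling or normalisation.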

@zhengzhezhe
Author

Thank you very much for your reply.
If I want to extract CQCC features from audio using audioFlux, is this the right way to do it:

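```python
import numpy as np
import audioflux

x, sr = audioflux.read("example.wav")  # placeholder path
cc = audioflux.CQT(samplate=sr)        # audioFlux's standard CQT
m_data_arr = cc.cqt(x)                 # CQT of the signal
fea = cc.cqcc(m_data_arr)              # static CQCC coefficients
fea1 = audioflux.utils.delta(fea)      # first-order delta
fea2 = audioflux.utils.delta(fea1)     # second-order delta (delta of delta)
fea_cqcc = np.concatenate((fea1, fea2, fea), axis=0)
```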

Is this the standard CQT? If I want to align the output with the MATLAB version, should I replace `cc = audioflux.CQT()` and `m_data_arr = cc.cqt(x)` with `gg = audioflux.NSGT()` and `m_data_arr = gg.nsgt(x)`?

Otherwise, could you tell me how to correctly extract CQCC features with audioFlux? Thank you!
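For reference, the generic CQCC recipe (power of the CQT, logarithm, resampling of the geometric frequency axis onto a uniform grid, then a DCT) can be sketched in plain NumPy. This is not audioFlux's `cqcc` or the MATLAB reference code: the linear-interpolation resampling, the grid size `n_uniform`, `eps`, the delta `width`, and all function names are assumptions, so it is only useful for understanding which steps can differ between implementations.

```python
import numpy as np


def dct2_ortho(n_out, n_in):
    """Orthonormal DCT-II matrix, truncated to the first n_out rows."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    mat = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    mat[0] /= np.sqrt(2.0)
    return mat


def cqcc_from_cqt(cqt_mag, freqs, n_coeffs=20, n_uniform=None, eps=1e-10):
    """cqt_mag: (n_bins, n_frames) CQT values on increasing geometric freqs (Hz)."""
    log_pow = np.log(np.abs(cqt_mag) ** 2 + eps)
    n_uniform = n_uniform or len(freqs)
    lin_freqs = np.linspace(freqs[0], freqs[-1], n_uniform)
    # resample each frame from the geometric to a linear frequency axis
    uniform = np.stack([np.interp(lin_freqs, freqs, col) for col in log_pow.T], axis=1)
    return dct2_ortho(n_coeffs, n_uniform) @ uniform


def delta(feat, width=2):
    """Regression-based delta along time (axis 1), with edge padding."""
    padded = np.pad(feat, ((0, 0), (width, width)), mode="edge")
    n_frames = feat.shape[1]
    num = sum(
        n * (padded[:, width + n: width + n + n_frames] - padded[:, width - n: width - n + n_frames])
        for n in range(1, width + 1)
    )
    return num / (2 * sum(n * n for n in range(1, width + 1)))


freqs = 32.7 * 2 ** (np.arange(84) / 12)
mag = np.abs(np.random.default_rng(1).standard_normal((84, 30))) + 0.1  # stand-in CQT
c = cqcc_from_cqt(mag, freqs)
feats = np.concatenate([c, delta(c), delta(delta(c))], axis=0)
print(feats.shape)
```

This sketch stacks static coefficients first, then deltas and delta-deltas; the snippet in the question puts the deltas first, so when comparing against another toolkit it is worth checking which order it uses.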
