Spectrogram processing #288
Thanks for doing this. Can you explain how blockSize, hopSize, and trafoSize would affect the filter? Edit: Looking at this I can't see where the waveform output is... oh, you renamed vector to vec. |
for filtering, the trafoSize should probably stay fixed at being equal to the blockSize. i think using zero-padding (i.e. trafoSize > blockSize) makes sense only for higher-resolution analysis and visualization. it's actually a faux increase in frequency resolution - the spectrum is still washed out, you just get more spectral samples of it. hopSize = blockSize/2 is probably also best left as is. you can get a similar faux time-resolution increase for visualization by choosing smaller (power-of-two) fractions here. the only parameter that really matters is the blockSize - it dials in a trade-off between the time-resolution and frequency-resolution of the filter. basically, longer block-sizes lead to longer-ringing filters that give better frequency separation. so the parameter here is the blockSize - the other two depend on it as: trafoSize = blockSize, hopSize = blockSize/2. ...i mean, you can experiment with other settings (i didn't, so far), but i would not really expect much improvement or difference from doing so |
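To make the dependency concrete, here is a minimal sketch (using the Spectrogram setters that appear in the code later in this thread) where blockSize is the only free parameter:

```cpp
// hedged sketch: blockSize is the only real parameter; the other two follow from it
int blockSize = 1024;            // dials in the time- vs. frequency-resolution trade-off
int trafoSize = blockSize;       // no zero-padding when filtering
int hopSize   = blockSize / 2;   // but see the later hopSize discussion (blockSize/4 for Hann applied twice)

Spectrogram sp;
sp.setBlockAndTrafoSize(blockSize, trafoSize);
sp.setHopSize(hopSize);
```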
You have lowpass and highpass here. Wouldn't it be just as good to only do one and then subtract the result from the original? |
yes, that should work too. i'm currently just not sure if the subtraction will work correctly on the spectrogram level, due to the flaws and/or bugs in the matrix class (i mean, if you just subtract spectrogram matrices using the "-" operator of the matrix class). but subtracting time-domain signals should work |
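A minimal sketch of the time-domain subtraction (plain C++, function name hypothetical): subtracting the resynthesized lowpass output from the original gives a perfectly complementary highpass:

```cpp
#include <vector>

// highpass by subtraction: lowpass + highpass reconstruct the input exactly
std::vector<double> highpassBySubtraction(const std::vector<double>& original,
                                          const std::vector<double>& lowpassed)
{
  std::vector<double> highpassed(original.size());
  for (size_t n = 0; n < original.size(); n++)
    highpassed[n] = original[n] - lowpassed[n];
  return highpassed;
}
```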
ok so something I kinda get but also totally don't get. You have two classes, the spectrogram and the sinusoidal model stuff. Let's say I want to create a simple denoiser: zero out bins that are not above an amplitude threshold. Let's say I want to improve that denoiser by detecting harmonics. Let's say after X seconds the harmonics of the signal start dropping below the noise floor. This is where noise reduction starts to become useless. What I really want is to make a guesstimate of how long the harmonics last and just resynthesize from scratch - get rid of everything after X seconds and just resynthesize from there. That's where the sinusoidal model comes in? Is it confusing to combine these two classes into a function? How's this going to work, etc. etc. This is just brainstorming. Edit: This is basically the function of the sample tail extender, but as you said, it doesn't use sinusoidal models? It only uses spectrogram stuff and bin amplitude? So it's not as good? |
quick question: is it ok to use float types with rsSpectrogram? Because the audio is in floats not doubles. |
i didn't try that, but i don't see why it should be a problem. you should just need an appropriate template instantiation |
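Presumably (this is an assumption - the actual template parameters of rsSpectrogram aren't shown in this thread) that would look something like:

```cpp
// hedged sketch, assuming the class is templated on the sample type:
RAPT::rsSpectrogram<float> sp;   // single-precision instantiation for float audio
```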
What's the demodulation thing about? What is modulation in this case? |
before analyzing a frame, an analysis window is applied, and after (re)synthesizing a frame, a synthesis window is applied. depending on the choice of these windows (and the ratio of blocksize and hopsize), there may be an amplitude modulation in the resynthesized signal - but this modulation is predictable and can be compensated for. this is the demodulation step. see here: https://ccrma.stanford.edu/~jos/parshl/Overlap_Add_Synthesis.html in most implementations of spectrogram processing systems, the window functions and blocksize-to-hopsize ratio are tuned such that this modulation does not occur (this happens when the (overlapped) products of analysis and synthesis windows sum up to unity). but i wanted more freedom in my choice of window functions, so i incorporated this demodulation step. it will do nothing (i.e. divide by one) in case of a choice where this overlap-to-unity condition is satisfied. in other cases, it divides by the sum of the overlapped window products |
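In formula terms (my notation, not code from the repo): with analysis window $w_a$, synthesis window $w_s$, and hop size $H$, the demodulation divides the raw overlap-add output $y(n)$ by the summed window products:

$$\hat{y}(n) = \frac{y(n)}{\sum_m w_a(n - mH)\, w_s(n - mH)}$$

When the windows and hop size are chosen such that the denominator is a constant one, this step divides by one and does nothing.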
:-O what is this?! i've never seen that. what are your settings? maybe i should try it with your sample? or (easier for me to check) can you reproduce this artifact with a simple artificial input signal (noise, dc, sine, whatever)?
this would be really weird! |
oh - i just noticed that the hopSize should be blockSize/4 and not blockSize/2 for the overlapped windows to sum to a constant (with the Hann window). blockSize/2 would be appropriate only if the window were applied once - but it's applied twice (in analysis and synthesis) - with halving the hopSize, we again get the sum-to-constant property. the hann window is really nice in this respect. so, with blockSize/2, the demodulation actually does something - which could produce artifacts. ...although probably not of the kind you are seeing at the end - that is probably something else. i have checked in an update - my experiment now also plots the sum of the overlapped window products, in case you want to take a look and play around with it |
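A self-contained check (plain C++, not repo code) of that claim: squared Hann windows overlapped at hopSize = blockSize/4 sum to a constant (1.5) away from the edges:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

int main()
{
  const double pi = 3.14159265358979323846;
  const int B = 512, H = B / 4;                       // blockSize and hopSize = blockSize/4
  std::vector<double> w(B), sum(8 * B, 0.0);
  for (int n = 0; n < B; n++)                         // periodic Hann (presumably what
    w[n] = 0.5 * (1.0 - std::cos(2.0 * pi * n / B));  // "hanningZN" denotes): w[0]==0
  for (int m = 0; m * H + B <= (int)sum.size(); m++)  // overlap-add the squared windows
    for (int n = 0; n < B; n++)
      sum[m * H + n] += w[n] * w[n];                  // squared: applied twice
  printf("%.6f %.6f %.6f\n", sum[2*B], sum[3*B + 123], sum[4*B + 7]);  // all 1.5
  return 0;
}
```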
wtf, when I combine two files using JUCE classes, the result has errors around -135 dB. If I combine using REAPER, it is -inf, so, perfect. WHYYYYYY, it's simple math! Edit: Hmm, what if the audio file writer is adding dither? No, that can't be - then it wouldn't combine well in REAPER either. I changed hopSize to blockSize/4 and removed my "fix" for the distortion at the end. I'm not getting any distortion now; I'll wait for it to happen again. It's an intermittent problem - it sometimes happens, sometimes doesn't (if I don't have my fix implemented). |
could it be related to the length of the buffer/file? if it's a "nice" number (maybe one that an integer number of blocks fits into, or something), it doesn't happen and otherwise it does? or the other way around? is the juce sample buffer single precision and the reaper one double, so we're seeing roundoff error here or something? anyway, this is not related to my code, right? |
btw - at the moment, i would actually recommend prepending and appending a blockSize worth of zeros before analysis and cutting them off after resynthesis (half or maybe even a quarter of that should actually suffice, but better safe than sorry). my block overlapping may produce fade-in/out effects at the start and end (which you won't ever see because of the demodulation - they'll be compensated too - but it's probably better not to have to compensate for anything). ...the spectrogram stuff is still very much under development |
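A minimal sketch of that padding workaround (helper name hypothetical):

```cpp
#include <algorithm>
#include <vector>

// pad with blockSize zeros on both sides before analysis; after resynthesis,
// keep only out[blockSize .. blockSize + numSamples - 1]
std::vector<double> padForAnalysis(const double* input, int numSamples, int blockSize)
{
  std::vector<double> padded(numSamples + 2 * blockSize, 0.0);
  std::copy(input, input + numSamples, padded.begin() + blockSize);
  return padded;
}
```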
ah - by the way - in the function plotOverlappingWindowSum, i have made a plot of the overlapping windows and their sum, using hopSize = blockSize/4 with the Hann window (squared, because it's applied twice - if it were not squared, blockSize/2 would work). as you see, they sum up to unity, except for the fades at the ends (because there, fewer windows are contributing to the sum). when you use the same overlap factor with a blackman window, you can clearly see the amplitude modulation. ...but i think, for blackman, you can get this sum-to-unity property (at least approximately) with another overlap factor, too - i have to look that up or try it... edit: ah - here: |
Yeah, I'll have to experiment with double precision. I wonder if maybe using 64-bit wav files will force JUCE to use higher precision... or at least I should try 32-bit just to see what happens. |
in some sense, it creates a (restricted version of a) sinusoidal model from the spectrogram data. for tail extension, i actually think the general approach is quite appropriate - if it were improved to estimate the decay rates from the data. but with the current one-decay-rate-for-all-harmonics implementation, the tail sounds rather static and artificial. ...i don't really get what exactly you get and also don't get |
Yep, using a 32-bit wav file solved the slight error. OK, so how do you create a smooth spectral filter rather than a brick wall? Do you simply reduce the amplitude a little more for each frequency? Wouldn't you be able to hear the steppiness of the frequency rows, especially if the filter frequency was changing over time? |
yes - i would probably linearly fade the amplitudes down over a certain number of bins, and/or maybe with a smoother (sin/cos-shaped) function. we would just have to take care that complementary low- and highpass add up to unity. ...unless you get your highpass by subtraction - then it would be perfect reconstruction regardless. however, the highpass may not be an exact mirror image of the lowpass if you use just any fade-function
i guess the overlap would smooth these steps out. what exactly do you expect to happen without time-modulation? |
ahh - i guess i see what you mean: the rounding of the cutoff-bin to integer values? yeah... i could probably allow float numbers for that by scaling the last bin by a number between 0 and 1. i must think about that |
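A hedged sketch of the fade idea from this exchange - a raised-cosine lowpass gain over a transition band; the complementary highpass gain is one minus this, so the pair sums to unity by construction:

```cpp
#include <cmath>

// smooth lowpass gain per bin: 1 below the transition band, 0 above it,
// raised-cosine crossfade in between
double lowpassGain(int bin, int cutoffBin, int fadeBins)
{
  const double pi = 3.14159265358979323846;
  if (bin <= cutoffBin - fadeBins) return 1.0;
  if (bin >= cutoffBin + fadeBins) return 0.0;
  double t = (bin - (cutoffBin - fadeBins)) / (2.0 * fadeBins); // 0..1 across the band
  return 0.5 * (1.0 + std::cos(pi * t));                        // fades smoothly 1 -> 0
}
// highpassGain(bin) = 1.0 - lowpassGain(bin, cutoffBin, fadeBins) sums to unity by construction
```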
btw, I don't see me needing a sloped filter. But I guess interpolating values is something that will be needed a lot, especially for a denoiser. |
One thing a lot of my clients ask for is a matching filter - for example, to make a loud violin sample sound like a softer violin sample, usually to correct mistakes in the recording process, or even to generate new samples that are in between recorded dynamics. You do this by analyzing the sample you want to manipulate and getting some kind of difference based on the sample you want it to sound like. Do you know how matching stuff works? Edit: This is a spectral process. |
hmm - do you have some example product to show me exactly what you mean? if it's about applying the spectral envelope of one signal to another signal, then i have done such things in the context of my master's thesis (it was partially about spectral envelopes, and i implemented a vocoder based on this algorithm: https://hal.archives-ouvertes.fr/hal-01161334/document) |
Oh, well do YOU have some examples? |
ok - yeah - this is fun. this is me reading and then vocoding the title of my thesis (it's in german - "representation and modification of spectral envelopes of natural sounds based on formant models"): www.rs-met.com/sounds/effects/ThesisVocoderCarrier.wav the speech of the output is really intelligible... for Germans |
So you can use this to convolve a soft guitar pluck with a loud guitar pluck and create something in between? Obviously it won't sound as good as the real thing, but just good enough to be usable in music composition. |
well, it's a vocoder - so the process is asymmetrical; it's not like a morph (which i would expect to act somewhat symmetrically - and adjustably) - carrier and modulator play different roles - but yeah, you get something "in between". that said, morphing stuff could certainly be done as well. in fact, we'll have a lot of interesting stuff to explore with this spectrogram processor. the basic system is in place and (more or less) working - now the fun can begin |
This is just an image comparing your voice envelope to the output envelope; not sure what purpose this image has. They look like they match up pretty well. http://www.elanhickler.com/transfer/ThesisVocoderOutput_VocodexPlugin.wav This is a Vocodex example, but there are some problems with it. It's great for a musical sound, but your version sounds like a better starting point to improve on legibility and musicality. |
it has a totally different character (thinner and a bit gnarly - i like the gnarl! a bit goa'esque). do you know how vocodex works? is it also spectrogram/stft based - or does it use the classical filter bank approach? edit: from the product description (https://www.image-line.com/plugins/Effects/Vocodex/):
...so that probably means filterbank |
http://www.elanhickler.com/transfer/guitar_dynamic_morph_3_to_5.mp3 3 dynamics to 5 dynamics. The two in-between samples are the vocoder output plus I mixed in some of the original audio by hand and adjusted overall amplitude until it sounded right. |
for some reason I am getting an infinite hang after deletion of the 2nd matrix in the code - I think it's the 2nd one, if things are deleted in the order they appeared. The debugger isn't being helpful! Nothing seems out of the ordinary. I'm trying to zero out harmonics (small ranges of frequencies):

```cpp
for (int ch = 0; ch < channels; ++ch)
{
    // compute the complex spectrogram:
    WindowType W = WindowType::hanningZN;
    Spectrogram sp;
    sp.setBlockAndTrafoSize(blockSize, trafoSize);
    sp.setHopSize(hopSize);
    sp.setAnalysisWindowType(W);
    sp.setSynthesisWindowType(W);
    sp.setOutputDemodulation(true);
    Matrix s = sp.complexSpectrogram(origAudio->audio->getReadPointer(ch), samples);

    // workaround to create the deep copies
    int numFrames = s.getNumRows();
    int numBins   = sp.getNumNonRedundantBins(); // == s.getNumColumns()
    Matrix sl(numFrames, numBins);
    sl.copyDataFrom(s);

    vector<int> binsToZero;
    for (double cf = f; cf < sampleRate * 0.5;)
    {
        int lo = sp.frequencyToBinIndex(cf - fRange * 0.5, sampleRate);
        int hi = sp.frequencyToBinIndex(cf + fRange * 0.5, sampleRate);
        for (int b = lo; b < hi; ++b)
            binsToZero.push_back(b);
        cf += f;
    }

    // zero out harmonic bandwidths to get only noise
    for (int i = 0; i < numFrames; i++)
        for (int b : binsToZero)
            s(i, b) = 0;

    // subtract noise-only from original to get harmonics only
    vec x = sp.synthesize(sl);
    auto ptr = harmAudio->audio->getWritePointer(ch);
    for (int s = 0; s < samples; ++s)
        ptr[s] -= x[s];

    // transfer noise only
    ptr = noisAudio->audio->getWritePointer(ch);
    for (int s = 0; s < samples; ++s)
        ptr[s] = x[s];
} // deletion occurs here!
```
|
Here are the noise-only waveforms. So, the large portion is bad because you can see that the noise has a regularity to it - you can spot some oscillations. That's bad because it's going to interfere with the resynthesized harmonics when combining them together. It looks like it's best to capture as tiny a range of frequencies per harmonic as possible. Edit: Hmmm, but capturing not just a large portion but EVERYTHING (so there is no noise left), and then resynthesizing, might actually work better because then you don't need to combine noise/harmonics at the end. Edit: I think this is what I originally wanted to do from the beginning but never could, due to not having flat enough filters. |
with "portion-per-harmonic" you mean the bandwidth of the bandpass filters to isolate the harmonics?
that makes sense. if the harmonic bandwidth gets wider, the "in-between" bandwidths of the noise bands get narrower. and narrow-bands have high sinusoidality/regularity |
hmmmm... actually, butterworth filters are quite nicely flat. ...but maybe not steep enough? ...in this case, one could go for elliptics at the cost of ripples (in passband and stopband). the FFT-based filters have ripples, too.
so - this is good, right? i still cannot see what fixed FFT filters can do that regular time-domain filters can't. maybe i should take on the challenge of separating some signal with filters vs. FFT. like a sort of battle experiment... haha |
bidirectional filters can't change frequency, right? But you could kind of move the filter around in a spectral process by changing bin volumes. Moving filters are needed. |
yes - it's difficult to make time-varying bidirectional IIR filters while preserving their desirable zero-phase properties. with FIR filters it would be easier but computationally expensive, such that you'd probably end up doing some FFT-based process here also (FFT convolution). but at the moment, we are talking about fixed filters, right? i mean the stuff you did above where you said that bidirectional filters could not do it. for time-varying stuff - especially tracking harmonic frequencies - i'd also opt for spectrogram-based algorithms. |
I'm having some issues with phaselocking - basically the same issues I had last time with bidirectional filters, except I think things are sounding better and are easier to use. I don't understand how SampleModelling gets perfectly phaselocked samples. http://www.elanhickler.com/transfer/horn_samples_phaselocked.rar The phaselocked example attempts to capture the phase per harmonic with:

```cpp
auto p = RAPT::rsSinePhaseAt<float>(x.data(), x.size(), x.size() * 0.5);
RAPT::rsRecreateSine<float>(x.data(), xNew.data(), x.size(), currentf, currentf, sampleRate, p, 0);
```

The phaselocked_singlePhase example sounds better because I don't take phase measurements but set [p]hase to "0" for all harmonics, whatever that means, but then it loses a lot of stereo information. I think |
Can you tell me how to, instead of zeroing out bins, manipulate the amplitude? First, retrieve the amplitude, and then change it to something else. |
each matrix entry is a complex number, so to get the amplitude, you can just call |
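A hedged sketch of that, assuming the matrix entries are std::complex values (the exact accessor names in the repo may differ):

```cpp
#include <complex>

// scale one spectrogram bin's amplitude while preserving its phase
std::complex<double> scaleAmplitude(std::complex<double> z, double gain)
{
  double amp = std::abs(z);            // current amplitude (magnitude)
  double phs = std::arg(z);            // phase, left untouched
  return std::polar(gain * amp, phs);  // rebuild the complex value
}
// usage: s(i, b) = scaleAmplitude(s(i, b), 0.5);  // attenuate the bin by half
```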
What's an easy way to exponentially reduce the volume of a bin to 0? What equation should I use? Well, I should probably work in terms of time rather than bin id. I'd need to convert a time range to a bin index range. Also, should I use a sin function to feather the frequency vs. bin volume? So you choose a center frequency, then above and below it the volume is reduced a bit until 0 as well, so the filter isn't so flat. I might just need like 1 or 2 bins of feathering above and below the target frequency. edit: I guess time would be |
i don't understand that. time and frequency are independent dimensions |
I got it working. Maybe I should have said numFrames / seconds. Basic linear fade. |
yes - numFrames/seconds makes much more sense |
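A hedged sketch of that frame-based fade (assuming frames are spaced hopSize samples apart; the helper name and the Matrix access pattern are taken from the code in this thread):

```cpp
// fade one bin linearly to zero over [tStart, tEnd) seconds;
// frames are assumed to be spaced hopSize samples apart
void fadeBin(Matrix& s, int bin, double tStart, double tEnd,
             double sampleRate, int hopSize)
{
  int frameLo = int(tStart * sampleRate / hopSize);
  int frameHi = int(tEnd   * sampleRate / hopSize);
  for (int i = frameLo; i < frameHi; i++)
  {
    double g = 1.0 - double(i - frameLo) / double(frameHi - frameLo); // gain 1 -> 0
    s(i, bin) *= g;   // scaling the complex entry preserves the phase
  }
}
```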
I'll explain what I'm up to. I'm fading out/removing all audio content except multiples of a frequency. That isolates harmonics. It drastically reduces noise and clicks. The problem is that you hear a buzzing sound because there's still noise in the upper frequencies where the instrument's harmonics decay quickly. To solve this I put a sweeping-down lowpass filter so the spectrum decays in a way that sounds natural. It's just guesswork at that point, by ear. But here's the kicker: you could not use a filter on a normal signal because it would actually sound like a sweeping lowpass filter because of broadband noise. It would sound like "shhhhoooowww". But in this case, since everything but harmonics is isolated, using a filter is viable and sounds correct. |
```cpp
// workaround to create the deep copies
Matrix sl(numFrames, numBins);
sl.copyDataFrom(s);
```

meanwhile, i have implemented my new matrix class and i have updated my spectrogram stuff to use it. now you can simply write:

```cpp
Matrix sl = s;
```

the new class is temporarily named edit: just look at |
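Presumably (the actual class isn't shown in this thread) the deep copy works because the new matrix class stores its data in a container whose copy constructor allocates a fresh buffer, roughly:

```cpp
#include <complex>
#include <vector>

// hedged sketch of a matrix class for which "Matrix sl = s;" yields an
// independent deep copy
class MatrixSketch
{
public:
  MatrixSketch(int numRows, int numCols)
    : rows(numRows), cols(numCols), data(numRows * numCols) {}
  // the compiler-generated copy constructor copies 'data' deeply, because
  // std::vector's own copy constructor allocates a new buffer
  std::complex<double>& operator()(int i, int j) { return data[i * cols + j]; }
private:
  int rows, cols;
  std::vector<std::complex<double>> data;
};
```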
I don't know wtf is going on, but this is really bad. I can't extract harmonics at this crucial time. See code: #288 (comment) edit: figured it out - I was resynthesizing the wrong FFT object... I don't understand this code too well. |
what is an fft object? do you mean the spectrogram matrix? if you think the code is hard to use and have suggestions for improving the API, let me know. in fact, i'm contemplating a general overhaul of the API after watching this (very entertaining) video: https://www.youtube.com/watch?v=5tg1ONG18H8 my code has grown organically over almost two decades now and was not at all consistently "designed" in any way (aside from loosely following some conventions that i made up), so it definitely has its quirks - such that time variables are sometimes in seconds, sometimes in milliseconds (i usually opted for whatever i would display to the user on the gui - which was good enough for personal use but is quirky for client code) |
```cpp
Matrix s = sp.complexSpectrogram(origAudio->audio->getReadPointer(ch), samples);
// workaround to create the deep copies
int numFrames = s.getNumRows();
int numBins = sp.getNumNonRedundantBins(); // == s.getNumColumns()
Matrix sl = s;
```

this is what messed me up, because I don't understand why we throw away the first matrix. |
oh - you mean, you want to work directly on the matrix edit: oh - by the way - the comment is outdated as well - there is no workaround anymore. the deep copy is now done by the assignment with the new matrix class - which is the solution i wanted |
I don't get it. Why would you need an unprocessed matrix for plotting? Wouldn't you want to plot the processed version to see the results? And yes, I don't need to plot results. Can I get rid of the 2nd matrix? Also, I want to move
Here's my code again:

```cpp
const int channels   = origAudio->getNumChannels();
const int sampleRate = origAudio->getSampleRate();
const int samples    = origAudio->getNumFrames();
const int bitDepth   = origAudio->getBitDepth();
const double maxf    = std::min<double>(20000, sampleRate * 0.5);
const int nHarmonics = int(maxf / f);
for (int ch = 0; ch < channels; ++ch)
{
    // compute the complex spectrogram:
    WindowType W = WindowType::hanningZN;
    Spectrogram sp;
    sp.setBlockAndTrafoSize(blockSize, trafoSize);
    sp.setHopSize(hopSize);
    sp.setAnalysisWindowType(W);
    sp.setSynthesisWindowType(W);
    sp.setOutputDemodulation(true);
    Matrix s = sp.complexSpectrogram(origAudio->audio->getReadPointer(ch), samples);

    // workaround to create the deep copies
    int numFrames = s.getNumRows();
    int numBins   = sp.getNumNonRedundantBins(); // == s.getNumColumns()
    Matrix sl = s;

    vector<int> binsToZero;
    for (double cf = f; cf < sampleRate * 0.5;)
    {
        int lo = sp.frequencyToBinIndex(cf - fRange * 0.5, sampleRate);
        int hi = sp.frequencyToBinIndex(cf + fRange * 0.5, sampleRate);
        for (int b = lo; b < hi; ++b)
            binsToZero.push_back(b);
        cf += f;
    }

    // zero out harmonic bandwidths to get only noise
    for (int i = 0; i < numFrames; i++)
        for (int b : binsToZero)
            sl(i, b) = 0;

    // subtract noise-only from original to get harmonics only
    vec x = sp.synthesize(sl);
    auto ptr = harmAudio->audio->getWritePointer(ch);
    for (int s = 0; s < samples; ++s)
        ptr[s] -= x[s];

    // transfer noise only
    ptr = noiseAudio->audio->getWritePointer(ch);
    for (int s = 0; s < samples; ++s)
        ptr[s] = x[s];
}
if (noiseOutPath.isNotEmpty())
    noiseAudio->renderFile(noiseOutPath);
if (harmOutPath.isNotEmpty())
    harmAudio->renderFile(harmOutPath);
return var::undefined;
```
|
because in my test, i plot all three spectrograms - original, lowpassed and highpassed. sure, i could have interleaved the processing and plotting code to get rid of at least one of the 3 matrices - but this is not production code, so i'd rather focus on readability than efficiency.
i don't see any obvious problems with that - numFrames/numBins seem to be used only once, so you may as well replace their usage directly with the function call. although i think looping up to numFrames is more readable than looping up to s.getNumRows() - matter of taste. i wonder why you bother with the binsToZero vector at all and don't just loop directly |
i just renamed |
good, push changes! |
code with your suggested changes:

```cpp
for (int ch = 0; ch < channels; ++ch)
{
    // compute the complex spectrogram:
    WindowType W = WindowType::hanningZN;
    Spectrogram sp;
    sp.setBlockAndTrafoSize(blockSize, trafoSize);
    sp.setHopSize(hopSize);
    sp.setAnalysisWindowType(W);
    sp.setSynthesisWindowType(W);
    sp.setOutputDemodulation(true);
    Matrix mat = sp.complexSpectrogram(origAudio->audio->getReadPointer(ch), samples);

    // zero out harmonic bandwidths to get only noise
    for (int i = 0; i < mat.getNumRows(); i++)
    {
        for (double cf = f; cf < sampleRate * 0.5;)
        {
            int lo = sp.frequencyToBinIndex(cf - fRange * 0.5, sampleRate);
            int hi = sp.frequencyToBinIndex(cf + fRange * 0.5, sampleRate);
            for (int b = lo; b < hi; ++b)
                mat(i, b) = 0;
            cf += f;
        }
    }

    // subtract noise-only from original to get harmonics only
    vec x = sp.synthesize(mat);
    auto ptr = harmAudio->audio->getWritePointer(ch);
    for (int s = 0; s < samples; ++s)
        ptr[s] -= x[s];

    // transfer noise only
    ptr = noiseAudio->audio->getWritePointer(ch);
    for (int s = 0; s < samples; ++s)
        ptr[s] = x[s];
}
```
|
i already pushed them. i would also drag all the setup work for sp out of the loop over the channels - except for the last line |
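A sketch of that suggestion, based on the snippet above - configure sp once and keep only the per-channel analysis (the "last line") inside the loop:

```cpp
// set up the spectrogram once, outside the channel loop:
WindowType W = WindowType::hanningZN;
Spectrogram sp;
sp.setBlockAndTrafoSize(blockSize, trafoSize);
sp.setHopSize(hopSize);
sp.setAnalysisWindowType(W);
sp.setSynthesisWindowType(W);
sp.setOutputDemodulation(true);

for (int ch = 0; ch < channels; ++ch)
{
    // only the analysis itself is per-channel:
    Matrix mat = sp.complexSpectrogram(origAudio->audio->getReadPointer(ch), samples);
    // ...per-channel processing as in the snippet above...
}
```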
oh - ok, yes - thanks. i fixed it. i didn't hit this on "build" - only now on "rebuild". apparently, some minimal-rebuild cache in VS was messed up. |
i open a new thread to continue the discussion which started here:
#280 (comment)
but doesn't really belong to the main topic of that thread. ...so, there's a class rsSpectrogram for converting an array of audio samples into a spectrogram, represented as a matrix of complex numbers, and/or (re)synthesizing an audio signal from such a spectrogram. in between analysis and resynthesis, one can apply arbitrary transformations to the spectrogram data. one of the simplest things to do is filtering, by zeroing out the bins above or below a certain cutoff point. the function spectrogramFilter in the TestsRosicAndRapt project demonstrates how this can be done. ...by writing this test function, i discovered a flaw in the underlying matrix class which unfortunately requires an inconvenient additional copying workaround (there are comments about this in the code). i think i will soon replace this matrix class - which i wanted to do for some time anyway. i now have new ideas about how a proper matrix class implementation should look (probably next week - i'm a bit sick at the moment)