
Spectrogram processing #288

RobinSchmidt opened this issue Sep 6, 2019 · 68 comments

@RobinSchmidt (Owner) commented Sep 6, 2019

i'm opening a new thread to continue the discussion that started here:

#280 (comment)

but which doesn't really belong to the main topic of that thread. ...so there's a class rsSpectrogram for converting an array of audio samples into a spectrogram, represented as a matrix of complex numbers, and/or (re)synthesizing an audio signal from such a spectrogram. in between analysis and resynthesis, one can apply arbitrary transformations to the spectrogram data. one of the simplest things to do is filtering, by zeroing out the bins above or below a certain cutoff point. the function spectrogramFilter in the TestsRosicAndRapt project demonstrates how this can be done.

...by writing this test function, i discovered a flaw in the underlying matrix class which unfortunately requires an inconvenient additional copying workaround (there are comments about this in the code). i think i will soon replace this matrix class - which i've wanted to do for some time anyway. i now have other ideas about how a proper implementation of a matrix class should look (probably next week - i'm a bit sick at the moment)
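The brickwall filtering described above (zeroing bins above a cutoff) can be sketched in a few lines. This is a minimal stand-in that uses a vector-of-vectors for the spectrogram matrix; it is not the actual rsSpectrogram/spectrogramFilter code:

```cpp
#include <cassert>
#include <complex>
#include <vector>

// Brickwall lowpass on spectrogram data: zero every bin at or above the
// cutoff bin, in every frame. spec[frame][bin] stands in for the complex
// spectrogram matrix.
void zeroBinsAbove(std::vector<std::vector<std::complex<double>>>& spec,
                   int cutoffBin) {
    for (auto& frame : spec)
        for (size_t b = (size_t)cutoffBin; b < frame.size(); ++b)
            frame[b] = 0.0;  // zeroing a complex bin removes that frequency
}
```

A highpass works the same way, zeroing the bins below the cutoff instead.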

@elanhickler (Contributor) commented Sep 6, 2019

Thanks for doing this.

Can you explain how blocksize, hopsize, trafosize would affect the filter?

Edit: Looking at this I can't see where the waveform output is... oh you renamed vector to vec.

@RobinSchmidt (Owner) commented Sep 6, 2019

for filtering, the trafoSize should probably stay fixed at being equal to the blockSize. i think using zero-padding (i.e. trafoSize > blockSize) makes sense only for higher resolution analysis and visualization. it's actually a faux increase in frequency resolution - the spectrum is still washed out, you just get more spectral samples of it. hopSize = blockSize/2 is probably also best left as is. you can get a similar faux time-resolution increase for visualization by choosing smaller (power-of-two) fractions here. the only parameter that really matters is the blockSize - it dials in a trade-off between the time-resolution and frequency-resolution of the filter. longer block-sizes lead to longer ringing filters that give better frequency separation. so the parameter here is the blockSize - the other two are dependent as: trafoSize = blockSize, hopSize = blockSize/2. ...i mean, you can experiment with other settings (i didn't, so far), but i would not really expect much improvement or difference from doing so
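The trade-off can be made concrete: with trafoSize = blockSize, the bin spacing is sampleRate/blockSize and the time smear is blockSize/sampleRate, so doubling the block size halves the bin width and doubles the smear. A small sketch (function names are mine, not part of rsSpectrogram):

```cpp
#include <cassert>
#include <cmath>

// Bin spacing in Hz for a given block size (with trafoSize == blockSize):
// the frequency resolution of the filter.
double binWidthHz(int blockSize, double sampleRate) {
    return sampleRate / blockSize;
}

// Duration of one analysis block in seconds: the time over which spectral
// changes get smeared, i.e. the time resolution of the filter.
double blockDurationSec(int blockSize, double sampleRate) {
    return blockSize / sampleRate;
}
```

The product of the two is always 1, which is the time/frequency trade-off in a nutshell: you can only buy resolution on one axis by paying on the other.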

@elanhickler (Contributor)

You have lowpass and highpass here. Wouldn't it be just as good to only do one and then subtract the result from the original?

@RobinSchmidt (Owner) commented Sep 6, 2019

yes, that should work too. i'm currently just not sure if the subtraction will work correctly on the spectrogram level, due to the flaws and/or bugs in the matrix class (i mean, if you just subtract spectrogram matrices using the "-" operator of the matrix class). but subtracting time-domain signals should work

@elanhickler (Contributor) commented Sep 6, 2019

ok so something I kinda get but also totally don't get.

You have two classes, the spectrogram and the sinusoidal model stuff.

Let's say I want to create a simple denoiser: zero out bins that are not above an amplitude threshold. Let's say I want to improve that denoiser by detecting harmonics. Let's say after x seconds the harmonics of the signal start dropping below the noise floor. This is where noise reduction starts to become useless. What I really want is to make a guesstimate of how long the harmonics last and just resynthesize from scratch. Get rid of everything after x seconds and just resynthesize from there. That's where the sinusoidal model comes in? Is it confusing to combine these two classes into a function? How's this going to work, etc. etc. This is just brainstorming.

Edit: This is basically the function of the sample tail extender but as you said, it doesn't use sinusoidal models? It only uses spectrogram stuff and bin amplitude? So it's not as good?

@elanhickler (Contributor)

quick question: is it ok to use float types with rsSpectrogram? Because the audio is in floats not doubles.

@RobinSchmidt (Owner)

is it ok to use float types with rsSpectrogram?

i didn't try that, but i don't see why it should be a problem. you should just need an appropriate template instantiation

@elanhickler (Contributor)

What's the demodulation thing about? What is modulation in this case?

@elanhickler (Contributor) commented Sep 7, 2019

[screenshot: distortion at the end of the file]

I get some distortion at the end of the file using the lowpass/highpass stuff. I resized my audio samples so that it has blocksize number of extra 0s at the end. That seemed to fix it.

Edit: adding a single extra 0 sample seems to work as well. maybe the last number just needs to be 0.

@RobinSchmidt (Owner) commented Sep 7, 2019

What's the demodulation thing about? What is modulation in this case?

before analyzing a frame, an analysis window is applied and after (re)synthesizing a frame, a synthesis window is applied. depending on the choice of these windows (and the ratio of blocksize and hopsize), there may be an amplitude modulation in the resynthesized signal - but this modulation is predictable and can be compensated for. this is the demodulation step.

see here: https://ccrma.stanford.edu/~jos/parshl/Overlap_Add_Synthesis.html

in most implementations of spectrogram processing systems, the window-functions and the blocksize-to-hopsize ratio are tuned such that this modulation does not occur (this happens when the (overlapped) products of analysis and synthesis windows sum up to unity). but i wanted to have more freedom in my choice of window-functions, so i incorporated this demodulation step. it will do nothing (i.e. divide by one) in case of a choice where this overlap-to-unity condition is satisfied. in other cases, it divides by the sum of the overlapped window-products

@RobinSchmidt (Owner) commented Sep 7, 2019

I get some distortion at the end of the file

:-O what is this?! i've never seen that. what are your settings? maybe i should try it with your sample? or (easier for me to check) can you produce this artifact also with a simple artificial input signal (noise, dc, sine, whatever?)

maybe the last number just needs to be 0.

this would be really weird!

@RobinSchmidt (Owner) commented Sep 7, 2019

oh - i just noticed that the hopsize should be blockSize/4 and not blockSize/2 for the overlapped windows to sum up to a constant (with the Hann window). blockSize/2 would be appropriate only if the window were applied once - but it's applied twice (in analysis and synthesis) - but with halving the hopsize, we again get the sum-up-to-constant property. the hann window is really nice in this respect.

so, with blockSize/2, the demodulation actually does something - which could produce artifacts. ...although probably not of the kind that you are seeing at the end. that is probably something else.

i have checked in an update - my experiment now also plots the sum of the overlapped window products - in case you want to take a look and play around with it
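The overlap condition can be checked numerically: with a periodic Hann window applied twice (i.e. squared), hop = blockSize/4 makes the overlapped window products sum to a constant (3/2), while hop = blockSize/2 leaves a residual modulation, which is exactly what the demodulation step then divides out. A self-contained sketch (function names are mine, not the rsSpectrogram API):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Periodic ("ZN") Hann window: w[n] = 0.5 - 0.5*cos(2*pi*n/N).
std::vector<double> hann(int N) {
    const double pi = std::acos(-1.0);
    std::vector<double> w(N);
    for (int n = 0; n < N; ++n)
        w[n] = 0.5 - 0.5 * std::cos(2.0 * pi * n / N);
    return w;
}

// Sum of the overlapped analysis*synthesis window products (here the same
// Hann window applied twice, hence w[n]*w[n]). This is the amplitude
// envelope that the demodulation step divides out of the output.
std::vector<double> windowProductSum(int N, int hop, int numFrames) {
    std::vector<double> w = hann(N);
    std::vector<double> env((numFrames - 1) * hop + N, 0.0);
    for (int m = 0; m < numFrames; ++m)
        for (int n = 0; n < N; ++n)
            env[m * hop + n] += w[n] * w[n];  // overlap-add the products
    return env;
}
```

In the steady region (away from the edge fades) the hop = N/4 sum is the constant 3/2; with hop = N/2 it oscillates between 0.5 and 1, so demodulation has real work to do.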

@elanhickler (Contributor) commented Sep 7, 2019

wtf, when I combine two files using JUCE classes the result is that there are errors around -135 dB. If I combine using REAPER, it is -inf, so, perfect. WHYYYYYY

it's simple math!

[screenshot]

Edit: Hmm what if the audio file writer is adding dither? No that can't be, then it wouldn't combine well in REAPER.

I changed hopsize to blocksize / 4 and removed my "fix" for the distortion at the end. I'm not getting any distortion now, I'll wait for it to happen again. It's an intermittent problem. Sometimes happens, sometimes doesn't (if I don't have my fix implemented).

@RobinSchmidt (Owner) commented Sep 7, 2019

It's an intermittent problem. Sometimes happens, sometimes doesn't

could it be related to the length of the buffer/file? if it's a "nice" number (maybe that an integer number of blocks fits in or something), it doesn't happen and otherwise it does? or the other way around?

is the juce sample buffer single precision and the reaper double and we see roundoff error here or something? anyway, this is not related to my code, right?

@RobinSchmidt (Owner) commented Sep 7, 2019

btw - at the moment, i would actually recommend prepending and appending a blocksize worth of zeros before analysis and cutting them off after resynthesis (half or maybe even a quarter of that should actually suffice, but better be safe). because my block overlapping may produce fade-in/out effects at start and end (which you won't ever see because of the demodulation - they'll be compensated too - but it's probably better not having to compensate for anything)

...the spectrogram stuff is still very much under development

@RobinSchmidt (Owner) commented Sep 7, 2019

ah - by the way - in the function plotOverlappingWindowSum, i have made a plot of the overlapping windows and their sum, using hopSize=blockSize/4 with the Hann window (squared, because it's applied twice - if it were not squared, blockSize/2 would work). as you see, they sum up to unity:

[plot: overlapped Hann-squared windows and their sum]

except for the fades at the ends (because there, less windows are contributing to the sum). when you use the same overlap factor with a blackman window, you can clearly see the amplitude modulation:

[plot: overlapped Blackman windows and their sum]

...but i think, for blackman, you can get this sum-to-unity property (at least approximately) with another overlap factor, too - i have to look that up or try it... edit: ah - here:
https://ccrma.stanford.edu/~jos/sasp/COLA_Examples.html

@elanhickler (Contributor)

Yeah I'll have to experiment with double precision. I wonder if maybe using 64-bit wav files will force JUCE to use higher precision... or at least I should try 32 bit just to see what happens.

@RobinSchmidt (Owner) commented Sep 8, 2019

This is basically the function of the sample tail extender but as you said, it doesn't use sinusoidal models? It only uses spectrogram stuff and bin amplitude? So it's not as good?

in some sense, it creates a (restricted version of a) sinusoidal model from the spectrogram data. for tail extension, i actually think the general approach is quite appropriate - if it were improved to estimate the decay rates from the data. but with the current one-decay-rate-for-all-harmonics implementation, the tail sounds rather static and artificial.

...i don't really get, what exactly you get and also don't get

@elanhickler (Contributor) commented Sep 8, 2019

Yep, using a 32-bit wav file solved the slight error.

OK, so how do you create a smooth spectral filter rather than a brick wall? Do you simply reduce amplitude a little more for each frequency? Wouldn't you be able to hear the steppiness of the frequency rows? Especially if the filter frequency was changing over time?

@RobinSchmidt (Owner)

how do you create a smooth spectral filter rather than brick wall? Do you simply reduce amplitude a little more for each frequency?

yes - i would probably linearly fade the amplitudes down over a certain number of bins. and/or maybe with a smoother (sin/cos-shaped) function. we would just have to take care that the complementary low- and highpass add up to unity. ...unless you get your highpass by subtraction - then it would be perfect reconstruction, regardless. however, the highpass may not be an exact mirror image of the lowpass, if you use just any fade-function

Wouldn't you be able to hear the steppiness of the frequency rows? Especially if the filter frequency was changing over time?

i guess, the overlap would smooth these steps out. what exactly do you mean to happen without time-modulation?
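One way to realize the sin/cos-shaped fade with a complementary pair is a raised-cosine lowpass gain per bin, with the highpass defined as 1 minus the lowpass, so the two always sum to unity. A hypothetical sketch (names and parameters are mine):

```cpp
#include <cassert>
#include <cmath>

// Lowpass gain for bin k, with a raised-cosine fade over `width` bins
// centered on the (possibly fractional) cutoff bin kc. The matching
// highpass gain is 1 - lowpassBinGain(...), so lowpass + highpass == 1
// for every bin (perfect reconstruction on addition).
double lowpassBinGain(int k, double kc, double width) {
    const double pi = std::acos(-1.0);
    if (k <= kc - 0.5 * width) return 1.0;  // fully in the passband
    if (k >= kc + 0.5 * width) return 0.0;  // fully in the stopband
    return 0.5 + 0.5 * std::cos(pi * (k - kc + 0.5 * width) / width);
}
```

Because kc is a double, this also sidesteps the cutoff-bin rounding issue: a fractional cutoff just shifts the fade.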

@RobinSchmidt (Owner) commented Sep 8, 2019

ahh - i guess, i see what you mean: the rounding of the cutoff-bin to integer values? yeah...i could probably allow float numbers for that by scaling the last bin by a number between 0 and 1. i must think about that

@elanhickler (Contributor) commented Sep 8, 2019

btw, I don't see me needing a sloped filter. But I guess interpolating values is something that will be needed a lot, especially for a denoiser.

@elanhickler (Contributor) commented Sep 8, 2019

One thing a lot of my clients ask for is a matching filter, to for example make a loud violin sample sound like a softer violin sample, usually to correct mistakes in the recording process, or even to generate new samples that are in between recorded dynamics. You do this by analyzing the sample you want to manipulate and get some kind of difference based on the sample you want it to sound like.

Do you know how matching stuff works?

Edit: This is a spectral process.

@RobinSchmidt (Owner) commented Sep 8, 2019

Do you know how matching stuff works?

hmm - do you have some example product to show me exactly what you mean? if it's about applying the spectral envelope of one signal to another signal, then i have done such things in the context of my master thesis (it was partially about spectral envelopes - and i implemented a vocoder based on this algorithm https://hal.archives-ouvertes.fr/hal-01161334/document)

@elanhickler (Contributor)

Oh, well do YOU have some examples?

@RobinSchmidt (Owner) commented Sep 8, 2019

ok - yeah - this is fun. this is me reading and then vocoding the title of my thesis (it's in german - "representation and modification of spectral envelopes of natural sounds based on formant models"):

www.rs-met.com/sounds/effects/ThesisVocoderCarrier.wav
www.rs-met.com/sounds/effects/ThesisVocoderModulator.wav
www.rs-met.com/sounds/effects/ThesisVocoderOutput.wav

the speech of the output is really intelligible .....for germans

@elanhickler (Contributor)

So you can use this to convolve a soft guitar pluck with a loud guitar pluck and create something in between? Obviously it won't sound as good as the real thing, but just good enough to be usable in music composition.

@RobinSchmidt (Owner) commented Sep 8, 2019

well, it's a vocoder - so the process is asymmetrical; it's not like a morph (which i would expect to act somewhat symmetrically - and adjustably) - carrier and modulator play different roles - but yeah, you get something "in between".

that said - morphing stuff could certainly be done as well. in fact, we'll have a lot of interesting stuff to explore with this spectrogram processor. the basic system is in place and (more or less) working - now the fun can begin

@elanhickler (Contributor) commented Sep 8, 2019

This is just an image comparing your voice envelope to the output envelope, not sure what purpose this image has. They look like they match up pretty well.
[image: envelope comparison]

http://www.elanhickler.com/transfer/ThesisVocoderOutput_VocodexPlugin.wav

This is a Vocodex example, but there are some problems with it. It's great for a musical sound, but your version sounds like a better starting point to improve on legibility and musicality.

@RobinSchmidt (Owner) commented Sep 8, 2019

it has a totally different character (thinner and a bit gnarly - i like the gnarl! a bit goa'esque). do you know how vocodex works? is it also spectrogram/stft based - or does it use the classical filter bank approach? edit: from the product description (https://www.image-line.com/plugins/Effects/Vocodex/):

Up to 100 bands individually locatable anywhere in the spectrum.

...so that probably means filterbank

@elanhickler (Contributor) commented Sep 11, 2019

http://www.elanhickler.com/transfer/guitar_dynamic_morph_3_to_5.mp3

3 dynamics to 5 dynamics. The two in-between samples are the vocoder output plus I mixed in some of the original audio by hand and adjusted overall amplitude until it sounded right.

@elanhickler (Contributor) commented Sep 17, 2019

for some reason I am getting an infinite hang after deletion of the 2nd matrix in the code - I think it's the 2nd one, if things are deleted in the order they appeared. The debugger isn't being helpful! Nothing seems out of the ordinary.

[screenshot: debugger]

I'm trying to zero out harmonics (small ranges of frequencies)

for (int ch = 0; ch < channels; ++ch)
{
    // compute the complex spectrogram:
    WindowType W = WindowType::hanningZN;
    Spectrogram sp;
    sp.setBlockAndTrafoSize(blockSize, trafoSize);
    sp.setHopSize(hopSize);
    sp.setAnalysisWindowType(W);
    sp.setSynthesisWindowType(W);
    sp.setOutputDemodulation(true);
    Matrix s = sp.complexSpectrogram(origAudio->audio->getReadPointer(ch), samples);

    // workaround to create the deep copies
    int numFrames = s.getNumRows();
    int numBins = sp.getNumNonRedundantBins(); // == s.getNumColumns()        

    Matrix sl(numFrames, numBins);
    sl.copyDataFrom(s);

    vector<int> binsToZero;
    for (double cf = f; cf < sampleRate * 0.5;)
    {
        int lo = sp.frequencyToBinIndex(cf - fRange * 0.5, sampleRate);
        int hi = sp.frequencyToBinIndex(cf + fRange * 0.5, sampleRate);

        for (int b = lo; b < hi; ++b)
            binsToZero.push_back(b);

        cf += f;
    }

    // zero out harmonic bandwidths to get only noise
    for (int i = 0; i < numFrames; i++)
        for (int b : binsToZero)
            s(i, b) = 0;

    // subtract noise only from original to get harmonics only
    vec x = sp.synthesize(sl);
    auto ptr = harmAudio->audio->getWritePointer(ch);
    for (int s = 0; s < samples; ++s)
        ptr[s] -= x[s];

    // transfer noise only
    ptr = noisAudio->audio->getWritePointer(ch);
    for (int s = 0; s < samples; ++s)
        ptr[s] = x[s];
} // deletion occurs here!

@elanhickler (Contributor) commented Sep 17, 2019

I just commented out the deletion function haha. Look at my noise/harmonic spectrograms:
[animated GIF: noise/harmonic spectrograms]

I'm using a frequency bandwidth of 15 Hz. That is INSANELY PRECISE! Your bidirectional filters could never do this.

Ok, I need to do another test though. It might be better to capture a large portion of frequencies per harmonic rather than the smallest possible portion. Again your bidirectional filters would fail with this task because it's not as flat (so there would be filter overlap, causing issues)

Also, this function is insanely fast. 1 or 2 seconds to process a 10 second stereo clip.

@elanhickler (Contributor) commented Sep 17, 2019

Here are the noise only waveforms:

tiny portion per harmonic:
[iZotope RX screenshot]

large portion per harmonic:
[iZotope RX screenshot]

So, the large portion is bad because you can see that the noise has a regularity to it, you can spot some oscillations. That's bad because it's going to interfere with the resynthesized harmonics when combining together. Looks like it's best to capture as tiny a range of frequencies per harmonic as possible.

Edit: Hmmm, but capturing not just a large portion but EVERYTHING (so there is no noise left), and then resynthesizing it might actually work better because you then don't need to combine noise/harmonics at the end.

Edit: I think this is what I originally wanted to do from the beginning but never could due to not having flat enough filters.

@RobinSchmidt (Owner)

with "portion-per-harmonic" you mean the bandwidth of the bandpass filters to isolate the harmonics?

the large portion is bad because you can see that the noise has a regularity to it, you can spot some oscillations

that makes sense. if the harmonic bandwidth gets wider, the "in-between" bandwidths of the noise bands get narrower. and narrow-bands have high sinusoidality/regularity

@RobinSchmidt (Owner) commented Sep 17, 2019

I think this is what I originally wanted to do from the beginning but never could due to not having flat enough filters.

hmmmm...actually, butterworth filters are quite nicely flat. ...but maybe not steep enough? ...in this case, one could go for elliptics at the cost of ripples (in passband and stopband). the FFT based filters have ripples, too.

That is INSANELY PRECISE! Your bidirectional filters could never do this.

so - this is good, right? i still cannot see, what fixed FFT filters can do that regular time-domain filters can't. maybe i should take the challenge of separating some signal with filters vs FFT. like a sort of battle experiment...haha

@elanhickler (Contributor)

bidirectional filters can't change frequency right? But you could kinda move the filter around in a spectral process by changing bin volumes. Moving filters are needed.

@RobinSchmidt (Owner)

yes - it's difficult to make time-varying bidirectional IIR filters while preserving their desirable zero-phase properties. with FIR filters it would be easier but computationally expensive, such that you'd probably end up doing some FFT based process here also (FFT convolution). but at the moment, we are talking about fixed filters, right? i mean the stuff you did above where you said that bidirectional filters could not do it. for time varying stuff - especially tracking harmonic frequencies - i'd also opt for spectrogram based algorithms.

@elanhickler (Contributor) commented Sep 17, 2019

I'm having some issues with phaselocking, basically the same issues I had last time with bidirectional filters except I think things are sounding better and easier to use. I don't understand how SampleModelling gets perfectly phaselocked samples.

http://www.elanhickler.com/transfer/horn_samples_phaselocked.rar

Included is the original:
[screenshot: original]

phaselocked example attempts to capture the phase per harmonic with

auto p = RAPT::rsSinePhaseAt<float>(x.data(), x.size(), x.size() * 0.5);
RAPT::rsRecreateSine<float>(x.data(), xNew.data(), x.size(), currentf, currentf, sampleRate, p, 0);

[screenshot: phaselocked]

phaselocked_singlePhase example sounds better because I don't take phase measurements but set [p]hase to "0" for all harmonics, whatever that means, but then it loses a lot of stereo information.
[screenshot: phaselocked_singlePhase]

I think RAPT::rsSinePhaseAt is not working well enough... or maybe I need to make a measurement exactly at a spectral peak, one of those spikes.

@elanhickler (Contributor)

ALRIGHT YES! Taking the phase measurement precisely at the most energetic spot spectrally has improved the result:
[screenshot: result with the amplitude modulation circled in red]

Now there's a weird issue with some amplitude modulation circled in red. No idea what that could be from.

@elanhickler (Contributor)

Can you tell me how to, instead of zeroing out bins, manipulate the amplitude? First, retrieve the amplitude, and then change it to something else.

@RobinSchmidt (Owner) commented Sep 19, 2019

each matrix entry is a complex number, so to get the amplitude, you can just call std::abs on it - the implementation for std::complex will extract the complex magnitude. you can also multiply the complex values by real numbers to change their magnitude. you may also want to look into std::arg and std::polar, if you want to deal with phase separately
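A minimal illustration of the std::abs/std::arg/std::polar approach (the helper name is mine):

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Replace the magnitude of a spectrogram bin while keeping its phase:
// decompose with std::abs / std::arg, reassemble with std::polar.
std::complex<double> withMagnitude(std::complex<double> bin, double newMag) {
    return std::polar(newMag, std::arg(bin));
}
```

Simply multiplying a bin by a real number, as mentioned, also scales the magnitude without touching the phase; the decompose/reassemble route is only needed when the new magnitude is not a simple scaling of the old one.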

@elanhickler (Contributor) commented Sep 21, 2019

What's an easy way to exponentially reduce the volume of a bin to 0? What equation should I use?

Well I should probably work in terms of time rather than bin id. I'd need to convert a time range to a bin index range.

Also, should I use a sin function to feather the frequency vs bin volume? So you choose a center frequency, then above and below the volume is reduced a bit until 0 as well, so the filter isn't so flat. I might just need like 1 or 2 bins of feathering above and below the target frequency.

edit: I guess time would be numBins / seconds and I can use linear for now.
edit: linear fade sounds good.

@RobinSchmidt (Owner)

I'd need to convert a time range to a bin index range.

i don't understand that. time and frequency are independent dimensions

@elanhickler (Contributor) commented Sep 23, 2019

I got it working. Maybe I should have said numFrames / seconds. Basic linear fade.

@RobinSchmidt (Owner)

yes - numFrames/seconds makes much more sense
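The conversion is simple: frames advance by hopSize samples, so t seconds map to t*sampleRate/hopSize frames. A sketch (the function name is mine):

```cpp
#include <cassert>
#include <cmath>

// Convert a time in seconds to a spectrogram frame index. Consecutive
// frames are hopSize samples apart, so t seconds correspond to
// t * sampleRate / hopSize frames (rounded to the nearest frame).
int timeToFrameIndex(double seconds, double sampleRate, int hopSize) {
    return (int) std::round(seconds * sampleRate / hopSize);
}
```

A time range then maps to a frame index range by converting both endpoints, e.g. for a linear fade across that range.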

@elanhickler (Contributor) commented Sep 23, 2019

I'll explain what I'm up to. I'm fading out/removing all audio content except multiples of a frequency. That isolates harmonics. It drastically reduces noise and clicks. The problem is that you hear a buzzing sound because there's still noise in the upper frequencies where the instrument's harmonics decay quickly. To solve this I put a sweeping-down lowpass filter so the spectrum decays in a way that sounds natural. It's just guesswork at that point, by ear. But here's the kicker: you could not use a filter on a normal signal because it would actually sound like a sweeping lowpass filter because of broadband noise. It would sound like "shhhhoooowww". But in this case, since everything but harmonics is isolated, using a filter is viable and sounds correct.
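The sweeping-down lowpass described above can be sketched as a per-frame cutoff bin that interpolates linearly across the spectrogram's frames (a hypothetical helper, not code from this thread):

```cpp
#include <cassert>

// Cutoff bin for a lowpass that sweeps linearly from bin kStart (at
// frame 0) down to bin kEnd (at the last frame). Bins above the frame's
// cutoff would then be zeroed or faded in the spectrogram.
double sweepCutoffBin(int frame, int numFrames, double kStart, double kEnd) {
    double t = (numFrames <= 1) ? 0.0 : (double) frame / (numFrames - 1);
    return kStart + t * (kEnd - kStart);  // linear interpolation over frames
}
```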

@RobinSchmidt (Owner) commented Nov 1, 2019

    // workaround to create the deep copies
    Matrix sl(numFrames, numBins);
    sl.copyDataFrom(s);

meanwhile, i have implemented my new matrix class and i have updated my spectrogram stuff to use it. now you can simply write:

    Matrix sl = s;

the new class is temporarily named rsMatrixNew, so you may have to update the typedef that is not shown in your code. eventually, i want to retire the old implementation, and then the new class will just be named rsMatrix again. but during the transition (while the old class is still in use), i need this temporary name for distinction

edit: just look at spectrogramFilter() in PhaseVocoderExperiments.cpp how to update the code (your code seemed to be partially copied from there). it's much cleaner now.

@elanhickler (Contributor) commented Nov 24, 2019

I don't know wtf is going on, but this is really bad. I can't extract harmonics at this crucial time. See code: #288 (comment)

edit: figured it out, I was resynthesizing the wrong fft object.......................... I don't understand this code too well.

@RobinSchmidt (Owner) commented Nov 24, 2019

I was resynthesizing the wrong fft object.......................... I don't understand this code too well

what is an fft object? do you mean the spectrogram matrix? if you think the code is hard to use and have suggestions for improving the API, let me know. in fact, i'm contemplating a general overhaul of the API - after watching this (very entertaining) video:

https://www.youtube.com/watch?v=5tg1ONG18H8

my code grew organically over almost two decades now and was not at all consistently "designed" in any way (aside from loosely following some conventions that i made up), so it definitely has its quirks - such that time variables are sometimes in seconds, sometimes in milliseconds (i usually opted for whatever i would display to the user on the gui - which was good enough for personal use but is quirky for client code)

@elanhickler (Contributor)

Matrix s = sp.complexSpectrogram(origAudio->audio->getReadPointer(ch), samples);

// workaround to create the deep copies
int numFrames = s.getNumRows();
int numBins = sp.getNumNonRedundantBins(); // == s.getNumColumns()        

Matrix sl = s;

this is what messed me up, because I don't understand why we throw away the first matrix.

@RobinSchmidt (Owner) commented Nov 25, 2019

oh - you mean, you want to work directly on the matrix s instead of on a copy sl as in "in-place" processing? sure, if you don't need the unprocessed matrix anymore for anything else, you can totally do that. the code from which you copy-and-pasted this, still needs the original matrix later for plotting - but you may not

edit: oh - by the way - the comment is outdated as well - there is no workaround anymore. the deep copy is now done by the assignment with the new matrix class - which is the solution i wanted

@elanhickler (Contributor) commented Nov 25, 2019

I don't get it. Why would you need an unprocessed matrix for plotting? Wouldn't you want to plot the processed version to see the results? And yes, I don't need to plot results. Can I get rid of the 2nd matrix? Also, I want to move numFrames/numBins and vector<int> binsToZero outside of the loop because that only needs to be calculated once.

Here's my code again:

const int channels = origAudio->getNumChannels();
const int sampleRate = origAudio->getSampleRate();
const int samples = origAudio->getNumFrames();
const int bitDepth = origAudio->getBitDepth();
const double maxf = std::min<double>(20000, sampleRate * 0.5);
const int nHarmonics = int(maxf / f);

for (int ch = 0; ch < channels; ++ch)
{
    // compute the complex spectrogram:
    WindowType W = WindowType::hanningZN;
    Spectrogram sp;
    sp.setBlockAndTrafoSize(blockSize, trafoSize);
    sp.setHopSize(hopSize);
    sp.setAnalysisWindowType(W);
    sp.setSynthesisWindowType(W);
    sp.setOutputDemodulation(true);
    Matrix s = sp.complexSpectrogram(origAudio->audio->getReadPointer(ch), samples);

    // workaround to create the deep copies
    int numFrames = s.getNumRows();
    int numBins = sp.getNumNonRedundantBins(); // == s.getNumColumns()        

    Matrix sl = s;

    vector<int> binsToZero;
    for (double cf = f; cf < sampleRate * 0.5;)
    {
        int lo = sp.frequencyToBinIndex(cf - fRange * 0.5, sampleRate);
        int hi = sp.frequencyToBinIndex(cf + fRange * 0.5, sampleRate);

        for (int b = lo; b < hi; ++b)
            binsToZero.push_back(b);

        cf += f;
    }

    // zero out harmonic bandwidths to get only noise
    for (int i = 0; i < numFrames; i++)
        for (int b : binsToZero)
            sl(i, b) = 0;

    // subtract noise only from original to get harmonics only
    vec x = sp.synthesize(sl);
    auto ptr = harmAudio->audio->getWritePointer(ch);
    for (int s = 0; s < samples; ++s)
        ptr[s] -= x[s];

    // transfer noise only
    ptr = noiseAudio->audio->getWritePointer(ch);
    for (int s = 0; s < samples; ++s)
        ptr[s] = x[s];
}

if (noiseOutPath.isNotEmpty())
    noiseAudio->renderFile(noiseOutPath);
if (harmOutPath.isNotEmpty())
    harmAudio->renderFile(harmOutPath);

return var::undefined;

@RobinSchmidt (Owner) commented Nov 26, 2019

Why would you need an unprocessed matrix for plotting? Wouldn't you want to plot the processed version to see the results? And yes, I don't need to plot results. Can I get rid of the 2nd matrix?

because in my test, i plot all three spectrograms - original, lowpassed and highpassed. sure, i could have interleaved the processing and plotting code to get rid of at least one of the 3 matrices - but this is not production code, so i'd rather focus on readability than efficiency.

Also, I want to move numFrames/numBins and vector binsToZero outside of the loop because that only needs to be calculated once.

i don't see any obvious problems with that - numFrames/numBins seem to be used only once, so you may as well replace their usage directly with the function call. although i think looping up to numFrames is more readable than s.getNumRows(). matter of taste. i wonder why you bother with the binsToZero vector at all and don't just loop directly: for(b = lo; b <= hi; b++)

@RobinSchmidt (Owner) commented Nov 26, 2019

i just renamed rsSpectrogram to rsSpectrogramProcessor - and the member function complexSpectrogram to getComplexSpectrogram. i'm planning to introduce a new class rsComplexSpectrogram, which should be a subclass of rsMatrix<std::complex<T>> and will have member functions getNumFrames, getNumBins that simply delegate to the getNumRows, getNumColumns functions of the matrix baseclass (and the baseclass functions will become unavailable via inheriting protected or private). then you would have something like rsSpectrogramProcessor::getComplexSpectrogram that returns an object of class rsComplexSpectrogram, rather than rsSpectrogram::complexSpectrogram returning rsMatrix. that makes much more sense, API-wise
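A toy sketch of that design - private inheritance with delegating accessors - might look like this (these are minimal stand-ins written for illustration, not the actual RAPT classes):

```cpp
#include <cassert>
#include <complex>
#include <vector>

// Minimal stand-in for a generic matrix class.
template<class T>
class rsMatrix {
public:
    rsMatrix(int rows, int cols)
        : numRows(rows), numCols(cols), data((size_t)rows * cols) {}
    int getNumRows()    const { return numRows; }
    int getNumColumns() const { return numCols; }
    T& operator()(int i, int j) { return data[(size_t)i * numCols + j]; }
private:
    int numRows, numCols;
    std::vector<T> data;
};

// Spectrogram as a matrix of complex numbers: private inheritance hides
// the generic getNumRows/getNumColumns and re-exposes the dimensions
// under domain-specific names.
template<class T>
class rsComplexSpectrogram : private rsMatrix<std::complex<T>> {
public:
    using Base = rsMatrix<std::complex<T>>;
    rsComplexSpectrogram(int numFrames, int numBins)
        : Base(numFrames, numBins) {}
    int getNumFrames() const { return Base::getNumRows(); }    // delegate
    int getNumBins()   const { return Base::getNumColumns(); } // delegate
    using Base::operator();  // element access stays available
};
```

Client code then reads in spectrogram vocabulary (frames and bins) instead of generic matrix vocabulary (rows and columns).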

@elanhickler (Contributor)

good, push changes!

@elanhickler (Contributor) commented Nov 26, 2019

code with your suggested changes:

for (int ch = 0; ch < channels; ++ch)
{
    // compute the complex spectrogram:
    WindowType W = WindowType::hanningZN;
    Spectrogram sp;
    sp.setBlockAndTrafoSize(blockSize, trafoSize);
    sp.setHopSize(hopSize);
    sp.setAnalysisWindowType(W);
    sp.setSynthesisWindowType(W);
    sp.setOutputDemodulation(true);
    Matrix mat = sp.complexSpectrogram(origAudio->audio->getReadPointer(ch), samples);

    // zero out harmonic bandwidths to get only noise
    for (int i = 0; i < mat.getNumRows(); i++)
    {
        for (double cf = f; cf < sampleRate * 0.5;)
        {
            int lo = sp.frequencyToBinIndex(cf - fRange * 0.5, sampleRate);
            int hi = sp.frequencyToBinIndex(cf + fRange * 0.5, sampleRate);

            for (int b = lo; b < hi; ++b)
                mat(i, b) = 0;

            cf += f;
        }                
    }

    // subtract noise only from original to get harmonics only
    vec x = sp.synthesize(mat);
    auto ptr = harmAudio->audio->getWritePointer(ch);
    for (int s = 0; s < samples; ++s)
        ptr[s] -= x[s];

    // transfer noise only
    ptr = noiseAudio->audio->getWritePointer(ch);
    for (int s = 0; s < samples; ++s)
        ptr[s] = x[s];
}

@RobinSchmidt (Owner) commented Nov 26, 2019

i already pushed them. i would also move all the setup-work for sp out of the loop over the channels - except for the last line mat = ... (which is not a setup call anyway) - because nothing really changes - you just do all the exact same setup operations numChannels times when doing it in the loop

@elanhickler (Contributor) commented Nov 27, 2019

build error, i fixed it like this... not sure if it's correct.

[screenshot: build error fix]

@RobinSchmidt (Owner) commented Nov 27, 2019

oh - ok, yes - thanks. i fixed it. i didn't hit this on "build" - only now on "rebuild". apparently, some minimum-rebuild-cache in VS was messed up.
