[AUDIO_WORKLET] Optimise the copy back from wasm's heap to JS #22753
```diff
@@ -37,6 +37,9 @@ function createWasmAudioWorkletProcessor(audioParams) {
       // 'render quantum size', and this exercise of passing in the value
       // shouldn't be required (to be verified).
       this.samplesPerChannel = opts['sc'];
+      // Typed views of the output buffers on the worklet's stack, which after
+      // creation should only change if the audio chain changes.
+      this.outputViews = [];
     }

     static get parameterDescriptors() {
```
```diff
@@ -54,7 +57,7 @@ function createWasmAudioWorkletProcessor(audioParams) {
       bytesPerChannel = this.samplesPerChannel * 4,
       stackMemoryNeeded = (numInputs + numOutputs) * {{{ C_STRUCTS.AudioSampleFrame.__size__ }}},
       oldStackPtr = stackSave(),
-      inputsPtr, outputsPtr, outputDataPtr, paramsPtr,
+      inputsPtr, outputsPtr, outputDataPtr, paramsPtr, requiredViews = 0,
       didProduceAudio, paramArray;

     // Calculate how much stack space is needed.
```
```diff
@@ -93,6 +96,24 @@ function createWasmAudioWorkletProcessor(audioParams) {
         k += {{{ C_STRUCTS.AudioSampleFrame.__size__ / 4 }}};
         // Reserve space for the output data
         dataPtr += bytesPerChannel * i.length;
+        // How many output views are needed in total?
+        requiredViews += i.length;
       }

+      // Verify we have enough views (it doesn't matter if we have too many, any
+      // excess won't be accessed) then also verify the views' start address
+      // hasn't changed.
+      // TODO: allocate space for outputDataPtr before any inputs?
+      k = outputDataPtr;
+      if (this.outputViews.length < requiredViews || (this.outputViews.length && this.outputViews[0].byteOffset != k << 2)) {
+        this.outputViews = [];
+        for (/*which output*/ i of outputList) {
+          for (/*which channel*/ j of i) {
+            this.outputViews.push(
+              HEAPF32.subarray(k, k += this.samplesPerChannel)
+            );
+          }
+        }
+      }

       // Copy parameters descriptor structs and data to Wasm
```

Review comment on this hunk: Can the length of … change? Perhaps …

Reply: It can grow if other nodes are manually added. I'll need to write contrived examples to see this, since in the real world I don't think it will.
```diff
@@ -112,14 +133,12 @@
       // Call out to Wasm callback to perform audio processing
       if (didProduceAudio = this.callbackFunction(numInputs, inputsPtr, numOutputs, outputsPtr, numParams, paramsPtr, this.userData)) {
         // Read back the produced audio data to all outputs and their channels.
-        // (A garbage-free function TypedArray.copy(dstTypedArray, dstOffset,
-        // srcTypedArray, srcOffset, count) would sure be handy.. but web does
-        // not have one, so manually copy all bytes in)
+        // The 'outputViews' are subarray views into the heap, each with the
+        // correct offset and size to be copied directly into the output.
+        k = 0;
         for (i of outputList) {
           for (j of i) {
-            for (k = 0; k < this.samplesPerChannel; ++k) {
-              j[k] = HEAPF32[outputDataPtr++];
-            }
+            j.set(this.outputViews[k++]);
           }
         }
       }
```
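The effect of the copy-back change can be sketched standalone. The following is a minimal sketch, not the actual emscripten code: a plain Float32Array stands in for Emscripten's HEAPF32, and the sizes, offsets, and single stereo output are illustrative.

```javascript
const samplesPerChannel = 128;           // Web Audio render quantum
const HEAPF32 = new Float32Array(1024);  // stand-in for the wasm heap

// Output shape: one stereo output, i.e. [[channel0, channel1]]
const outputList = [[new Float32Array(samplesPerChannel),
                     new Float32Array(samplesPerChannel)]];

// Pretend the wasm callback wrote audio at this heap offset (in floats)
let outputDataPtr = 256;
for (let n = 0; n < 2 * samplesPerChannel; ++n) {
  HEAPF32[outputDataPtr + n] = n;
}

// Build the cached views once, as the PR does when the chain changes
const outputViews = [];
let k = outputDataPtr;
for (const output of outputList) {
  for (const channel of output) {
    outputViews.push(HEAPF32.subarray(k, k += samplesPerChannel));
  }
}

// Per-render copy back: one set() per channel instead of a sample loop
k = 0;
for (const output of outputList) {
  for (const channel of output) {
    channel.set(outputViews[k++]);
  }
}
```

Since `subarray()` returns a view over the same buffer rather than a copy, the cached views stay valid for as long as the heap isn't reallocated and the stack addresses don't move, which is exactly what the `byteOffset` check in the patch guards.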
It would be OK to turn this around to allocate outputs before inputs. This way it should be possible to only assert() in debug builds that the invariant this.outputViews[0].byteOffset == k << 2 holds. Also, the stack is already preallocated by the time the above constructor() is called, so things can be precomputed there in the ctor instead of at process time for any performance wins.

There are two things that I'd recommend:

I think performance.now() should work here, so adding calls to performance.now() at the very top of process() and at the very end of process() to measure the overall perf differential would probably work.

With the original code, it was reasonably "trivial" to statically prove that the code will handle any combination of classes or channels, though when state is being cached like this, there will then be the possibility that different schemes could trip it up.
I wonder if a good test case might be to have a scenario where there's simultaneously:
a) one stereo source clip -> simple passthrough stereo worklet -> stereo audiocontext speaker destination
b) one mono source clip -> simple passthrough mono worklet -> mono output
c) two copies of the mono source from b) above -> worklet merging (memcpying) these into stereo as Left/Right channels -> stereo audiocontext speaker destination
In such a test there would be a full combination of "1ch input, 1ch output", "2ch input, 2ch output", and "two 1ch inputs, 2ch output" cases, which I think would cover every possibility (the graph would have multiple processor classes and differently shaped nodes).
The worklets themselves would be trivial memcpy input -> output implementations, so not much complexity in the audio code itself, but they would then test that different shapes of the graph are preserved. Would something like that be a feasible test case to build?
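A rough model of those shapes can be run without Web Audio at all. This is an illustrative sketch only, with plain arrays standing in for nodes and made-up helper names (makeViews, copyBack); it exercises the flat-indexed view copy against each output layout to show the loop is shape-agnostic.

```javascript
// Illustrative model: run the flat-indexed view copy against the three
// output shapes from the proposed test (2ch, 1ch, and again 2ch built
// from two mono feeds) to check the loop handles any layout.
const N = 128;
const heap = new Float32Array(8 * N).map((_, n) => n); // fake wasm heap

// Build cached channel views starting at float offset `ptr`
function makeViews(outputList, ptr) {
  const views = [];
  let k = ptr;
  for (const output of outputList)
    for (const _channel of output)
      views.push(heap.subarray(k, k += N));
  return views;
}

// The PR's copy-back shape: one set() per channel, flat view index
function copyBack(outputList, views) {
  let k = 0;
  for (const output of outputList)
    for (const channel of output)
      channel.set(views[k++]);
}

const shapes = [
  [[new Float32Array(N), new Float32Array(N)]],  // a) stereo passthrough
  [[new Float32Array(N)]],                       // b) mono passthrough
  [[new Float32Array(N), new Float32Array(N)]],  // c) two monos merged to stereo
];
for (const outputList of shapes) {
  copyBack(outputList, makeViews(outputList, 2 * N));
}
```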
I was unexpectedly free this afternoon so already rearranged everything.
I'll look at creating some views in the ctor up front after I've done the timings.
But you're 100% right about timing this, which I'll aim to do once it's in roughly the right shape, and same for the test cases. At work our use is mono mic, stereo out from a wasm mixer, and then combined mic and mixer, but I want to see it tested with more ins and outs like you propose.
(I can dedicate quite a bit of time to this for the next weeks)
It's not part of the AudioWorkletGlobalScope. Date.now() is available, but I just get zeros, though I did see a 1, which brought excitement for a brief moment.
I added the same Date.now() to the original code... and it's a lot of zeros, then some occasional ones. I think there are more ones than with the views, but that's not very scientific. What I'll need to do is write something standalone that runs process() outside the AudioWorkletGlobalScope and then benchmark it over many runs on different browsers and hardware (which is worth doing).
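A standalone harness along those lines might look like the following sketch. The sizes, iteration count, and function names are assumptions, and HEAPF32 here is just a local Float32Array; outside the worklet scope (e.g. Node or a plain page) performance.now() is usable for the timing.

```javascript
// Micro-benchmark sketch: the original per-sample copy vs. the cached
// subarray set() copy, run outside any AudioWorkletGlobalScope so
// performance.now() can time it.
const N = 128, ITERS = 100000;
const HEAPF32 = new Float32Array(2 * N).map((_, n) => n); // fake heap data
const out = [new Float32Array(N), new Float32Array(N)];   // one stereo output
const views = [HEAPF32.subarray(0, N), HEAPF32.subarray(N, 2 * N)];

function perSampleCopy() {            // original: element-by-element
  let ptr = 0;
  for (const ch of out)
    for (let k = 0; k < N; ++k) ch[k] = HEAPF32[ptr++];
}

function viewSetCopy() {              // patched: one set() per channel
  let k = 0;
  for (const ch of out) ch.set(views[k++]);
}

for (const [name, fn] of [['per-sample', perSampleCopy], ['view set()', viewSetCopy]]) {
  const t0 = performance.now();
  for (let i = 0; i < ITERS; ++i) fn();
  console.log(`${name}: ${(performance.now() - t0).toFixed(1)} ms for ${ITERS} runs`);
}
```

Both loops must produce identical output data, so the harness doubles as a correctness check before comparing the timings across browsers and hardware.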