[AUDIO_WORKLET] Optimise the copy back from wasm's heap to JS #22753

cwoffenden · 2024-10-16T14:39:28Z

This builds on #22741 just because that's where I was at, but it's not required. The interesting changes are in audio_worklet.js and I'd appreciate some feedback from @juj before tidying this up (with sanity checks and a fallback).

Since we pass in the stack for the worklet from the caller's heap, its address shouldn't change. And since the (I'll make myself say it) render quantum size doesn't change after the audio worklet creation, the stack positions for the audio buffers should not change either. So, we can create one-time subarray views and replace the float-by-float copy with a simple set() per channel (per output).
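A minimal sketch of the idea described above, using a plain Float32Array as a stand-in for the wasm heap. The names `heapF32`, `createOutputViews`, and `copyOutputs`, the 128-sample quantum, and the start index are assumptions for illustration, not the PR's actual code.

```javascript
// Stand-in for the wasm heap (an assumption; the real code reads the wasm heap).
const heapF32 = new Float32Array(1024);
const QUANTUM = 128; // assumed render quantum size (samples per channel)

// Create one subarray view per output channel, laid out back to back
// starting at float index startIdx. Done once, not on every process() call.
function createOutputViews(outputList, startIdx) {
  const views = [];
  let k = startIdx;
  for (const output of outputList) {
    for (let ch = 0; ch < output.length; ch++) {
      views.push(heapF32.subarray(k, k + QUANTUM));
      k += QUANTUM;
    }
  }
  return views;
}

// Copy each cached view into its matching output channel with one set() call.
function copyOutputs(outputList, views) {
  let v = 0;
  for (const output of outputList) {
    for (const channel of output) {
      channel.set(views[v++]);
    }
  }
}

// One stereo output, shaped like the outputList that process() receives.
const outputList = [[new Float32Array(QUANTUM), new Float32Array(QUANTUM)]];
const views = createOutputViews(outputList, 256);

// Pretend the wasm side wrote samples into the heap.
heapF32.fill(0.25, 256, 256 + QUANTUM);
heapF32.fill(-0.5, 256 + QUANTUM, 256 + 2 * QUANTUM);

copyOutputs(outputList, views);
console.log(outputList[0][0][0], outputList[0][1][127]); // prints "0.25 -0.5"
```

The views share the heap's underlying buffer, so no allocation happens in the copy step; this only holds while the buffer and the stack addresses stay the same.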

I've thrown simple tests at it and it works, fulfilling the garbage-free requirement and theoretically giving a nice performance boost (not measured, but looping over thousands of JS Number types and shuffling them to and from floats must come at a cost). If the outputList does change, it should only do so after changes to the audio chain, which would be expensive enough that recreating the subarrays wouldn't make a difference.

To be extra sure, we can move the output buffers to the first entries on the stack, then simple additional changes like input buffers won't change the address.

It wants sanity checks here and there but I'd like feedback for anything I'm missing or misunderstanding. Thanks!

juj · 2024-10-17T18:05:18Z

Nice idea! It looks like this PR carries the rename from the other PR. Rebase/merge should hide it?

sbc100 · 2024-10-17T18:28:44Z

src/audio_worklet.js

+        for (/*which output*/ i of outputList) {
+          for (/*which channel*/ j of i) {
+            this.outputViews.push({
+              // dataPtr is the sanity check (to be implemented)


If it's a sanity check then perhaps put it behind if ASSERTIONS?

I’ve some rearranging to do, to make sure the addresses are fixed, and if those change from the base I’ll add assertions.

I’ll make this a draft and work on it next week, adding some robustness.

If it's a sanity check then perhaps put it behind if ASSERTIONS?

Interestingly (unless I'm doing something hideous), assert() isn't available in the AudioWorkletGlobalScope, and I can't add it as a dependency the same way as stackAlloc(), etc. I'm very rusty with this stuff, mind.

sbc100 · 2024-10-17T18:30:02Z

src/audio_worklet.js

+      if (!this.outputViews) {
+        this.outputViews = [];
+        k = outputDataPtr;
+        for (/*which output*/ i of outputList) {


Can the length of outputList be different in each call?

Perhaps assert(this.outputViews.length == outputList.length) after this block to be clear?

It can grow if other nodes are manually added. I’ll need to write contrived examples to see this, since in the real world I don’t think it will.
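One way to handle a growing output list is to rebuild the cached views only when more channels are needed or the start address moves, and reuse them otherwise. The sketch below assumes a plain Float32Array as the heap; `OutputViewCache` and its fields are hypothetical names, not part of the PR.

```javascript
// Stand-in for the wasm heap (an assumption for this sketch).
const heapF32 = new Float32Array(2048);
const QUANTUM = 128; // assumed render quantum size

class OutputViewCache {
  constructor() {
    this.views = [];
    this.startIdx = -1;
  }

  // Returns the cached views, rebuilding them only when there are too few
  // or when the start index has moved. Excess views are simply not accessed.
  get(channelCount, startIdx) {
    if (this.views.length < channelCount || this.startIdx !== startIdx) {
      this.views = [];
      for (let ch = 0; ch < channelCount; ch++) {
        const begin = startIdx + ch * QUANTUM;
        this.views.push(heapF32.subarray(begin, begin + QUANTUM));
      }
      this.startIdx = startIdx;
    }
    return this.views;
  }
}

const cache = new OutputViewCache();
const first = cache.get(2, 0);  // builds two views
const again = cache.get(2, 0);  // reuses them, no allocation
const fewer = cache.get(1, 0);  // still reuses; the extra view is ignored
const grown = cache.get(4, 0);  // channel count grew, so views are rebuilt
console.log(first === again, first === fewer, grown.length); // prints "true true 4"
```

Rebuilding allocates, but as noted above this should only happen when the audio chain itself changes.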

cwoffenden · 2024-10-17T19:19:33Z

Nice idea!

A shower thought, as all good ideas are!

It looks like this PR carries the rename from the other PR. Rebase/merge should hide it?

Yes, this one was rebased off the earlier one so should have the same commit IDs.

We can remove the float-by-float JS copy and replace it with simple TypedArray set() calls.

Typed views are recreated if needed but otherwise are reused.

juj · 2024-10-18T14:12:54Z

src/audio_worklet.js

+      // Verify we have enough views (it doesn't matter if we have too many, any
+      // excess won't be accessed) then also verify the views' start address
+      // hasn't changed.
+      // TODO: allocate space for outputDataPtr before any inputs?


It would be ok to turn this around to allocate outputs before inputs. This way it should be possible to only assert() in debug builds that the invariant this.outputViews[0].byteOffset == k << 2 holds.
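The invariant mentioned here can be illustrated with a small check, assuming `k` is the float index of the first output buffer (so the byte offset is `k << 2`). The helper name `viewsStillValid` and the stand-in heap are hypothetical; it is written as a plain function because assert() may not be available in the worklet scope.

```javascript
// Stand-in for the wasm heap (an assumption for this sketch).
const heapF32 = new Float32Array(1024);
const QUANTUM = 128; // assumed render quantum size

// Returns true when the first cached view still starts at float index k.
// byteOffset is measured in bytes, so the float index is scaled by 4 (k << 2).
function viewsStillValid(outputViews, k) {
  return outputViews.length > 0 && outputViews[0].byteOffset === (k << 2);
}

const k = 256;
const outputViews = [heapF32.subarray(k, k + QUANTUM)];

console.log(viewsStillValid(outputViews, k));      // prints "true"
console.log(viewsStillValid(outputViews, k + 64)); // prints "false"
```

In a debug-only path a failed check could log or throw; in release builds the check would be skipped entirely.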

Also, the stack is already preallocated by the time the above constructor() is called, so things can be precomputed there in the ctor instead of at process time for any performance wins.

There are two things that I'd recommend:

benchmark the effect of this optimization. Since the code does become more sophisticated/more dependent on things having been pre-set up, it would be good to confirm that there is actually a performance benefit to calling TypedArray.set() instead of manually looping the samples.

I think performance.now() should work here, so adding calls to performance.now() at very top of process() and at the very end of process() to measure the overall perf differential would probably work.

It would be important to test two scenarios: a) there being multiple distinct audio processor classes, and b) there being different node paths in the same graph that have different numbers of input/output channels.

With the original code, it was reasonably "trivial" to statically prove that the code will handle any combination of classes or channels, though when state is being cached like this, there will then be the possibility that different schemes could trip up.

I wonder if a good test case might be to have a scenario where there's simultaneously
a) one stereo source clip -> simple passthrough stereo worklet -> stereo audiocontext speaker destination
b) one mono source clip -> simple passthrough mono worklet -> mono output
c) two copies of mono sources from b) above -> worklet merging (memcpying) these into stereo as Left/Right channels -> stereo audiocontext speaker destination

In such a test, there would be a full combination of "1ch input, 1ch output", "2ch input, 2ch outputs", "two 1ch inputs, 2ch output" cases that I think would cover every possibility (graph would have multiple processor classes and different shaped nodes).

The worklets themselves would be trivial memcpy input -> output implementations so not much complexity to the audio code itself, but they would then test that different shapes of the graph are preserved. Would something like that be a feasible test case to build?
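The three shapes can be sketched in plain JavaScript, without a Web Audio graph, to show what each trivial worklet would do with its inputs and outputs. The function names `passthrough` and `mergeToStereo` and the helper lambdas are hypothetical; a real test would run these inside AudioWorkletProcessor classes (or their wasm equivalents).

```javascript
const QUANTUM = 128; // assumed render quantum size

// Passthrough: copy each channel of the first input to the first output.
function passthrough(inputs, outputs) {
  const input = inputs[0];
  const output = outputs[0];
  for (let ch = 0; ch < output.length; ch++) {
    output[ch].set(input[ch]);
  }
}

// Merge: two mono inputs become the left and right channels of one output.
function mergeToStereo(inputs, outputs) {
  outputs[0][0].set(inputs[0][0]);
  outputs[0][1].set(inputs[1][0]);
}

// Helpers to build a channel filled with one value, and a silent channel.
const filled = (value) => new Float32Array(QUANTUM).fill(value);
const empty = () => new Float32Array(QUANTUM);

// a) 2ch input, 2ch output
const stereoOut = [[empty(), empty()]];
passthrough([[filled(0.5), filled(-0.5)]], stereoOut);

// b) 1ch input, 1ch output
const monoOut = [[empty()]];
passthrough([[filled(0.25)]], monoOut);

// c) two 1ch inputs, 2ch output
const mergedOut = [[empty(), empty()]];
mergeToStereo([[filled(0.25)], [filled(0.75)]], mergedOut);

console.log(stereoOut[0][1][0], monoOut[0][0][0], mergedOut[0][1][0]); // prints "-0.5 0.25 0.75"
```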

It would be ok to turn this around to allocate outputs before inputs. This way it should be possible to only assert() in debug builds that the invariant this.outputViews[0].byteOffset == k << 2 holds.

I was unexpectedly free this afternoon so already rearranged everything.

Also, the stack is already preallocated by the time the above constructor() is called, so things can be precomputed there in the ctor instead of at process time for any performance wins.

I'll look at creating some views in the ctor up front after I've done the timings.

But you're 100% right about timing this, which I'll aim to do once it's in roughly the right shape, and same for the test cases. At work our use is mono mic, stereo out from a wasm mixer, and then combined mic and mixer, but I want to see it tested with more ins and outs like you propose.

(I can dedicate quite a bit of time to this for the next weeks)

I think performance.now() should work here

It's not part of the AudioWorkletGlobalScope. Date.now() is available but I just get zeros, though I did see a 1 which brought excitement for a brief moment.

I added the same Date.now() to the original code... and it's a lot of zeros, then some occasional ones. I think there are more ones than with the views, but that's not very scientific.

What I'll need to do is write something standalone that runs process() not in the AudioWorkletGlobalScope and then benchmark it over many runs on different browsers and hardware (which is worth doing).
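A standalone timing harness along these lines might look like the sketch below, which compares a float-by-float loop against set() on cached views outside any worklet. It assumes an environment where `performance.now()` is a global (a browser main thread or a recent Node.js); the channel count, run count, and function names are arbitrary choices for illustration, and the numbers it prints will vary by engine and hardware.

```javascript
const QUANTUM = 128;  // assumed render quantum size
const CHANNELS = 2;   // arbitrary channel count for the sketch
const RUNS = 100000;  // arbitrary number of iterations

// Stand-in heap filled with random samples, plus the JS-side output channels.
const heapF32 = new Float32Array(QUANTUM * CHANNELS).map(() => Math.random());
const outputs = Array.from({ length: CHANNELS }, () => new Float32Array(QUANTUM));

// One cached view per channel, created up front.
const views = outputs.map((_, ch) =>
  heapF32.subarray(ch * QUANTUM, (ch + 1) * QUANTUM));

// Original approach: copy one float at a time.
function copyLoop() {
  let k = 0;
  for (const channel of outputs) {
    for (let i = 0; i < QUANTUM; i++) {
      channel[i] = heapF32[k++];
    }
  }
}

// Proposed approach: one set() call per channel.
function copySet() {
  for (let ch = 0; ch < CHANNELS; ch++) {
    outputs[ch].set(views[ch]);
  }
}

// Time RUNS calls of fn and return the elapsed milliseconds.
function time(fn) {
  const start = performance.now();
  for (let r = 0; r < RUNS; r++) fn();
  return performance.now() - start;
}

// Warm up both paths before measuring.
time(copyLoop);
time(copySet);

console.log('loop ms:', time(copyLoop).toFixed(2));
console.log('set  ms:', time(copySet).toFixed(2));
```

Timing many iterations in bulk sidesteps the coarse clock resolution seen when timing a single process() call.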

Lots of juggling with the various pointers, and next will be to reduce the code and move all of the output first to stop repeating some of the calculations. Some can also move to the constructor.

cwoffenden changed the title ~~[AUDIO_WORKET] Optimise the copy back from wasm's heap to JS~~ [AUDIO_WORKLET] Optimise the copy back from wasm's heap to JS Oct 16, 2024

cwoffenden mentioned this pull request Oct 16, 2024

[AUDIO_WORKLET] Reword API to make it clearer #22741

Merged

cwoffenden force-pushed the cw-audio-tweaks-3 branch 2 times, most recently from 38203f0 to 24a90e4 Compare October 17, 2024 13:01

sbc100 reviewed Oct 17, 2024


cwoffenden added 4 commits October 18, 2024 08:31

Logging and notes for me

45f9cc4

Better error message (to see why it fails)

53e20ca

Create one-time fixed views into the heap

9b3dbb6

We can remove the float-by-float JS copy and replace it with simple TypedArray set() calls.

Allow the number of channels to increase (or the audio chain to change)

d4680ca

Typed views are recreated if needed but otherwise are reused.

cwoffenden force-pushed the cw-audio-tweaks-3 branch from d186eb5 to d4680ca Compare October 18, 2024 06:32

cwoffenden marked this pull request as draft October 18, 2024 06:35

juj reviewed Oct 18, 2024


Work in progress, moved the output buffers first

baf7a59

Lots of juggling with the various pointers, and next will be to reduce the code and move all of the output first to stop repeating some of the calculations. Some can also move to the constructor.
