Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Work in progress for AudioWorkletNode via Worker thread #124

Merged
merged 105 commits into from
May 15, 2024

Conversation

orottier
Copy link
Collaborator

@orottier orottier commented May 3, 2024

Many many todo, but it works now for single-channel source nodes! examples/audio-worklet.mjs
Relates to #28

Todo

  • pass single input and single output
  • pass all inputs
  • pass all outputs
  • construct with audio param descriptors
  • pass parameterData
  • expose AudioParams on the JS side
  • use apply3 instead of call3 to bind the processor to this
  • return the tailTime result to rust
  • pass the currentTime, currentFrame, sampleRate
  • allow more than one AudioWorkletNode, rework proc123
  • rework horrible send_recv_pair comms channel from main to worker
  • rework horrible send_recv_pair2 comms channel from worker to main
  • the comms channel approach does not work if there are multiple AudioContexts running at the same time, we need to bind the channels to the actual BaseAudioContext instance
  • use the AudioWorkletNodeOptions
  • git bisect and fix the 2728 segmentation fault node examples/audio-worklet.mjs
  • fix Error: called Result::unwrap() on an Err value: RecvError at Immediate.run_loop ([worker eval]:95:5)
  • stdout of the Worker is lost somehow (console.log does not work) --> this is because we never spin the event loop, should be fixed after the refactor needed for the message port
  • do not spawn a Worker thread per Node, but use one for the whole AudioContext
  • conform to the spec (needs context.addModule, only pass the name in the AudioWorkletNode ctor)
  • make the messagePort work. This is tricky because we need to spin the v8 event loop instead of blocking it permanently
  • use an actual MessagePort instance for AudioWorkletNode.port instead of hacking the Worker there
  • set thread priority in Worker (can be done with thread_priority crate)
  • do we want to merge https://github.com/orottier/web-audio-api-rs/compare/feature/param-keys or find another solution?
    • prepare a release
  • test if SharedArrayBuffer from main thread works
  • avoid allocations in process call:
    • reuse params vec
    • in JS can we reuse the Arrays?
  • garbage collect the Js processor in the Worker when finished processing
  • close the AudioWorkletGlobalScope properly so a script can terminate
  • Unsafety checks: what happens if we write to the inputs? what happens if we store the outputs in globalThis and use them after the processor has been dropped? How can we utilize env.freeze/FrozenArray?
  • ...

@orottier orottier requested a review from b-ma May 3, 2024 10:06
index.d.ts Outdated Show resolved Hide resolved
@b-ma
Copy link
Collaborator

b-ma commented May 3, 2024

Cool! I will have a look soon

generator/rs/lib.tmpl.rs Outdated Show resolved Hide resolved
@b-ma
Copy link
Collaborator

b-ma commented May 3, 2024

Just made small change to remove the hardcoded path and have it work on my machine, this is really nice! Great work!

Cargo.toml Outdated Show resolved Hide resolved
src/audio_worklet_node.rs Outdated Show resolved Hide resolved
@b-ma
Copy link
Collaborator

b-ma commented May 4, 2024

[WARNING] make sure to check #124 (comment) to not reintroduce the index.js file and changes in index.d.ts

@b-ma
Copy link
Collaborator

b-ma commented May 4, 2024

I can advance on

  • conform to the spec (needs context.addModule, only pass the name in the AudioWorkletNode ctor)

next, can be done on JS side rather simply I think

@orottier
Copy link
Collaborator Author

orottier commented May 4, 2024

Thanks! Interesting stuff happening regarding the index.js. I will take care

can be done on JS side rather simply I think

Okay that would be helpful!

@b-ma
Copy link
Collaborator

b-ma commented May 12, 2024

For remaining avoid allocations in process call and safety issues, I think the following algorithm would do the trick

  1. Check inputs layout [n][m] compared to js_inputs (only m can change at runtime, as number of inputs is defined at construction)
  2. If layout is not the same we need to rebuild js_inputs (allocations only happen at this step)
    • create new js_inputs Array
    • for each input
      • create new js_input Array
      • shallow copy old js_input (i.e. existing Float32Array channels) into new js_input
      • adjust missing or extra channels
      • freeze js_input Array
    • freeze js_inputs Array
    • re-assign js_inputs to processor[kWorkletInputs]
  3. Repeat 1. and eventually 2. with outputs and js_ouputs
  4. Copy inputs buffers into js_inputs using copy_from_slice
  5. Execute js_process
  6. Copy js_ouputs buffers into outputs using copy_from_slice

@b-ma
Copy link
Collaborator

b-ma commented May 12, 2024

Other things I see that could be nice to have I think:

  • ensure we run run_audio_worklet_global_scope and setImmediate only once per render quantum, but I don't really see how we can achieve that without some hint from the graph... If you have any idea, that's welcome
  • rely on the fact that parameterDescriptors are ordered to avoid transferring parameter keys, the NapiAudioWorlketProcessor::param_values could just be Vec<&'static [f32]>

Then I start to be out of ideas too :)

In any case, here are the latest wpt results:

  RESULTS:
  - # pass: 260
  - # fail: 42
  - # type error issues: 0
> wpt duration: 1:34.799 (m:ss.mmm)

Almost 80% of the tests (332)!

@orottier
Copy link
Collaborator Author

Hey, I skimmed your commits and they look good. Nice to see progress on allocation free JS because it may be the garbage collector playing us parts in realtime safety.
Indeed very nice results on the WPT.

I have been benchmarking a bit and it's surprising to see the click actually happen inside the run_audio_worklet_global_scope call, not outside. This is nice because apparently we are not fighting the event loop but something inside our call into the worklets (still it may be a garbage collect)..

In any case, if I change

-    while let Ok(msg) =
-        process_call_receiver(worklet_id).recv_timeout(std::time::Duration::from_micros(100))
-    {
+    while let Ok(msg) = process_call_receiver(worklet_id).try_recv() {

The load goes up from 11% CPU to 14% on my machine (because we removed some idling) but all the clicks/pop/underruns also seem to dissappear. Can you confirm?

@b-ma
Copy link
Collaborator

b-ma commented May 13, 2024

I just implemented the algorithm I described upper, would be nice to have your review when you find some time (no hurry :), should indeed help minimizing garbage collection I think

The load goes up from 11% CPU to 14% on my machine (because we removed some idling) but all the clicks/pop/underruns also seem to dissappear. Can you confirm?

Actually, this is hard to tell because I do very rarely have clicks on my side now, but if it's better on your side then let's go that way! (maybe we should just make sure it doesn't create issue with the process exit, but has you had ideas to refactor anyway...)

Copy link
Collaborator Author

@orottier orottier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me, I have left some nitpicks.

However, the garbage collector stats before this change are equal to afterwards:

node --trace-gc examples/audio-worklet.mjs
gives e.g.
[46986:0x130028000] 5658 ms: Scavenge 5.9 (9.0) -> 4.3 (9.0) MB, 0.17 / 0.00 ms (average mu = 1.000, current mu = 1.000) task;
Meaning it reclaimed from 5.9 to 4.3 MB. More info: https://nodejs.org/en/learn/diagnostics/memory/using-gc-traces

After the change:
[47391:0x128028000] 5354 ms: Scavenge 5.9 (8.7) -> 4.3 (8.7) MB, 0.12 / 0.00 ms (average mu = 1.000, current mu = 1.000) task;

Not sure what to make from this..


for (output_number, output) in outputs.iter().enumerate() {
let mut channels = js_outputs.get_element::<JsObject>(output_number as u32)?;
if !check_same_io_layout(&js_outputs, outputs) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was in doubt if it was possible for output channel counts to change at runtime, but it is indeed possible:
https://webaudio.github.io/web-audio-api/#configuring-channels-with-audioworkletnodeoptions (note at item 4)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup indeed, I think there might be an issue on the rust side on this point, the audioworkletnode-channel-count.https.html wpt test is a good example of what is expected. The test is currently crashing for some reason I didn't really understand, but it seems the rs output length keeps to be 1 even when input is 3

let src = js_channel_buffer.as_ptr();
let dst = channel.as_ptr() as *mut f32;

unsafe {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just safely use copy_from_slice, no?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was my first try but I did manage to find a way to iter_mut() the &'static [&'static [&'static [f32]]] output...

channels.delete_element(i)?;
for (channel_number, channel) in input.iter().enumerate() {
let js_channel = js_input.get_element::<JsTypedArray>(channel_number as u32)?;
let mut js_channel_value = js_channel.into_value()?;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to check at some point if we don't allocate with some of these calls

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally except in rebuild_io_layout there should be no allocation, but clearly needs to be confirmed

I'm just thinking we could also have a pool of preallocated Float32Array to mitigate further this possible issue

@orottier
Copy link
Collaborator Author

The GC pauses appear to be of the range 0.08 - 0.17 ms and seem not related to the excessive call durations I'm measuring.. So what is stalling my audio thread!?

@b-ma
Copy link
Collaborator

b-ma commented May 14, 2024

However, the garbage collector stats before this change are equal to afterwards:

Hum, no clear idea neither... did you make it run for a long time ? I'm no expert but the few times I had to check such thing, the GC was called every 4-5 sec, so as our example is 8 sec we could just see the result of some initialisation stuff?

The GC pauses appear to be of the range 0.08 - 0.17 ms and seem not related to the excessive call durations I'm measuring.. So what is stalling my audio thread!?

Yup weird, no idea neither... note that I didn't remove the recv_timeout in my last changes, might be somewhere to look at?

@orottier
Copy link
Collaborator Author

orottier commented May 14, 2024

Okay after loads of logging I have come to the surprising conclusion that the call .recv_timeout(std::time::Duration::from_micros(100)) is indeed the culprit. It takes up to 8 milliseconds sometimes which is pretty insane to think about! Reading the issue list of crossbeam seems to indicate that they are doing some weird stuff with spinning and backoffs and possibly that does not work nice on ARM? In any case this is not suitable for our latency-sensitive ping pong of the audio thread and the worker.

So as I already mentioned changing the call to just try_recv solves my underrun issues. I did a bit of further profiling because I was worried the worker thread would max out a single CPU core but I don't see it happening, so possible the event loop does a bit of idling. Which is probably fine for now. The load per core does not exceed 15% for audio-worklet.mjs

I definitely want to look into this further, possible using some primitives from the parking_lot crate to perform the ping pong with CondVars maybe. But I guess that is not a must have before merging.

@orottier
Copy link
Collaborator Author

Regarding

TODO: In the case of OfflineAudioContext, it is probably desired to block here

We could maybe add a method await_all_processors_registered in js/AudioWorklet.js to ensure the number of create-processor is equal to the number of processor-registered and call it before the OfflineAudioContext startRendering?

@b-ma
Copy link
Collaborator

b-ma commented May 15, 2024

Cool, nice that you found the issue! I think we could start to consider merging this, and put all remaining ideas / improvements to a new issue.

Is there anything you think we should really do before?

@b-ma b-ma marked this pull request as ready for review May 15, 2024 06:36
@orottier
Copy link
Collaborator Author

Yeah sounds good! Perhaps only my final comment

@b-ma
Copy link
Collaborator

b-ma commented May 15, 2024

Yup sure, I will have a look

@b-ma b-ma mentioned this pull request May 15, 2024
11 tasks
@b-ma
Copy link
Collaborator

b-ma commented May 15, 2024

Follow up in #127

@b-ma b-ma merged commit 3a0a5ba into main May 15, 2024
5 checks passed
@b-ma
Copy link
Collaborator

b-ma commented May 15, 2024

\o/ great work!

@b-ma b-ma mentioned this pull request May 15, 2024
@b-ma b-ma deleted the feature/audio-worklet-worker branch May 15, 2024 11:04
@jampekka
Copy link
Contributor

Awesome work! 🔥

@orottier
Copy link
Collaborator Author

Quite the milestone indeed :) Congrats to us. And also thanks @jampekka for chiming in on the exploratory phase. An extra pair of eyes is always very motivating.

@orottier
Copy link
Collaborator Author

Please let me know when you have released this on npm. I think it is worth the announcement on the web audio slack. Also I would like to get some tips and tricks from Paul and Hoch in the WG afterwards

@b-ma
Copy link
Collaborator

b-ma commented May 17, 2024

Done [email protected] !

Also I would like to get some tips and tricks from Paul and Hoch in the WG afterwards

Yup, I will try to attend too next times

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants