
# KHR audio framework design proposal #2421

Open · wants to merge 3 commits into `main`
## Conversation


**@cashah** commented Jul 2, 2024

During the recent 3D Formats working group meeting, we reviewed the proposal to define the KHR audio glTF specification using an audio graph framework. The purpose of the current document is to delve deeper into that proposal, offering a comprehensive design of the KHR audio graph. This includes a detailed description of each node object within the graph along with functionality, the specific properties associated with it, and how it interacts with other nodes in the graph. The document is structured to facilitate clear understanding and to solicit feedback on the proposed design. Based on the feedback we will update and finalize the design before it is formally schematized into KHR core audio spec, extensions, animations, and interactivity.

**@cashah** requested a review from rudybear on July 2, 2024 at 16:53
Review thread on:

> `string` — Shape in which emitter emits audio (cone, omnidirectional, custom).


Why is custom included here? Is there a way of defining a custom shape? How would it be implemented?

**Contributor** replied:

It would be implemented by another glTF extension, extending this extension.
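As a sketch of how that might look (the extension name and property here are hypothetical, invented for illustration, not part of any published extension), a vendor extension on the emitter could carry the custom shape data while `shape` is set to `custom`:

```json
{
  "shape": "custom",
  "extensions": {
    "EXT_vendor_custom_emitter_shape": {
      "mesh": 0
    }
  }
}
```

An implementation that does not understand the vendor extension could then fall back to a default such as omnidirectional emission, following the usual glTF extension fallback pattern.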

A second thread on the same line:

> `string` — Shape in which emitter emits audio (cone, omnidirectional, custom).


Does omnidirectional need to be included when it can be achieved with a 360-degree cone angle?

**@aaronfranke** (Contributor) commented Jul 14, 2024:

Note: This is how KHR_audio_emitter works. However, explicitness is nice, so I am wondering if it makes sense to change KHR_audio_emitter to use a type enum like this.

**Author** replied:

Keeping it explicit to make it runtime-friendly.




### 4.3 Oscillator data

Has band-limiting been considered for audio rate oscillator implementations? Is it assumed? Or does it need to be explicitly required? I would think it should be explicitly required if audio quality is a priority.

**Author** replied:

Made that explicit by including a detune parameter. The implementation can then use this with frequency to calculate the desired oscillator frequency.
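For reference, the common convention for a detune parameter (used by Web Audio's `OscillatorNode`, for example) is an offset in cents; the sketch below assumes that convention, and the function name is invented here:

```python
def oscillator_frequency(frequency: float, detune_cents: float) -> float:
    """Effective oscillator frequency with a detune offset in cents.

    Assumes the Web Audio-style convention:
        computed = frequency * 2 ** (detune_cents / 1200)
    where 1200 cents = one octave and 100 cents = one semitone.
    """
    return frequency * 2.0 ** (detune_cents / 1200.0)


# A detune of +1200 cents raises A4 (440 Hz) by one octave to 880 Hz.
print(oscillator_frequency(440.0, 1200.0))  # → 880.0
```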

**Reply:**

Ok, thanks, but I don't think adding a detune parameter has anything to do with making band-limiting explicit. If it's assumed that implementations will apply band-limiting to avoid aliasing, then that's fine, but I suspect many implementations will take the easier route of skipping band-limiting if it's not required, which will reduce audio quality.

It may be that this concern is outside the scope of the standard, but if there a lot of poor-quality implementations then it may cause creators to avoid using it, so we might want to make band-limiting required.

**Author** replied:

There are various practical methods an implementation might use to prevent aliasing. While the ideal discrete-time digital audio signal is clearly defined mathematically, each implementation must balance the trade-off between computational cost and fidelity to this ideal. It is anticipated that implementations will strive towards this ideal, but it is also reasonable to consider lower-quality, more cost-effective approaches for less powerful hardware. We can include more guidance and recommendations around this.
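As one concrete illustration of such a method (a sketch only; the proposal does not mandate any particular technique, and the function name is invented here), an additive oscillator avoids aliasing by summing only the harmonics that fall below the Nyquist frequency:

```python
import math


def bandlimited_saw(freq: float, sample_rate: float, n_samples: int) -> list[float]:
    """Additive band-limited sawtooth: sum sine harmonics up to Nyquist.

    saw(t) = (2/pi) * sum_{k=1..K} sin(2*pi*k*f*t) / k,
    with K = floor(Nyquist / f), so no partial can alias.
    """
    k_max = int((sample_rate / 2.0) // freq)
    out = []
    for n in range(n_samples):
        t = n / sample_rate
        s = sum(math.sin(2.0 * math.pi * k * freq * t) / k
                for k in range(1, k_max + 1))
        out.append((2.0 / math.pi) * s)
    return out


# At 1 kHz and a 48 kHz sample rate, only the first 24 harmonics are kept.
samples = bandlimited_saw(1000.0, 48000.0, 48)
```

A naive sawtooth (a bare ramp) is cheaper but folds every harmonic above Nyquist back into the audible band; this additive form is the "fidelity over cost" end of the trade-off described above.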


### 6.8 Filter node (1 input / 1 output)

> Use the Audio Mixer node to combine the output from multiple audio sources. A filter node always has exactly one input and one output.

This looks to be partially copied from the previous item by mistake.

**Author** replied:

Updated.




### 6.9 Reverb node (1 input / 1 output)

Reverb implementations vary widely. Has any thought been given to consistency across implementations for this node? Web-based implementations would have to create their own reverb nodes since there is no Web Audio reverb node yet. Would a Convolver node make more sense?

**Author** replied:

Good callout; extended the node to support convolution reverb in addition to algorithmic reverb.

**Reply:**

Ok, thanks, but this doesn't address my concern about the algorithmic reverb sounding different across implementations. I don't think sound designers are going to use a reverb if it doesn't sound the same everywhere, so how do we get the algorithmic reverb to sound the same on all implementations? Can a specific reverb algorithm be required by the standard, like maybe a basic plate reverb?

**Author** replied:

Got it. Unfortunately, there is no universal standard for reverb implementations, which vary significantly in their core logic and complexity; many implementations remain unpublished. Therefore, the objective is not to restrict reverb to a particular implementation but rather to parameterize it sufficiently to accommodate most implementations. Requiring a specific reverb algorithm as a standard would, in fact, diminish the value for users. That said, one can always leverage convolution reverb with a specific impulse response.
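To illustrate the convolution option (an illustrative sketch, not part of the proposal; real-time implementations typically use partitioned FFT convolution for efficiency, and the function name here is invented), convolution reverb is just the convolution of the dry signal with a recorded impulse response:

```python
def convolve(dry: list[float], impulse_response: list[float]) -> list[float]:
    """Direct-form convolution: wet[n] = sum_k ir[k] * dry[n - k].

    The output is len(dry) + len(impulse_response) - 1 samples long;
    the samples past the end of the dry signal are the reverb tail.
    """
    wet = [0.0] * (len(dry) + len(impulse_response) - 1)
    for n, x in enumerate(dry):
        for k, h in enumerate(impulse_response):
            wet[n + k] += x * h
    return wet


# A unit impulse played through the IR reproduces the IR itself.
print(convolve([1.0], [0.5, 0.25, 0.125]))  # → [0.5, 0.25, 0.125]
```

Because the impulse response fully determines the result, two conforming implementations given the same IR produce the same output, which is exactly the consistency property the algorithmic reverb cannot guarantee.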

**Reply:**

> Requiring a specific reverb algorithm as a standard would, in fact, diminish the value for users.

Maybe, but I think users won't create graphs with algorithmic reverbs because they won't know how they'll sound on all implementations. Instead, they'll bake the exact reverb they want directly into the audio sources. They'll do this with any audio nodes that vary significantly across implementations.

**@cashah** (Author) commented Jul 17, 2024:

> because they won't know how they'll sound on all implementations

That's a valid point, and it extends beyond just different implementations to the same implementation being used across various device types, such as mobile, desktops, and HMDs. Each device type, and indeed each variant within those types, can potentially alter the sound characteristics, making it challenging for users to predict how the reverb will sound without testing it on each device themselves. Moreover, developers often face a trade-off between computational cost and fidelity in their implementations. This balance is crucial to ensure that the reverb not only meets quality expectations but also remains computationally feasible and customizable across diverse hardware platforms.

**@aaronfranke** (Contributor) left a comment:

There is a lot missing from this PR. Is KHR_audio_graph implemented anywhere? Are there any sample assets? Where are the JSON schemas?

Please define the relationship between KHR_audio_graph and KHR_audio_emitter. An audio graph is too complicated for most scenes and engines, which often only need to define basic emission. The Google Slides document linked in KHR_audio_graph mentions Unity DSPGraph, which is an extra package, not part of the base Unity. I wrote a longer list of specific discussion points here: #2137 (comment)




## **Contributors**
**Contributor** commented:

Headings shouldn't have any additional formatting symbols.

Suggested change:

```diff
-## **Contributors**
+## Contributors
```

Comment on lines +34 to +35
* Chintan Shah, Meta
* Alexey Medvedev, Meta
**Contributor** commented:

The Khronos preferred Markdown style is to use `-` for unordered lists:

Suggested change:

```diff
-* Chintan Shah, Meta
-* Alexey Medvedev, Meta
+- Chintan Shah, Meta
+- Alexey Medvedev, Meta
```

Comment on lines 8 to 13
> Using this Markdown file:
>
> 1. Paste this output into your source file.
> 2. See the notes and action items below regarding this conversion run.
> 3. Check the rendered output (headings, lists, code blocks, tables) for proper formatting and use a linkchecker before you publish this page.
**Contributor** commented:

Is this leftover from something? It should be removed.


## Context

During the recent Khronos 3D formats working group meeting held on 5/29, we reviewed the [proposal to define the KHR audio glTF specification using an audio graph framework](https://docs.google.com/presentation/d/1IrrQaE-jHyzOtFRabjtLAzeP5UOirFOEj8FRADAceqk/edit?usp=sharing). The purpose of this document is to delve deeper into that proposal, offering a comprehensive design of the KHR audio graph. This includes a detailed description of each node object within the graph along with functionality, the specific properties associated with it, and how it interacts with other nodes in the graph. The document is structured to facilitate clear understanding and to solicit feedback on the proposed design. Based on the feedback we will update and finalize the design before it is formally schematized into KHR core audio spec, extensions, animations, and interactivity.
**Contributor** commented:

Please include the year and use ISO 8601 to make this unambiguous to future readers.

Suggested change (only the date changes):

```diff
-During the recent Khronos 3D formats working group meeting held on 5/29, we reviewed the proposal
+During the recent Khronos 3D formats working group meeting held on 2024-05-29, we reviewed the proposal
```

Commits:

1. Included a detune property on the oscillator. The implementation can then combine detune with frequency to calculate the desired oscillator frequency.
2. Extended the reverb node to support algorithmic as well as convolution reverb implementations.
3. Removed custom enum types; these will be implicit with custom extensions.
**@andybak** commented Nov 9, 2024:

Is there any documentation or records of discussions that covers the "why" rather than the "what"?

i.e. the case for why this belongs in glTF, discussions of challenges related to implementation and adoption, and any exploration of the costs vs. the benefits.

I have no doubt these conversations happened; are they published anywhere?

Labels: none yet · Projects: none yet · 7 participants