# KHR audio framework design proposal #2421
base: main
## Conversation
> Type: `string`. Shape in which emitter emits audio (cone, omnidirectional, custom).
Why is `custom` included here? Is there a way of defining a custom shape? How would it be implemented?
It would be implemented by another glTF extension, extending this extension.
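For illustration, a hedged sketch of how that could look in the emitter's glTF JSON (written here as a TypeScript literal). The extension name `EXT_audio_emitter_shape` and its properties are invented for this example and are not part of the proposal.

```ts
// Hypothetical only: an invented vendor extension supplying the geometry
// for a "custom" emitter shape. Neither EXT_audio_emitter_shape nor its
// properties exist in this proposal; they only show where such data
// would live in the emitter object.
const emitter = {
  shape: "custom",
  extensions: {
    EXT_audio_emitter_shape: {
      mesh: 3, // e.g. index of a mesh describing the emission volume
    },
  },
};
```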
> Type: `string`. Shape in which emitter emits audio (cone, omnidirectional, custom).
Does `omnidirectional` need to be included when it can be achieved with a 360-degree cone angle?
Note: This is how KHR_audio_emitter works. However, explicitness is nice, so I am wondering if it makes sense to change KHR_audio_emitter to use a type enum like this.
Keeping it explicit to make it runtime friendly.
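As a rough sketch of that point, a player can branch directly on the declared shape rather than inferring omnidirectionality from a 360-degree cone. The property names and the hard-edged cone falloff below are assumptions for illustration, not the final schema.

```ts
type EmitterShape = "cone" | "omnidirectional" | "custom";

// Dispatch on the explicit shape enum; no angle inspection needed to
// detect the omnidirectional case.
function directivityGain(
  shape: EmitterShape,
  coneInnerAngleRad: number,
  angleToListenerRad: number
): number {
  switch (shape) {
    case "omnidirectional":
      return 1.0; // fast path: no per-frame cone math at all
    case "cone":
      // deliberately simplistic hard-edged cone for illustration
      return angleToListenerRad <= coneInnerAngleRad / 2 ? 1.0 : 0.0;
    case "custom":
      return 1.0; // defer to whatever extension defines the shape
  }
}
```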
### 4.3 Oscillator data
Has band-limiting been considered for audio-rate oscillator implementations? Is it assumed? Or does it need to be explicitly required? I would think it should be explicitly required if audio quality is a priority.
Made that explicit by including a detune parameter. The implementation can then use this with frequency to calculate the desired oscillator frequency.
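Assuming the proposal follows the Web Audio convention, detune in cents combines with the base frequency as frequency × 2^(detune / 1200):

```ts
// Combine a base frequency (Hz) with detune (cents), following the
// Web Audio convention: computed = frequency * 2^(detune / 1200).
function computedFrequency(frequencyHz: number, detuneCents: number): number {
  return frequencyHz * Math.pow(2, detuneCents / 1200);
}

computedFrequency(440, 100); // one semitone up: ~466.16 Hz
```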
Ok, thanks, but I don't think adding a `detune` parameter has anything to do with making band-limiting explicit. If it's assumed that implementations will apply band-limiting to avoid aliasing, then that's fine, but I suspect many implementations will take the easier route of skipping band-limiting if it's not required, which will reduce audio quality.
It may be that this concern is outside the scope of the standard, but if there are a lot of poor-quality implementations then it may cause creators to avoid using it, so we might want to make band-limiting required.
There are various practical methods an implementation might use to prevent aliasing. While the ideal discrete-time digital audio signal is clearly defined mathematically, each implementation must balance the trade-off between computational cost and fidelity to this ideal. It is anticipated that implementations will strive towards this ideal, but it is also reasonable to consider lower-quality, more cost-effective approaches for less powerful hardware. We can include more guidance/recommendation around this.
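As one concrete example of such a cost/fidelity trade-off, a polyBLEP sawtooth smooths the waveform's jump discontinuity over a couple of samples, suppressing most audible aliasing at a fraction of the cost of additive synthesis. This is only an illustrative sketch of one well-known technique, not something the proposal mandates.

```ts
// polyBLEP residual: corrects the sawtooth's jump discontinuity.
// t is the normalized phase in [0, 1); dt = frequency / sampleRate.
function polyBlep(t: number, dt: number): number {
  if (t < dt) {
    const x = t / dt; // just after the discontinuity
    return x + x - x * x - 1;
  }
  if (t > 1 - dt) {
    const x = (t - 1) / dt; // just before the discontinuity
    return x * x + x + x + 1;
  }
  return 0;
}

// Band-limited sawtooth sample: naive ramp minus the polyBLEP residual.
function sawSample(phase: number, dt: number): number {
  return 2 * phase - 1 - polyBlep(phase, dt);
}
```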
### 6.8 Filter node (1 input / 1 output)
Use the Audio Mixer node to combine the output from multiple audio sources. A filter node always has exactly one input and one output.
This looks to be partially copied from the previous item by mistake.
Updated.
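For context, one way a web-based implementation might realize a 1-input/1-output filter node is with the existing Web Audio `BiquadFilterNode`; the parameter names here are assumptions rather than the node's final schema.

```ts
// A minimal 1-in/1-out filter stage built on Web Audio's BiquadFilterNode.
function makeLowpass(ctx: AudioContext, cutoffHz: number, q: number): BiquadFilterNode {
  const filter = ctx.createBiquadFilter();
  filter.type = "lowpass";
  filter.frequency.value = cutoffHz;
  filter.Q.value = q;
  return filter; // connect exactly one upstream node and one downstream node
}
```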
### 6.9 Reverb node (1 input / 1 output)
Reverb implementations vary widely. Has any thought been given to consistency across implementations for this node? Web-based implementations would have to create their own reverb nodes since there is no WebAudio reverb node yet. Would a `Convolver` node make more sense?
Good callout; extended support for convolution reverb in addition to algorithmic reverb.
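For the convolution mode, a web implementation could lean on the existing Web Audio `ConvolverNode`; this sketch assumes an impulse response fetched from a placeholder URL and is not mandated by the proposal.

```ts
// Build a convolution reverb from an impulse-response file using the
// standard Web Audio ConvolverNode.
async function makeConvolutionReverb(
  ctx: AudioContext,
  irUrl: string // placeholder: any decodable impulse-response audio file
): Promise<ConvolverNode> {
  const response = await fetch(irUrl);
  const convolver = ctx.createConvolver();
  convolver.buffer = await ctx.decodeAudioData(await response.arrayBuffer());
  return convolver; // wire between the reverb node's single input and output
}
```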
Ok, thanks, but this doesn't address my concern about the algorithmic reverb sounding different across implementations. I don't think sound designers are going to use a reverb if it doesn't sound the same everywhere, so how do we get the algorithmic reverb to sound the same on all implementations? Can a specific reverb algorithm be required by the standard, like maybe a basic plate reverb?
Got it. Unfortunately, there is no universal standard for reverb implementation; implementations can vary significantly in their core logic and complexity, and many remain unpublished. Therefore, the objective is not to restrict reverb to a particular implementation but rather to parameterize it sufficiently to accommodate most implementations. Requiring a specific reverb algorithm as a standard would, in fact, diminish the value for users. That said, one can always leverage convolution reverb with a specific impulse response.
> Requiring a specific reverb algorithm as a standard would, in fact, diminish the value for users.
Maybe, but I think users won't create graphs with algorithmic reverbs because they won't know how they'll sound on all implementations. Instead, they'll bake the exact reverb they want directly into the audio sources. They'll do this with any audio nodes that vary significantly across implementations.
> because they won't know how they'll sound on all implementations
That's a valid point, and it extends beyond just different implementations to the same implementation being used across various device types, such as mobile, desktops, and HMDs. Each device type, and indeed each variant within those types, can potentially alter the sound characteristics, making it challenging for users to predict how the reverb will sound without testing it on each device themselves. Moreover, developers often face a trade-off between computational cost and fidelity in their implementations. This balance is crucial to ensure that the reverb not only meets quality expectations but also remains computationally feasible and customizable across diverse hardware platforms.
There is a lot missing from this PR. Is KHR_audio_graph implemented anywhere? Are there any sample assets? Where are the JSON schemas?
Please define the relationship between KHR_audio_graph and KHR_audio_emitter. An audio graph is too complicated for most scenes and engines, which often only need to define basic emission. The Google Slides document linked in KHR_audio_graph mentions Unity DSPGraph, which is an extra package, not part of the base Unity. I wrote a longer list of specific discussion points here: #2137 (comment)
## **Contributors**
Headings shouldn't have any additional formatting symbols.
```suggestion
## Contributors
```
* Chintan Shah, Meta
* Alexey Medvedev, Meta
The Khronos preferred Markdown style is to use `-` for unordered lists:
```suggestion
- Chintan Shah, Meta
- Alexey Medvedev, Meta
```
Using this Markdown file:

1. Paste this output into your source file.
2. See the notes and action items below regarding this conversion run.
3. Check the rendered output (headings, lists, code blocks, tables) for proper formatting and use a linkchecker before you publish this page.
Is this leftover from something? It should be removed.
## Context
During the recent Khronos 3D formats working group meeting held on 5/29, we reviewed the [proposal to define the KHR audio glTF specification using an audio graph framework](https://docs.google.com/presentation/d/1IrrQaE-jHyzOtFRabjtLAzeP5UOirFOEj8FRADAceqk/edit?usp=sharing). The purpose of this document is to delve deeper into that proposal, offering a comprehensive design of the KHR audio graph. This includes a detailed description of each node object within the graph along with functionality, the specific properties associated with it, and how it interacts with other nodes in the graph. The document is structured to facilitate clear understanding and to solicit feedback on the proposed design. Based on the feedback we will update and finalize the design before it is formally schematized into KHR core audio spec, extensions, animations, and interactivity.
Please include the year and use ISO 8601 to make this unambiguous to future readers.
```suggestion
During the recent Khronos 3D formats working group meeting held on 2024-05-29, we reviewed the [proposal to define the KHR audio glTF specification using an audio graph framework](https://docs.google.com/presentation/d/1IrrQaE-jHyzOtFRabjtLAzeP5UOirFOEj8FRADAceqk/edit?usp=sharing). The purpose of this document is to delve deeper into that proposal, offering a comprehensive design of the KHR audio graph. This includes a detailed description of each node object within the graph along with functionality, the specific properties associated with it, and how it interacts with other nodes in the graph. The document is structured to facilitate clear understanding and to solicit feedback on the proposed design. Based on the feedback we will update and finalize the design before it is formally schematized into KHR core audio spec, extensions, animations, and interactivity.
```
1/ Included a detune property on the oscillator; the implementation can then combine detune with frequency to calculate the desired oscillator frequency. 2/ Extended the reverb node to support algorithmic as well as convolution reverb implementations.
1/ Removed custom enum types; these will be implicit with custom extensions.
Is there any documentation or record of discussions that covers the "why" rather than the "what"? I.e., the case for why this belongs in glTF, discussions on challenges related to implementation and adoption, and any exploration of the costs vs. the benefits. I have no doubt these conversations happened - are they published anywhere?