Project: AIFF audio files #1

`audio/aiff/README.md`
# .aiff: Audio Interchange File Format
While not a common audio format in the modern era, AIFF is relatively simple compared to popular formats such as MP3, Ogg Vorbis or FLAC, because it can store uncompressed audio data. Use the description of the format below to write a program that generates an audio file.

## A basic intro to Digital Audio
When dealing with analog audio, you generally treat it as a continuous waveform measured at a given point in space, where the amplitude represents the relative pressure level at that point. The red line below shows an example of an analog waveform:

![Sound wave with PCM](https://upload.wikimedia.org/wikipedia/commons/2/21/4-bit-linear-PCM.svg)

However, you would need an infinite amount of information to store the exact waveform in digital form, as there are an infinite number of time and pressure values to be stored. Since this is prohibitive for a digital system, we need to quantize it.

A very common approach to quantize audio is PCM (Pulse Code Modulation), shown in the blue dots above. The scheme boils down to measuring the amplitude of the signal at discrete time intervals, and further discretizing the measured amplitude into a signed binary number.

The number of divisions in the time dimension is the *sample rate* of the scheme, and the number of bits required to represent all possible amplitude values is the *bit depth*.

In the above example, if we assume the wave spans 1 millisecond, there are 23 samples across that interval, which gives us a sample rate of **23,000 Hz**. And since it takes 4 bits to represent the 16 amplitude values (-8 to +7), it has a bit-depth of **4**.
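To make that arithmetic concrete, here is a small sketch (in Python; the language, the 1 kHz test tone and the helper name are illustrative, not part of any format) that samples a sine wave and quantizes each amplitude to a 4-bit signed value:

```python
import math

def quantize_pcm(duration_s, sample_rate, bit_depth, freq=1000.0):
    """Sample a sine wave at discrete times, then round each amplitude
    to the nearest signed integer that fits in `bit_depth` bits."""
    lo = -(1 << (bit_depth - 1))         # e.g. -8 for 4 bits
    hi = (1 << (bit_depth - 1)) - 1      # e.g. +7 for 4 bits
    samples = []
    for n in range(int(duration_s * sample_rate)):
        amplitude = math.sin(2 * math.pi * freq * n / sample_rate)
        samples.append(max(lo, min(hi, round(amplitude * hi))))
    return samples

# 1 ms at the 23,000 Hz / 4-bit scheme described above
samples = quantize_pcm(0.001, 23_000, 4)
print(len(samples))  # 23
```

Every value lands in the signed range -8..+7, exactly the 16 levels a 4-bit depth allows.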

## Structure of an .aiff file
To simplify things, we will work with the original AIFF standard (more details can be found [here](https://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/AIFF/Docs/AIFF-1.3.pdf)).

**NOTE**: This format was defined by Apple when their machines used the Motorola 68000 CPU, which operated on Big Endian values. Hence, all multi-byte values in this format are **Big Endian**.

AIFF files are raw binary files, built up using "chunks" that have an ID and data:
```c
Chunk {
// ID stands for char[4], and represents a text word that identifies the chunk

ID chunk_id; // The type of this chunk
i32 chunk_size; // Number of bytes taken up by the chunk_data section

u8[] chunk_data; // The data stored in this chunk
}
```
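Serializing such a chunk is mechanical; a Python sketch (the helper name is illustrative). One detail from the full spec: odd-length `chunk_data` is followed by a zero pad byte that is *not* counted in `chunk_size`:

```python
import struct

def pack_chunk(chunk_id: bytes, chunk_data: bytes) -> bytes:
    """Serialize a generic chunk: 4-byte ID, big-endian i32 size, data.
    Odd-length data gets one zero pad byte, NOT counted in chunk_size."""
    assert len(chunk_id) == 4
    pad = b"\x00" if len(chunk_data) % 2 else b""
    return struct.pack(">4si", chunk_id, len(chunk_data)) + chunk_data + pad

raw = pack_chunk(b"COMM", b"\x00" * 18)  # ID, size 18, then the zeroed data
```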

The whole .aiff file is made up of one such "container" chunk called a *Form*, which stores other chunks:
```c
AiffFormChunk {
// These are fields from the generic Chunk type
ID chunk_id; // Must be set to "FORM"
i32 chunk_size; // This is set to the total size of form_type and chunks (in bytes)

// These two below form the `chunk_data` of the generic Chunk
ID form_type; // Must be set to "AIFF"
Chunk[] chunks; // All sub-chunks in this container
}
```
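A matching sketch for the container (again Python, helper name illustrative); it wraps sub-chunks that have already been serialized to bytes:

```python
import struct

def pack_form(chunks: list[bytes]) -> bytes:
    """Wrap already-serialized sub-chunks in the top-level FORM
    container; chunk_size covers form_type plus all sub-chunk bytes."""
    body = b"AIFF" + b"".join(chunks)
    return struct.pack(">4si", b"FORM", len(body)) + body

sub = b"TEST\x00\x00\x00\x02ab"   # a dummy sub-chunk with 2 data bytes
form = pack_form([sub])
```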

There are two mandatory chunks for a valid AIFF file: the Common chunk, and the Sound Data chunk.

### Common chunk
Programs require metadata to understand how to read the audio stored in the Sound Data chunk:
```c
CommonChunk {
// These are fields from the generic Chunk type
ID chunk_id; // Must be set to "COMM"
i32 chunk_size; // This is set to 18: (16 + 32 + 16 + 80) / 8

// These form the `chunk_data` of the generic Chunk
i16 num_channels; // Number of audio channels
u32 num_sample_frames; // Number of Sample Frames
i16 sample_size; // Bit depth of each sample point
f80 sample_rate; // Number of sample frames per second
}
```

Since the format is capable of handling multi-channel audio, instead of storing a single amplitude value per point in time, we store one amplitude per channel. The spec calls this a "Sample Frame", and the fields in the Common Chunk describe the structure of these frames:

1. `num_channels`: This is the number of channels in each frame. Examples are Mono (1 channel), Stereo (2 channels) or Surround (3+ channels) sound.
2. `num_sample_frames`: This is the total number of sample frames stored in the Sound Data chunk, *not* the sample rate.
3. `sample_size`: This is the bit-depth of each sample point within a frame. It's best to store 8, 16 or 32 here, since each sample point then occupies a whole number of bytes.
4. `sample_rate`: This is the number of sample frames played back per second, e.g. 44,100 for CD-quality audio.
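The only awkward field to serialize is the `f80` sample rate, which most languages cannot pack natively. A Python sketch (helper names illustrative): the 80-bit extended format has a sign bit, a 15-bit exponent biased by 16383, and a 64-bit mantissa whose integer bit is explicit rather than hidden:

```python
import math
import struct

def pack_f80(x: float) -> bytes:
    """Encode a non-negative value as an 80-bit extended float:
    15-bit exponent biased by 16383, then a 64-bit mantissa whose
    integer bit is explicit (unlike the hidden bit of f32/f64)."""
    if x == 0:
        return b"\x00" * 10
    mant, exp = math.frexp(x)          # x == mant * 2**exp, 0.5 <= mant < 1
    exponent = exp + 16382             # bias of 16383, minus 1 for frexp's range
    mantissa = int(mant * (1 << 64))   # top bit becomes the explicit integer bit
    return struct.pack(">HQ", exponent, mantissa)

def pack_comm(num_channels, num_sample_frames, sample_size, sample_rate):
    body = struct.pack(">hIh", num_channels, num_sample_frames, sample_size)
    body += pack_f80(sample_rate)      # 2 + 4 + 2 + 10 = 18 bytes total
    return struct.pack(">4si", b"COMM", len(body)) + body

comm = pack_comm(2, 23, 16, 44100.0)
```

As a sanity check, 44,100 Hz encodes as the well-known byte sequence `40 0E AC 44 00 00 00 00 00 00`.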

### Sound Data chunk
The actual PCM audio is stored here:
```c
SoundDataChunk {
// These are fields from the generic Chunk type
ID chunk_id; // Must be set to "SSND"
i32 chunk_size; // This will be 4 + 4 + the length of the sample_frames array

// These form the `chunk_data` of the generic Chunk
u32 offset; // How many bytes to skip in sample_frames before reaching the first frame.
u32 block_size; // Number of Sample Frames in a block of audio
Frame[] sample_frames; // The actual Sample Frames of audio
}
```

For most applications both the `offset` and `block_size` will be set to 0. `sample_frames` contains the bytes for the sample frames making up the actual audio.
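A Python sketch for this chunk (helper name illustrative), with both fields left at 0:

```python
import struct

def pack_ssnd(frame_bytes: bytes) -> bytes:
    """Serialize the Sound Data chunk; offset and block_size stay 0,
    so chunk_size is 4 + 4 + the number of frame bytes."""
    return struct.pack(">4siII", b"SSND", 8 + len(frame_bytes), 0, 0) + frame_bytes

ssnd = pack_ssnd(b"\x00\x01" * 3)   # three mono i16 sample frames
```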

Depending on the bit-depth you choose, each sample point will be an `i8`, `i16` or `i32`. A `Frame` is then composed of `num_channels` sample points stored contiguously. The sample points within a frame are ordered by convention as follows (change `i16` to whichever bit-depth you choose):
```c
MonoFrame {
i16 sample;
}

StereoFrame {
i16 left;
i16 right;
}

ThreeChannelFrame {
i16 left;
i16 right;
i16 center;
}
```
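Interleaving per-channel sample lists into frames might look like this Python sketch (helper name illustrative), packing each 16-bit sample point big-endian:

```python
import struct

def pack_frames_i16(channels):
    """Interleave per-channel lists of sample points into big-endian
    sample frames: frame 0 is (ch0[0], ch1[0], ...), and so on."""
    out = b""
    for frame in zip(*channels):
        out += struct.pack(f">{len(frame)}h", *frame)
    return out

# Two stereo frames: left channel [100, 200], right channel [-100, -200]
stereo = pack_frames_i16([[100, 200], [-100, -200]])
```

The result is `left, right, left, right, ...` — one frame per point in time.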
---

`reference/data_types.md`
# Data Types Used Across Projects
For many projects in this repository, you will be dealing with a few standard data types. You can find their descriptions here:

## Endianness
Multi-byte values can be stored in memory or on disk in two different byte orders, depending on the address at which each byte is placed (assume N to be the address of the value):

1. Little-endian: The Least Significant Byte (LSB) is stored at the lowest address. This is the most common order you will find in modern systems:

| N | N + 1 | N + 2 | N + 3 |
| ------ | ------ | ------ | ------ |
| byte 0 | byte 1 | byte 2 | byte 3 |

2. Big-endian: The Most Significant Byte (MSB) is stored at the lowest address. This order is used by network protocols and certain older architectures:

| N | N + 1 | N + 2 | N + 3 |
| ------ | ------ | ------ | ------ |
| byte 3 | byte 2 | byte 1 | byte 0 |

In most cases you will be working with Little Endian values, and projects will specify when you need to work with Big Endian ones.
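For example, in Python, the same 32-bit value serialized both ways:

```python
import struct

value = 0x0A0B0C0D
little = struct.pack("<I", value)   # least significant byte first
big = struct.pack(">I", value)      # most significant byte first
print(little.hex(), big.hex())      # 0d0c0b0a 0a0b0c0d
```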

## Unsigned Integer
| Name | Description |
| ----- | ----------- |
| `u8` | Unsigned 8-bit (1-byte) integer |
| `u16` | Unsigned 16-bit (2-byte) integer |
| `u32` | Unsigned 32-bit (4-byte) integer |
| `u64` | Unsigned 64-bit (8-byte) integer |

## Signed Integer
| Name | Description |
| ----- | ----------- |
| `i8` | Signed 8-bit (1-byte) integer |
| `i16` | Signed 16-bit (2-byte) integer |
| `i32` | Signed 32-bit (4-byte) integer |
| `i64` | Signed 64-bit (8-byte) integer |

## IEEE Floating Point
| Name | Description |
| ----- | ----------- |
| `f32` | Single-precision 32-bit (4-byte) floating point |
| `f64` | Double-precision 64-bit (8-byte) floating point |
| `f80` | Extended-precision 80-bit (10-byte) floating point |

## Textual
| Name | Description |
| ----- | ----------- |
| `char` | ASCII Character |
| `utf8` | Unicode UTF-8 string |

## Arrays
Any type can be suffixed with `[]` to specify an array of that type. A number can be specified between the brackets to signify an array size, or left empty for an unbounded array.

```c
// Bounded
u8[1024] byte_buffer;
// Unbounded
u8[] data_segment;
```

Some array types are common:
| Name | Description |
| ----- | ----------- |
| `ascii` | ASCII String *without* null termination, equivalent to `char[]` |
| `ascii_n` | ASCII String *with* null termination, equivalent to `char[]` with a subsequent null byte |

## Composite Structures
Most types will end up composing other data types into one structure; these will be represented as follows:
```c
StructureName {
type_1 field_name_1;
type_2 field_name_2;
// ...
}
```

Unless mentioned otherwise, the fields of a structure are stored contiguously.
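Serializing a composite is therefore just concatenating its fields' bytes; e.g. in Python, for a hypothetical big-endian structure `{ u16 field_a; u32 field_b; }`:

```python
import struct

# Hypothetical composite { u16 field_a; u32 field_b; }: just the
# fields' big-endian bytes laid end to end. struct's ">" mode also
# disables compiler-style padding between fields.
packed = struct.pack(">HI", 0x0102, 0x03040506)
print(packed.hex())  # 010203040506
```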