The overall structure of a stereo .hps file is the following:
Offset | Section |
---|---|
0x00 | File Header |
0x10 | Left Channel Info |
0x48 | Right Channel Info |
0x80 | DSP Blocks |
The meat of an .hps file is the "DSP block" data. The song contained in the file is split into multiple "blocks", each containing encoded audio data as well as a link to the start of the next block.
The first half of the frames in each block are for the left audio channel, and the other half are for the right.
Offset | Section |
---|---|
0x00 | DSP Block Header |
0x0C | Left DSP Decoder State |
0x14 | Right DSP Decoder State |
0x1C | Padding (Always 0) |
0x20 | DSP Audio Frames |
All numeric types are in big-endian format.
The file header is the first section within an .hps file. It contains the magic string, the sample rate of the song, and the number of audio channels used.
Length: 0x10
Offset | Name | Type | Length | Description |
---|---|---|---|---|
0x00 | Magic String | [u8; 8] | 0x08 | " HALPST\0" magic string |
0x08 | Sample Rate | u32 | 0x04 | Number of samples per channel per second |
0x0C | Channel Count | u32 | 0x04 | Number of audio channels |
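
As a rough illustration of the table above, the file header could be read like this. This is a minimal sketch, not a reference implementation: the names `FileHeader`, `read_u32_be`, and `parse_file_header` are made up for this example, and the magic bytes are taken to be exactly the 8 bytes shown in the table.

```rust
/// Parsed contents of the 0x10-byte file header (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
struct FileHeader {
    sample_rate: u32,
    channel_count: u32,
}

/// Reads a big-endian u32 at `offset`, or returns None if the read is out of bounds.
fn read_u32_be(bytes: &[u8], offset: usize) -> Option<u32> {
    let chunk = bytes.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
}

/// Parses the file header. Returns None if the buffer is too short
/// or the magic string does not match.
fn parse_file_header(bytes: &[u8]) -> Option<FileHeader> {
    let magic = bytes.get(0x00..0x08)?;
    if magic != &b" HALPST\0"[..] {
        return None;
    }
    Some(FileHeader {
        sample_rate: read_u32_be(bytes, 0x08)?,
        channel_count: read_u32_be(bytes, 0x0C)?,
    })
}
```

Returning `Option` rather than panicking keeps truncated or non-.hps input from crashing the caller.
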
The .hps file should have a channel info section for each audio channel. Notably, each channel info section contains 16 "coefficients" that are used to decode the samples within the frames of that channel's blocks.
Length: 0x38
Offset | Name | Type | Length | Description |
---|---|---|---|---|
0x00 | Largest Block Length | u32 | 0x04 | Length of the largest block in the channel |
0x04 | (Unknown) | u32 | 0x04 | Always 0x2 |
0x08 | Sample Count | u32 | 0x04 | [!!UNSURE] Number of samples in the channel |
0x0C | (Unknown) | u32 | 0x04 | Always 0x2 |
0x10 | DSP Decode Coefficients | [i16; 16] | 0x20 | Each audio frame selects coefficients from this array to calculate the values of the 14 samples within the frame |
0x30 | Initial DSP Decoder State | DSP Decoder State | 0x08 | The first DSP decoder state for the channel |
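
The channel info table could be read along these lines. This is only a sketch: `ChannelInfo` and `parse_channel_info` are hypothetical names, the two "Always 0x2" fields are skipped, and the initial decoder state at 0x30 is left out for brevity.

```rust
/// Documented fields of one 0x38-byte channel info section (hypothetical type).
#[derive(Debug, PartialEq)]
struct ChannelInfo {
    largest_block_length: u32,
    sample_count: u32,
    coefficients: [i16; 16],
}

/// Parses the channel info section that starts at `offset` in the file.
/// Returns None if the section would run past the end of the buffer.
fn parse_channel_info(bytes: &[u8], offset: usize) -> Option<ChannelInfo> {
    let section = bytes.get(offset..offset.checked_add(0x38)?)?;

    // All numeric fields are big-endian.
    let u32_at = |o: usize| {
        u32::from_be_bytes([section[o], section[o + 1], section[o + 2], section[o + 3]])
    };

    // The 16 coefficients are consecutive i16 values starting at 0x10.
    let mut coefficients = [0i16; 16];
    for (i, coef) in coefficients.iter_mut().enumerate() {
        let o = 0x10 + i * 2;
        *coef = i16::from_be_bytes([section[o], section[o + 1]]);
    }

    Some(ChannelInfo {
        largest_block_length: u32_at(0x00),
        sample_count: u32_at(0x08),
        coefficients,
    })
}
```

For a stereo file, the left channel would be parsed with `offset = 0x10` and the right channel with `offset = 0x48`, matching the layout table at the top.
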
Each block of audio has its own header. It contains:
- The length of the data in the block (excluding the header itself)
- A pointer to the next block
Length: 0x20
Offset | Name | Type | Length | Description |
---|---|---|---|---|
0x00 | DSP Data Length | u32 | 0x04 | Length of non-header data contained within the block: blockLength - 0x20 |
0x04 | (Unknown) | u32 | 0x04 | Often 0xFFFF, but not always |
0x08 | Pointer to Next Block | u32 | 0x04 | Offset of the next block to read (offset from the start of the file) |
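
Because each block points to the next one, the blocks form a linked list that can be walked from the first block (0x80 in a stereo file). The sketch below shows one way to do that. `BlockHeader`, `parse_block_header`, and `block_offsets` are hypothetical names. How the chain terminates is not described above, so the stopping rule here (stop when a pointer leaves the buffer or revisits a block, which also avoids an endless walk if the chain loops back on itself) is an assumption of this example.

```rust
/// The documented fields of a DSP block header (hypothetical type).
#[derive(Debug, PartialEq)]
struct BlockHeader {
    dsp_data_length: u32,
    next_block_offset: u32,
}

/// Reads a big-endian u32 at `offset`, or returns None if out of bounds.
fn read_u32_be(bytes: &[u8], offset: usize) -> Option<u32> {
    let chunk = bytes.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
}

/// Parses the block header located at `offset` (measured from the start of the file).
fn parse_block_header(bytes: &[u8], offset: usize) -> Option<BlockHeader> {
    Some(BlockHeader {
        dsp_data_length: read_u32_be(bytes, offset)?,
        next_block_offset: read_u32_be(bytes, offset.checked_add(0x08)?)?,
    })
}

/// Follows "next block" pointers starting at `first` and returns the offsets
/// of the blocks visited, in order.
/// Assumption: the walk stops when a pointer falls outside the buffer or
/// points at a block that was already visited.
fn block_offsets(bytes: &[u8], first: usize) -> Vec<usize> {
    let mut offsets = Vec::new();
    let mut current = first;
    while !offsets.contains(&current) {
        match parse_block_header(bytes, current) {
            Some(header) => {
                offsets.push(current);
                current = header.next_block_offset as usize;
            }
            None => break,
        }
    }
    offsets
}
```
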
Length: 0x08
Offset | Name | Type | Length | Description |
---|---|---|---|---|
0x00 | P/S high byte | u8 | 0x01 | [!!UNSURE] (predictor and scale?) |
0x01 | P/S | u8 | 0x01 | [!!UNSURE] (predictor and scale?) |
0x02 | Initial hist 1 | i16 | 0x02 | Initial hist1 value for the block |
0x04 | Initial hist 2 | i16 | 0x02 | Initial hist2 value for the block |
0x06 | (Unknown) | u16 | 0x02 | Always 0 |
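
A decoder state is small enough to read field by field. The sketch below keeps the two P/S bytes as raw values, since their meaning is marked as unsure above. `DecoderState` and `parse_decoder_state` are hypothetical names for this example.

```rust
/// One 0x08-byte DSP decoder state; field names follow the table above.
#[derive(Debug, PartialEq)]
struct DecoderState {
    ps_high: u8,
    ps: u8,
    hist1: i16,
    hist2: i16,
}

/// Parses the decoder state that starts at `offset`.
/// Returns None if fewer than 8 bytes are available there.
fn parse_decoder_state(bytes: &[u8], offset: usize) -> Option<DecoderState> {
    let s = bytes.get(offset..offset.checked_add(0x08)?)?;
    Some(DecoderState {
        ps_high: s[0x00],
        ps: s[0x01],
        // The history values are big-endian signed 16-bit integers.
        hist1: i16::from_be_bytes([s[0x02], s[0x03]]),
        hist2: i16::from_be_bytes([s[0x04], s[0x05]]),
    })
}
```
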
Each frame of audio data contains a one-byte header followed by seven bytes of encoded samples. The header byte contains a `scale` (`u16`), which can be calculated like so: `1 << (header & 0xF)`, as well as a `coefficient_index` (`usize`), which can be calculated like so: `header >> 4`. The `coefficient_index` can be used to index into the array of "DSP decode coefficient"s contained in the channel info to obtain the coefficient we need to decode the samples in this frame.
Each of the seven bytes following the frame header contains two encoded samples, one in the first nibble of the byte, and the other in the second. To decode a nibble into a sample, we can use the following formula:
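`clamp_i16((((nibble * scale) << 11) + 1024 + (coef1 * hist1) + (coef2 * hist2)) >> 11)`
where `nibble` is read as a signed 4-bit value, `coef1` and `coef2` are the coefficients selected by `coefficient_index`, and `hist1` and `hist2` are the two previously decoded samples (`hist1` being the more recent). The right shift by 11 applies to the entire sum.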
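
To make the formula concrete, here is a sketch of decoding one 8-byte frame into 14 samples. It relies on several assumptions that the text above does not spell out, so treat them with care: the 16 coefficients are taken to be 8 pairs, with `coefficient_index` selecting `coef1 = coefficients[2 * index]` and `coef2 = coefficients[2 * index + 1]` (so an index of 8 or more is rejected); the "first" nibble of a byte is taken to be the high nibble; and each nibble is sign-extended to the range -8..=7. The function names are hypothetical, and intermediates use `i64` only to rule out overflow.

```rust
/// Interprets the low 4 bits of `nibble` as a signed value in -8..=7.
fn sign_extend_nibble(nibble: u8) -> i64 {
    let n = (nibble & 0xF) as i64;
    if n >= 8 { n - 16 } else { n }
}

/// Clamps a value to the i16 range.
fn clamp_i16(value: i64) -> i16 {
    value.max(i16::MIN as i64).min(i16::MAX as i64) as i16
}

/// Decodes one frame, updating `hist1`/`hist2` as it goes.
/// Returns None if the header selects a coefficient pair that does not exist.
fn decode_frame(
    frame: &[u8; 8],
    coefficients: &[i16; 16],
    hist1: &mut i16,
    hist2: &mut i16,
) -> Option<[i16; 14]> {
    let header = frame[0];
    let scale = 1i64 << (header & 0xF);
    let coefficient_index = (header >> 4) as usize;
    if coefficient_index >= 8 {
        return None; // assumption: only 8 pairs fit in 16 coefficients
    }
    let coef1 = coefficients[coefficient_index * 2] as i64;
    let coef2 = coefficients[coefficient_index * 2 + 1] as i64;

    let mut samples = [0i16; 14];
    for (i, sample) in samples.iter_mut().enumerate() {
        let byte = frame[1 + i / 2];
        // Assumption: the high nibble holds the earlier of the two samples.
        let nibble = if i % 2 == 0 { byte >> 4 } else { byte & 0xF };
        let raw = ((sign_extend_nibble(nibble) * scale) << 11)
            + 1024
            + coef1 * (*hist1 as i64)
            + coef2 * (*hist2 as i64);
        *sample = clamp_i16(raw >> 11);
        // The sample just decoded becomes the most recent history value.
        *hist2 = *hist1;
        *hist1 = *sample;
    }
    Some(samples)
}
```

The initial `hist1` and `hist2` for a block would come from that channel's decoder state in the block header, and they carry over from frame to frame within the block.
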
Note: Whether a frame belongs to the left or the right audio channel depends on where it appears in the block. The first half of the frames in a block are for the left audio channel, and the other half are for the right.
Length: 0x08
Offset | Name | Type | Length | Description |
---|---|---|---|---|
0x00 | DSP Frame header | u8 | 0x01 | This byte contains an encoded 'scale' and 'coefficient_index' |
0x01 | Encoded Samples | [u8; 7] | 0x07 | Each of these 7 bytes contains 2 encoded samples |
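
The note about left and right halves can be sketched as a small helper that takes the frame data of one block (everything after the 0x20-byte header area) and splits it into per-channel frames. `split_block_frames` is a hypothetical name. The sketch assumes the data is a whole, even number of 8-byte frames with no extra padding between the two halves, which real files may not guarantee.

```rust
/// Splits a block's frame data into (left frames, right frames).
/// Returns None if the data is not an even number of whole 8-byte frames.
fn split_block_frames(block_data: &[u8]) -> Option<(Vec<&[u8]>, Vec<&[u8]>)> {
    const FRAME_LEN: usize = 8;
    if block_data.len() % (2 * FRAME_LEN) != 0 {
        return None;
    }
    // First half of the frames: left channel. Second half: right channel.
    let (left, right) = block_data.split_at(block_data.len() / 2);
    Some((
        left.chunks(FRAME_LEN).collect(),
        right.chunks(FRAME_LEN).collect(),
    ))
}
```

Each returned slice is one 8-byte frame: a header byte followed by seven bytes of encoded samples, as in the table above.
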
This documentation was put together using knowledge learned from the following sources:
- https://docs.rs/rodio/0.17.1/rodio/source/trait.Source.html#a-quick-lesson-about-sounds
- https://github.com/pdeljanov/Symphonia/blob/398dab0/GETTING_STARTED.md#multimedia-basics
- https://github.com/jmlee337/dsp2hps/blob/6531757/dsp2hps/dsp2hps/main.cpp
- https://github.com/Thealexbarney/VGAudio/blob/9d8f6ea/src/VGAudio/Containers/Hps
- https://github.com/vgmstream/vgmstream/blob/8d0dd44/src/meta/halpst.c
- https://www.metroid2002.com/retromodding/wiki/DSP_(File_Format)#ADPCM_Data