[api] [cli] roc-streaminggh-608: Formats and sub-formats
API
---

  1. roc_format: the only supported format (currently)
     is now ROC_FORMAT_PCM.

  2. roc_subformat: defines various PCM variants
     (sint, uint, float, endian).

  3. roc_media_encoding: format and subformat together
     define the sample coding.

CLI
---

  1. --io-encoding and --packet-encoding options now have the form
     <format>[@<subformat>]/<rate>/<channels>

  E.g.: pcm@s16/44100/stereo
  (whether sub-format is allowed or required depends
   on format)

  2. --input-format/--output-format options are removed,
     their function is now handled by <format> field
     of --io-encoding

  E.g.:
    --output file://- --io-encoding wav@s24/48000/stereo
     or
    --input file://- --io-encoding wav/-/-

  3. --print-supported is updated to discover and list all
     available sub-formats (divided into logical groups).
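
  As a rough illustration of the option syntax above (a sketch only, not
  the parser used by the CLI tools), the encoding string can be split as
  shown below. The names IoEncoding and parse_io_encoding are hypothetical,
  and "-" is treated as "use default / auto-detect", as in the wav/-/-
  example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IoEncoding:
    # Hypothetical container for the parsed fields.
    # None means the field was "-" (default / auto-detect) or omitted.
    format: Optional[str]
    subformat: Optional[str]
    rate: Optional[int]
    channels: Optional[str]


def parse_io_encoding(text: str) -> IoEncoding:
    """Split '<format>[@<subformat>]/<rate>/<channels>' into its fields."""
    parts = text.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected 3 '/'-separated fields, got {len(parts)}")
    fmt_field, rate_field, chans_field = parts

    # The sub-format, if present, follows '@' inside the first field.
    fmt, sep, subfmt = fmt_field.partition("@")
    if not fmt or (sep and not subfmt):
        raise ValueError(f"bad format field: {fmt_field!r}")

    def or_default(value: str) -> Optional[str]:
        # Map the special value "-" to None; reject empty fields.
        if not value:
            raise ValueError("empty field")
        return None if value == "-" else value

    rate = or_default(rate_field)
    return IoEncoding(
        format=or_default(fmt),
        subformat=subfmt if sep else None,
        rate=int(rate) if rate is not None else None,
        channels=or_default(chans_field),
    )
```

  This sketch checks only the shape of the string. Whether a sub-format
  is allowed or required for a given format is left to the caller, since
  that depends on the format.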

Docs
----

  1. Update --help messages of CLI tools.
  2. Update manual pages of CLI tools.

Internals
---------

Introduce concept of format (e.g. PCM, FLAC) and sub-format
(e.g. s16). Support formats and sub-formats in all sndio
backends.

roc_audio:

- SampleFormat => Format
- Formats: Format_Pcm, Format_Wav, Format_Custom
- PcmFormat => PcmSubformat
- SampleSpec: set_format(), set_custom_format(),
  set_pcm_subformat(), set_custom_subformat()
- SampleSpec: is_valid() => is_complete()
- Sample_RawFormat => PcmSubformat_Raw

roc_sndio:

- File formats are now *not* drivers. All file formats
  are handled by the special "file://" driver, i.e. the
  URI scheme is now always equal to the driver.
- Supported file formats and sub-formats are
  discovered from backends separately from drivers.
- For discovery, we use DriverInfo and FormatInfo
  structs.

- IoConfig is empty by default; frame length and latency
  are zero.
- Each backend may use its own defaults for IoConfig.
- We can retrieve actually selected config using
  sample_spec() and frame_length() methods of IDevice.
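
The "empty config, backend fills in defaults" idea can be sketched as
follows. This is a language-neutral illustration in Python, not the
C++ code from this commit: the IoConfig fields, the FakeDevice class,
and the default values are all made up for the example. Only the method
names sample_spec() and frame_length() come from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IoConfig:
    # Hypothetical fields; zero / None mean "not set by the caller".
    rate: int = 0
    channels: Optional[str] = None
    frame_length_ns: int = 0
    latency_ns: int = 0


class FakeDevice:
    # Made-up backend defaults, chosen only for this example.
    DEFAULTS = IoConfig(rate=48000, channels="stereo",
                        frame_length_ns=10_000_000, latency_ns=40_000_000)

    def __init__(self, requested: IoConfig) -> None:
        d = self.DEFAULTS
        # Keep every value the caller set; fill the rest from backend defaults.
        self._config = IoConfig(
            rate=requested.rate or d.rate,
            channels=requested.channels or d.channels,
            frame_length_ns=requested.frame_length_ns or d.frame_length_ns,
            latency_ns=requested.latency_ns or d.latency_ns,
        )

    def sample_spec(self) -> Tuple[int, str]:
        # Report the rate and channel layout that were actually selected.
        return (self._config.rate, self._config.channels)

    def frame_length(self) -> int:
        # Report the frame length that was actually selected.
        return self._config.frame_length_ns
```

A caller that sets only the rate, e.g. FakeDevice(IoConfig(rate=44100)),
reads the remaining values back from the device instead of assuming
global defaults.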

- SoxBackend: remove file support, allow only devices
  (now we use sndfile for files)
- SoxBackend: support PCM format and sub-formats
- PulseaudioBackend: support PCM format and sub-formats
- SndfileBackend: support formats and sub-formats,
  map to sndfile major type and sub-type
- WavBackend: support WAV format and PCM sub-formats

- Refactoring & unification in sndio backends
- Bug-fixes in format handling in sndfile backend
gavv committed Sep 11, 2024
1 parent 49e7517 commit 544f20e
Showing 206 changed files with 7,085 additions and 4,195 deletions.
2 changes: 1 addition & 1 deletion .fmtignore
@@ -1,5 +1,5 @@
src/internal_modules/roc_audio/channel_tables.cpp
src/internal_modules/roc_audio/pcm_format.cpp
src/internal_modules/roc_audio/pcm_subformat.cpp
src/internal_modules/roc_core/macro_helpers.h
src/internal_modules/roc_core/target_libatomic_ops/roc_core/atomic_ops.h
src/public_api/include/roc/version.h
2 changes: 1 addition & 1 deletion SConstruct
@@ -729,10 +729,10 @@ env['ROC_MODULES'] = [
'roc_audio',
'roc_rtp',
'roc_rtcp',
'roc_sdp',
'roc_netio',
'roc_sndio',
'roc_pipeline',
'roc_sdp',
'roc_ctl',
'roc_node',
]
64 changes: 49 additions & 15 deletions docs/man/roc-copy.1
@@ -57,18 +57,15 @@ List supported protocols, formats, etc.
.SS I/O options
.INDENT 0.0
.TP
.BI \-i\fP,\fB \-\-input\fB= FILE_URI
.BI \-i\fP,\fB \-\-input\fB= IO_URI
Input file URI
.TP
.BI \-\-input\-format\fB= FILE_FORMAT
Force input file format
.BI \-\-input\-encoding\fB= IO_ENCODING
Input file encoding
.TP
.BI \-o\fP,\fB \-\-output\fB= FILE_URI
.BI \-o\fP,\fB \-\-output\fB= IO_URI
Output file URI
.TP
.BI \-\-output\-format\fB= FILE_FORMAT
Force output file format
.TP
.BI \-\-output\-encoding\fB= IO_ENCODING
Output file encoding
.TP
@@ -91,11 +88,11 @@ Resampler profile (possible values=\(dqlow\(dq, \(dqmedium\(dq, \(dqhigh\(dq de
Enable self\-profiling (default=off)
.UNINDENT
.SH DETAILS
.SS File URI
.SS I/O URI
.sp
\fB\-\-input\fP and \fB\-\-output\fP options define input / output file URI.
.sp
\fIFILE_URI\fP should have one of the following forms:
\fIIO_URI\fP should have one of the following forms:
.INDENT 0.0
.IP \(bu 2
\fBfile:///<abs>/<path>\fP \-\- absolute file path
@@ -129,11 +126,48 @@ The list of supported file formats can be retrieved using \fB\-\-list\-supported
.sp
If the \fB\-\-output\fP is omitted, the conversion results are discarded.
.sp
The \fB\-\-input\-format\fP and \fB\-\-output\-format\fP options can be used to force the file format. If the option is omitted, the file format is auto\-detected. This option is always required for stdin or stdout.
.sp
The path component of the provided URI is \fI\%percent\-decoded\fP\&. For convenience, unencoded characters are allowed as well, except that \fB%\fP should be always encoded as \fB%25\fP\&.
.sp
For example, the file named \fB/foo/bar%/[baz]\fP may be specified using either of the following URIs: \fBfile:///foo%2Fbar%25%2F%5Bbaz%5D\fP and \fBfile:///foo/bar%25/[baz]\fP\&.
.SS I/O encoding
.sp
\fB\-\-input\-encoding\fP and \fB\-\-output\-encoding\fP options allow to explicitly specify encoding of the input or output file.
.sp
This option is useful when file encoding can\(aqt be detected automatically (e.g. file doesn\(aqt have extension or uses header\-less format like raw PCM).
.sp
\fIIO_ENCODING\fP should have the following form:
.sp
\fB<format>[@<subformat>]/<rate>/<channels>\fP
.sp
Where:
.INDENT 0.0
.IP \(bu 2
\fBformat\fP defines container format, e.g. \fBpcm\fP (raw samples), \fBwav\fP, \fBogg\fP
.IP \(bu 2
\fBsubformat\fP is optional format\-dependent codec, e.g. \fBs16\fP for \fBpcm\fP or \fBwav\fP, and \fBvorbis\fP for \fBogg\fP
.IP \(bu 2
\fBrate\fP defines sample rate in Hertz (number of samples per second), e.g. \fB48000\fP
.IP \(bu 2
\fBchannels\fP defines channel layout, e.g. \fBmono\fP or \fBstereo\fP
.UNINDENT
.sp
\fBformat\fP, \fBrate\fP, and \fBchannels\fP may be set to special value \fB\-\fP, which means using default value for input device, or auto\-detect value for input file.
.sp
Whether \fBsubformat\fP is required, allowed, and what values are accepted, depends on \fBformat\fP\&.
.sp
Examples:
.INDENT 0.0
.IP \(bu 2
\fBpcm@s16/44100/mono\fP \-\- PCM, 16\-bit native\-endian integers, 44.1KHz, 1 channel
.IP \(bu 2
\fBpcm@f32_le/48000/stereo\fP \-\- PCM, 32\-bit little\-endian floats, 48KHz, 2 channels
.IP \(bu 2
\fBwav/\-/\-\fP \-\- WAV file, auto\-detect sub\-format, rate, channels
.IP \(bu 2
\fBflac\-/\-/\-\fP \-\- FLAC file, auto\-detect sub\-format, rate, channels
.UNINDENT
.sp
The list of supported formats, sub\-formats, and channel layouts can be retrieved using \fB\-\-list\-supported\fP option.
.SS Time units
.sp
\fITIME\fP defines duration with nanosecond precision.
@@ -150,7 +184,7 @@ Convert sample rate to 24\-bit 48k stereo:
.sp
.nf
.ft C
$ roc\-copy \-vv \-\-io\-encoding s24/48000/stereo \-i file:input.wav \-o file:output.wav
$ roc\-copy \-vv \-i file:input.wav \-o file:output.wav \-\-output\-encoding wav@s24/48000/stereo
.ft P
.fi
.UNINDENT
@@ -162,7 +196,7 @@ Same, but drop output results instead of writing to file (useful for benchmarkin
.sp
.nf
.ft C
$ roc\-copy \-vv \-\-io\-encoding s24/48000/stereo \-i file:input.wav
$ roc\-copy \-vv \-i file:input.wav \-\-output\-encoding pcm@s24/48000/stereo
.ft P
.fi
.UNINDENT
@@ -174,8 +208,8 @@ Input from stdin, output to stdout:
.sp
.nf
.ft C
$ roc\-copy \-vv \-\-input\-format=wav \-i file:\- \e
\-\-output\-format=wav \-o file:\- >./output.wav <./input.wav
$ roc\-copy \-vv \-\-input\-encoding=wav/\-/\- \-i file:\- \e
\-\-output\-encoding=wav/\-/\- \-o file:\- >./output.wav <./input.wav
.ft P
.fi
.UNINDENT
89 changes: 58 additions & 31 deletions docs/man/roc-recv.1
@@ -66,9 +66,6 @@ Exit when last connection is closed (default=off)
.BI \-o\fP,\fB \-\-output\fB= IO_URI
Output file or device URI
.TP
.BI \-\-output\-format\fB= FILE_FORMAT
Force output file format
.TP
.BI \-\-io\-encoding\fB= IO_ENCODING
Output device encoding
.TP
@@ -82,10 +79,7 @@ Output frame length, TIME units
.INDENT 0.0
.TP
.BI \-\-backup\fB= IO_URI
Backup file or device URI (used as input when there are no connections)
.TP
.BI \-\-backup\-format\fB= FILE_FORMAT
Force backup file format
Backup file URI (used as input when there are no connections)
.UNINDENT
.SS Network options
.INDENT 0.0
@@ -221,8 +215,6 @@ The list of supported schemes and file formats can be retrieved using \fB\-\-lis
If the \fB\-\-output\fP is omitted, default driver and device are selected.
If the \fB\-\-backup\fP is omitted, no backup source is used.
.sp
The \fB\-\-output\-format\fP and \fB\-\-backup\-format\fP options can be used to force the output or backup file format. If the option is omitted, the file format is auto\-detected. The option is always required when the output or backup is stdout or stdin.
.sp
The path component of the provided URI is \fI\%percent\-decoded\fP\&. For convenience, unencoded characters are allowed as well, except that \fB%\fP should be always encoded as \fB%25\fP\&.
.sp
For example, the file named \fB/foo/bar%/[baz]\fP may be specified using either of the following URIs: \fBfile:///foo%2Fbar%25%2F%5Bbaz%5D\fP and \fBfile:///foo/bar%25/[baz]\fP\&.
@@ -234,31 +226,47 @@ This option is useful when device supports multiple encodings, or specific file
.sp
\fIIO_ENCODING\fP should have the following form:
.sp
\fB<format>/<rate>/<channels>\fP
\fB<format>[@<subformat>]/<rate>/<channels>\fP
.sp
Where:
.INDENT 0.0
.IP \(bu 2
\fBformat\fP defines sample precision and binary representation, e.g. \fBs16_le\fP stands for little\-endian signed 16\-bit integers
\fBformat\fP defines container format, e.g. \fBpcm\fP (raw samples), \fBwav\fP, \fBogg\fP
.IP \(bu 2
\fBsubformat\fP is optional format\-dependent codec, e.g. \fBs16\fP for \fBpcm\fP or \fBwav\fP, and \fBvorbis\fP for \fBogg\fP
.IP \(bu 2
\fBrate\fP defines sample rate in Hertz (number of samples per second), e.g. \fB48000\fP
.IP \(bu 2
\fBchannels\fP defines channel layout, e.g. \fBmono\fP or \fBstereo\fP
.UNINDENT
.sp
Any component may be set to special value \fB\-\fP, which means use default value for the specified output device or file format.
\fBformat\fP, \fBrate\fP, and \fBchannels\fP may be set to special value \fB\-\fP, which means using default value for the specified output device or file format.
.sp
Whether \fBsubformat\fP is required, allowed, and what values are accepted, depends on \fBformat\fP\&.
.sp
Examples:
.INDENT 0.0
.IP \(bu 2
\fBs16/44100/mono\fP \-\- 16\-bit native\-endian integers, 44.1KHz, 1 channel
\fBpcm@s16/44100/mono\fP \-\- PCM, 16\-bit native\-endian integers, 44.1KHz, 1 channel
.IP \(bu 2
\fBpcm@f32_le/48000/stereo\fP \-\- PCM, 32\-bit little\-endian floats, 48KHz, 2 channels
.IP \(bu 2
\fBpcm@s24_4be/\-/\-\fP \-\- PCM, 24\-bit integers packed into 4\-byte big\-endian frames, default rate and channels
.IP \(bu 2
\fBf32_le/48000/stereo\fP \-\- 32\-bit little\-endian floats, 48KHz, 2 channels
\fBwav/\-/\-\fP \-\- WAV, default sample width, rate, and channels
.IP \(bu 2
\fBs24_4be/\-/\-\fP \-\- 24\-bit PCM packed into 4\-byte big\-endian frames, default rate and channels
\fBwav@s24/\-/\-\fP \-\- WAV, 24\-bit samples, default rate and channels
.IP \(bu 2
\fBflac@s16/48000/stereo\fP \-\- FLAC, 16\-bit samples, 48KHz, 2 channels
.IP \(bu 2
\fBogg/48000/stereo\fP \-\- OGG, default codec, 48KHz, 2 channels
.IP \(bu 2
\fBogg@vorbis/48000/stereo\fP \-\- OGG, Vorbis codec, 48KHz, 2 channels
.UNINDENT
.sp
The list of supported formats and channel layouts can be retrieved using \fB\-\-list\-supported\fP option.
Devices (\fBpulse://\fP, \fBalsa://\fP, etc.) usually support only \fBpcm\fP format. Files (\fBfile://\fP) support a lot of different formats.
.sp
The list of supported formats, sub\-formats, and channel layouts can be retrieved using \fB\-\-list\-supported\fP option.
.SS I/O latency and frame
.sp
\fB\-\-io\-latency\fP option defines I/O buffer size for the output device. It can\(aqt be used if output is a file.
@@ -323,26 +331,32 @@ Supported control protocols:
.sp
\fIPKT_ENCODING\fP is similar to \fIIO_ENCODING\fP, but adds numeric encoding identifier:
.sp
\fB<id>:<format>/<rate>/<channels>\fP
\fB<id>:<format>[@<subformat>]/<rate>/<channels>\fP
.sp
Where:
.INDENT 0.0
.IP \(bu 2
\fBid\fP is an arbitrary number in range 100..127, which should uniquely identify encoding on all related senders and receivers
.IP \(bu 2
\fBformat\fP defines sample precision and binary representation, e.g. \fBs16_le\fP stands for little\-endian signed 16\-bit integers
\fBformat\fP defines container format, e.g. \fBpcm\fP (raw samples), \fBflac\fP
.IP \(bu 2
\fBsubformat\fP is optional format\-dependent codec, e.g. \fBs16\fP for \fBpcm\fP or \fBflac\fP
.IP \(bu 2
\fBrate\fP defines sample rate in Hertz (number of samples per second), e.g. \fB48000\fP
.IP \(bu 2
\fBchannels\fP defines channel layout, e.g. \fBmono\fP or \fBstereo\fP
.UNINDENT
.sp
Whether \fBsubformat\fP is required, allowed, and what values are accepted, depends on \fBformat\fP\&.
.sp
Examples:
.INDENT 0.0
.IP \(bu 2
\fB101:s16_be/44100/mono\fP \-\- 16\-bit big\-endian integers, 44.1KHz, 1 channel
\fB101:pcm@s24/44100/mono\fP \-\- PCM, 24\-bit network\-endian integers, 44.1KHz, 1 channel
.IP \(bu 2
\fB102:pcm@f32/48000/stereo\fP \-\- PCM, 32\-bit network\-endian floats, 48KHz, 2 channels
.IP \(bu 2
\fB102:f32_le/48000/stereo\fP \-\- 32\-bit little\-endian floats, 48KHz, 2 channels
\fB103:flac@s16/48000/stereo\fP \-\- FLAC, 16\-bit precision, 48KHz, 2 channels
.UNINDENT
.sp
The list of supported formats and channel layouts can be retrieved using \fB\-\-list\-supported\fP option.
@@ -371,12 +385,12 @@ A few backends are available:
.IP \(bu 2
\fBspeex\fP \-\- fast, good\-quality, low\-precision resampler based on SpeexDSP
.IP \(bu 2
\fBspeexdec\fP \-\- very fast, medium\-quality, medium\-precision resampler combining SpeexDSP for base rate conversion with decimation for clock drift compensation
\fBspeexdec\fP \-\- very fast, medium\-quality, medium\-precision resampler combining SpeexDSP for base rate conversion, and decimation for clock drift compensation
.UNINDENT
.sp
Here, quality reflects potential distortions introduced by resampler, and precision reflects how accurately resampler can apply scaling and hence how accurately we can tune latency.
.sp
For very low latency or very low latency error, you usually need to use \fBbuiltin\fP backend. If those factors are not critical, you may use \fBspeex\fP resampler to reduce CPU usage. \fBspeexdec\fP backend is a compromise for situations when both CPU usage and latency are critical, and quality is less important.
For very low or very precise latency, you usually need to use \fBbuiltin\fP backend. If those factors are not critical, you may use \fBspeex\fP resampler to reduce CPU usage. \fBspeexdec\fP backend is a compromise for situations when both CPU usage and latency are critical, and quality is less important.
.sp
If receiver\-side latency tuning is disabled (by default it\(aqs enabled), resampler precision is not relevant, and \fBspeex\fP is almost always the best choice.
.SS Latency configuration
@@ -388,9 +402,9 @@ By default, latency tuning is performed on receiver side: \fB\-\-latency\-profil
\fB\-\-target\-latency\fP option defines the latency value to maintain, as measured by the \fB\-\-latency\-backend\fP:
.INDENT 0.0
.IP \(bu 2
If value is provided, \fBfixed latency\fP mode is activated. The latency starts from \fB\-\-target\-latency\fP and is kept close to that value.
If value is provided, \fIfixed latency\fP mode is activated. The latency starts from \fB\-\-target\-latency\fP and is kept close to that value.
.IP \(bu 2
If option is omitted or set to \fBauto\fP, \fBadaptive latency\fP mode is activated. The latency is chosen dynamically. Initial latency is \fB\-\-start\-latency\fP, and the allowed range is \fB\-\-min\-latency\fP to \fB\-\-max\-latency\fP\&.
If option is omitted or set to \fBauto\fP, \fIadaptive latency\fP mode is activated. The latency is chosen dynamically. Initial latency is \fB\-\-start\-latency\fP, and the allowed range is \fB\-\-min\-latency\fP to \fB\-\-max\-latency\fP\&.
.UNINDENT
.sp
\fB\-\-latency\-tolerance\fP option defines maximum allowed deviation of the actual latency from the (current) target latency. If this limit is exceeded for some reason (typically due to poor network conditions), connection is restarted.
@@ -448,7 +462,7 @@ For UDP, it allows multiple processes to bind to the same address, which may be
Regardless of the option, \fBSO_REUSEADDR\fP is always disabled when binding to ephemeral port.
.SS Backup audio
.sp
If \fB\-\-backup\fP option is given, it defines input audio device or file which will be played when there are no connected sessions. If it\(aqs not given, silence is played instead.
If \fB\-\-backup\fP option is given, it defines input file to be played when there are no connected sessions. If it\(aqs not given, silence is played instead.
.sp
Backup file is restarted from the beginning each time when the last session disconnect. The playback of of the backup file is automatically looped.
.SS Time and size units
@@ -529,7 +543,7 @@ $ roc\-recv \-vv \-s rtp+rs8m://225.1.2.3:10001 \-r rs8m://225.1.2.3:10002 \e
.UNINDENT
.UNINDENT
.sp
Bind two sets of source, repair, and control endpoints (six endpoints in total):
Bind two sets (\(dqslots\(dq) of source, repair, and control endpoints (six endpoints in total):
.INDENT 0.0
.INDENT 3.5
.sp
@@ -600,7 +614,7 @@ Output to a file in WAV format (specify format manually):
.sp
.nf
.ft C
$ roc\-recv \-vv \-s rtp://0.0.0.0:10001 \-o file:./output.file \-\-output\-format wav
$ roc\-recv \-vv \-s rtp://0.0.0.0:10001 \-o file:./output.1 \-\-io\-encoding wav/\-/\-
.ft P
.fi
.UNINDENT
@@ -612,7 +626,7 @@ Output to stdout in WAV format:
.sp
.nf
.ft C
$ roc\-recv \-vv \-s rtp://0.0.0.0:10001 \-o file:\- \-\-output\-format wav >./output.wav
$ roc\-recv \-vv \-s rtp://0.0.0.0:10001 \-o file:\- \-\-io\-encoding wav/\-/\- >./output.wav
.ft P
.fi
.UNINDENT
@@ -650,7 +664,20 @@ Force specific encoding on the output device:
.nf
.ft C
$ roc\-recv \-vv \-s rtp://0.0.0.0:10001 \e
\-\-output alsa://hw:1,0 \-\-io\-encoding s32/48000/stereo
\-\-output alsa://hw:1,0 \-\-io\-encoding pcm@s32/48000/stereo
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Force specific encoding on the output file:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ roc\-recv \-vv \-s rtp://0.0.0.0:10001 \e
\-\-output file:./output.flac \-\-io\-encoding flac@s24/48000/stereo
.ft P
.fi
.UNINDENT
@@ -662,7 +689,7 @@ Use specific encoding for network packets:
.sp
.nf
.ft C
$ roc\-send \-vv \-s rtp://192.168.0.3:10001 \-\-packet\-encoding 101:s32/48000/stereo
$ roc\-send \-vv \-s rtp://192.168.0.3:10001 \-\-packet\-encoding 101:pcm@s24/48000/stereo
.ft P
.fi
.UNINDENT
@@ -672,7 +699,7 @@ $ roc\-send \-vv \-s rtp://192.168.0.3:10001 \-\-packet\-encoding 101:s32/48000/
.sp
.nf
.ft C
$ roc\-recv \-vv \-s rtp://0.0.0.0:10001 \-\-packet\-encoding 101:s32/48000/stereo
$ roc\-recv \-vv \-s rtp://0.0.0.0:10001 \-\-packet\-encoding 101:pcm@s24/48000/stereo
.ft P
.fi
.UNINDENT