Add page for WebSocket Channel Driver #140

Merged
merged 1 commit into from
Jun 6, 2025
12 changes: 12 additions & 0 deletions docs/Configuration/.pages
@@ -0,0 +1,12 @@
nav:
- Core-Configuration
- Channel-Drivers
- Dialplan
- Applications
- Functions
- Features
- Interfaces
- Reporting
- Miscellaneous
- WebRTC
- Codec-Opus.md
13 changes: 13 additions & 0 deletions docs/Configuration/Channel-Drivers/.pages
@@ -0,0 +1,13 @@
nav:
- SIP
- DAHDI.md
- Inter-Asterisk-eXchange-protocol-version-2-IAX2
- Local-Channel
- AudioSocket.md
- WebSocket.md
- Mobile-Channel
- Motif
- Skinny
- Unistim
- mISDN
- IP-Quality-of-Service.md
210 changes: 210 additions & 0 deletions docs/Configuration/Channel-Drivers/WebSocket.md
@@ -0,0 +1,210 @@
# WebSocket **DRAFT**

/// warning
This document is in DRAFT status and may change before final publication.
///

## Background

The WebSocket Channel Driver (chan_websocket) is designed to ease the burden on ARI application developers of getting media in and out of Asterisk. The ARI /channels/externalMedia REST endpoint already has two other channel drivers available (AudioSocket and RTP), but both require binary packet manipulation (RTP especially) and both require that the app developer handle the timing of sending packets to Asterisk. chan_websocket requires neither.

## Features

* Send and receive media using most Asterisk codecs.
* TLS is supported.
* Send arbitrary packet lengths to Asterisk. The channel driver will break them up into appropriately sized frames (see notes below though).
* No need to time your own packet transmits.
* Silence is automatically generated when no packets have been received from the app.
* The channel driver can accept incoming websocket connections _from_ your app as well as make outgoing connections _to_ your app.
* Although the driver is targeted at ARI ExternalMedia users, it's not tied to ARI and can be used directly from the Dial dialplan app.

## Connection Types

### Outgoing Connections

Outgoing connections require you to pre-configure a websocket client in the `websocket_client.conf` config file (see details below). Once done, you can reference the connection in a dial string.

```ini title="Dialplan Example"
[default]
exten = _x.,1,Dial(WebSocket/connection1/c(ulaw))
```

This would connect to your application's websocket server using the client named `connection1` and using the `ulaw` codec. Right after your server accepts the connection, you'll get a TEXT websocket message `MEDIA_START connection_id:<connection_id> channel:<channel_name> optimal_frame_size:<optimal_frame_size>`. This allows you to correlate the incoming connection to the specific channel. By default, the channel driver auto-answers the channel on successful connection. If you add the `n` dialstring option, auto-answer is suppressed and you can send an `ANSWER` TEXT message to answer the channel yourself.
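
As a rough illustration of handling that first message, the sketch below splits a `MEDIA_START` line into its three fields. The function name `parse_media_start` and the sample channel name in the usage note are hypothetical, and the code assumes the fields are space-separated `key:value` pairs exactly as shown above.

```python
def parse_media_start(message: str) -> dict:
    """Parse a MEDIA_START TEXT message into a dict of its fields.

    Assumes space-separated key:value pairs, as in the format shown above.
    """
    name, *pairs = message.split()
    if name != "MEDIA_START":
        raise ValueError(f"not a MEDIA_START message: {message!r}")
    fields = {}
    for pair in pairs:
        # Split on the first colon only, so values may contain colons.
        key, _, value = pair.partition(":")
        fields[key] = value
    # The frame size is a byte count, so convert it to an integer.
    fields["optimal_frame_size"] = int(fields["optimal_frame_size"])
    return fields
```

An app might call `parse_media_start(text)` on the first TEXT message it receives and keep `optimal_frame_size` for sizing the media it sends later.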

### Incoming Connections

Incoming connections must be made to the global Asterisk HTTP server using the `media` URI but you must still "Dial" the channel using the special `INCOMING` connection name.

```ini title="Dialplan Example"
[default]
exten = _x.,1,Dial(WebSocket/INCOMING/c(ulaw)n)
```

The websocket channel will be created immediately and the `MEDIA_WEBSOCKET_CONNECTION_ID` channel variable will be set to an ephemeral connection id, which your application must use in the URI it connects to Asterisk with. For example `/media/32966726-4388-456b-a333-fdf5dbecc60d`. When Asterisk accepts the connection, you'll see the same `MEDIA_START` message as above.

The default behavior is to automatically answer the channel when the websocket has connected successfully. If for some reason you want to answer the channel yourself, you can add the `n` parameter to the dialstring and make a REST `channels/<id>/answer` call or send the `ANSWER` command (mentioned below) over the media websocket.
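
A minimal sketch of how an app might assemble the URL for an incoming connection from the `MEDIA_WEBSOCKET_CONNECTION_ID` value. The helper name, the host and the port are assumptions for illustration, and the sketch assumes the Asterisk HTTP server has no URI prefix configured.

```python
def incoming_media_url(host: str, port: int, connection_id: str, tls: bool = False) -> str:
    """Build the websocket URL for an INCOMING media connection.

    host and port are whatever your Asterisk HTTP server listens on
    (assumed values here); connection_id comes from the
    MEDIA_WEBSOCKET_CONNECTION_ID channel variable.
    """
    scheme = "wss" if tls else "ws"
    return f"{scheme}://{host}:{port}/media/{connection_id}"
```

With the example id above, `incoming_media_url("server", 8088, "32966726-4388-456b-a333-fdf5dbecc60d")` yields a URL ending in `/media/32966726-4388-456b-a333-fdf5dbecc60d`.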

## Protocol

### Media Transfer

Media sent from Asterisk to your application is simply streamed in BINARY websocket messages. The message size will be whatever the internal Asterisk frame size is. For ulaw/alaw for instance, Asterisk will send a 160 byte packet every 20ms. This is the same as RTP except the messages will contain raw media with no RTP or other headers. You could stream this directly to a file or other service.
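
The 160 byte figure follows from simple arithmetic, sketched below for fixed-rate codecs only. The function `frame_bytes` is a hypothetical helper, not part of Asterisk, and the calculation does not apply to variable-bitrate codecs.

```python
def frame_bytes(sample_rate_hz: int, bytes_per_sample: int, frame_ms: int = 20) -> int:
    """Bytes in one frame of a fixed-rate codec.

    ulaw/alaw: 8000 samples per second, 1 byte per sample.
    slin16:   16000 samples per second, 2 bytes per sample.
    """
    return sample_rate_hz * bytes_per_sample * frame_ms // 1000


ulaw_frame = frame_bytes(8000, 1)     # 160 bytes every 20 ms
slin16_frame = frame_bytes(16000, 2)  # 640 bytes every 20 ms
```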

Media sent _to_ Asterisk _from_ your app is a bit trickier because chances are that the media you send Asterisk will eventually need to go out to a caller in a format that is both properly framed and properly timed, i.e. 160 byte blocks every 20 ms for a/ulaw. Sending short, long or mistimed packets will surely result in poor audio quality. To relieve your app of the burden of having to do the framing and timing, the channel driver will do it automatically but there are a few rules you have to follow.

When the websocket channel is created, a `MEDIA_WEBSOCKET_OPTIMAL_FRAME_SIZE` channel variable will be set that tells you the amount of data Asterisk needs to create a good 20ms frame using the codec you specified in the dialstring. This is also reported in the `MEDIA_START` TEXT message. If you send a websocket message with a length that's exactly that size or some whole multiple of that size, the channel driver will happily break that message up into the correctly sized frames and send one frame to the Asterisk core every 20ms with no leftover data. If you send an oddly sized message though, the extra data that won't fill a frame will be dropped. However...
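
Outside of bulk mode, one way an app might avoid losing the tail of an oddly sized buffer is to hold the remainder back itself and prepend it to the next buffer. The class below is a hypothetical app-side helper sketched under that assumption. It is not something the channel driver provides.

```python
class FrameAligner:
    """Trim outgoing buffers to a whole multiple of the optimal frame size.

    Bytes that would not fill a frame are carried over to the next push
    instead of being sent (and dropped by the channel driver).
    """

    def __init__(self, optimal_frame_size: int) -> None:
        self.size = optimal_frame_size
        self.pending = b""

    def push(self, data: bytes) -> bytes:
        """Return the largest frame-aligned prefix of pending + data."""
        buf = self.pending + data
        usable = len(buf) - (len(buf) % self.size)
        self.pending = buf[usable:]
        return buf[:usable]
```

With a frame size of 160, pushing 500 bytes returns 480 bytes to send and keeps 20 pending. Pushing another 140 bytes then returns exactly one 160 byte frame.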

If you need to send a file or a buffer received from an external source like an AI agent, it's quite possible that the buffer size won't be a whole multiple of the optimal size. In this case, the app can send Asterisk a `START_BULK_MEDIA` TEXT websocket message before sending the media. This tells the channel driver to buffer the data received so it can make full frames even across multiple received BINARY messages. That process will continue until the app sends Asterisk an `END_BULK_MEDIA` TEXT message. When the channel driver receives that, it'll take whatever data is left in the buffer that couldn't make a full frame and append silence to it to make up a full frame and send that to the core.
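
The bulk sequence can be sketched as a generator that yields the messages in the order an app would send them. The names `bulk_transfer_messages` and `silence_padding`, the `("text" | "binary", payload)` tuple shape, and the default chunk size are all assumptions made for this illustration. The only hard limit used is the 65500 byte maximum message size described in the flow-control section of this page.

```python
MAX_MESSAGE_BYTES = 65500  # maximum websocket message size per this page


def bulk_transfer_messages(media: bytes, transfer_id: str = "", chunk_size: int = 32000):
    """Yield (kind, payload) tuples for one bulk media transfer."""
    if not 0 < chunk_size <= MAX_MESSAGE_BYTES:
        raise ValueError("chunk_size must be between 1 and 65500 bytes")
    yield ("text", "START_BULK_MEDIA")
    # Chunks need not be frame-aligned while bulk mode is active.
    for offset in range(0, len(media), chunk_size):
        yield ("binary", media[offset:offset + chunk_size])
    end = "END_BULK_MEDIA"
    if transfer_id:
        end += f" {transfer_id}"
    yield ("text", end)


def silence_padding(total_bytes: int, optimal_frame_size: int) -> int:
    """Bytes of silence the driver would need to complete the last frame."""
    return -total_bytes % optimal_frame_size
```

For example, 1000 bytes of ulaw with a 160 byte frame size leaves a 40 byte remainder, so the last frame would be completed with 120 bytes of silence.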

So why can't Asterisk just do that process all the time and dispense with the TEXT messages? Well, let's say the app sends a message with an odd amount of data and the channel driver saves off the odd bit. What happens if you don't send any data for a while? If 20ms goes by and the channel driver doesn't get any more data what is it supposed to do with the leftover? If it appends silence to make a full frame and sends it to the core, then the app sends more data after 30ms, the caller will hear a gap in the audio. If the app does that a lot, it'll be a bad experience for the caller.

### Max Message Size and Flow Control

Chances are that your app will be sending data faster to Asterisk than Asterisk will be sending out to a caller so there are some rules you need to follow to prevent the channel driver from consuming excessive memory...

/// warning
The maximum websocket message size the underlying websocket code can handle is 65500 bytes. Attempting to send a message greater than that length will result in the websocket being closed and the call hung up!
///

* The maximum number of frames the channel driver will keep in its queue waiting to be sent to the core is about 1000. That's about 20 seconds of audio with a 20ms packetization rate. When the queue gets to about 900 frames, the channel driver will send a `MEDIA_XOFF` TEXT message to the app. The media the app sent just prior to receiving `MEDIA_XOFF` will be processed in its entirety even if the resulting frames cause the queue to reach 1000 but any data the app sends after that will probably be dropped. When the queue backlog drops down below about 800 frames, the channel driver will send a `MEDIA_XON` TEXT message at which time it's safe to start sending data again.
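
On the app side, honoring `MEDIA_XOFF` and `MEDIA_XON` can be as simple as a flag that the sending loop checks. The `FlowGate` class and `queue_seconds` helper below are hypothetical names used only for this sketch. How the app waits while the gate is closed is left to your websocket library.

```python
class FlowGate:
    """Track whether the channel driver has asked the app to pause sending."""

    def __init__(self) -> None:
        self.can_send = True

    def on_text(self, message: str) -> None:
        """Update the gate from a TEXT notification; others are ignored."""
        if message == "MEDIA_XOFF":
            self.can_send = False
        elif message == "MEDIA_XON":
            self.can_send = True


def queue_seconds(frames: int, frame_ms: int = 20) -> float:
    """Seconds of audio represented by a number of queued frames."""
    return frames * frame_ms / 1000
```

Here `queue_seconds(1000)` gives the 20 seconds quoted above, and the roughly 900 frame high water mark corresponds to 18 seconds of queued audio.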

See the next section for more commands the app can send.

### Control Messages

/// warning
You must ensure that the control messages are sent as TEXT messages. Sending them as BINARY messages will cause them to be treated as media.
///

Some of the control TEXT messages you can send the driver have already been mentioned but here's the full list:

#### Commands

/// note
All commands are case-sensitive.
///

/// define
`ANSWER`: Answer the WebSocket channel

- This will cause the WebSocket channel to be answered.

`HANGUP`: Hangup the WebSocket channel

- This will cause the WebSocket channel to be hung up and the websocket to be closed.

`START_BULK_MEDIA`: Start buffering media

- Indicates to the channel driver that the following media should be buffered to create properly sized and timed frames.

`END_BULK_MEDIA <optional_id>`: Stop buffering media

- Indicates to the channel driver that buffering is no longer needed and anything remaining in the buffer should have silence appended before sending to the Asterisk core. When the last frame of this bulk transfer has been sent to the core, the app will receive a `BULK_MEDIA_END` notification. If the optional id was specified in this command, it'll be returned in the notification. If you send multiple files in quick succession, the id can help you correlate the `BULK_MEDIA_END` notification to the `END_BULK_MEDIA` command that triggered it.

`PAUSE_MEDIA`: Pause media being sent to the Asterisk core

- If you've sent a large amount of media but need to pause it playing to a caller while you decide if you need to flush it or not, you can send a `PAUSE_MEDIA` command. The channel driver will then start playing silence to the caller but keep the data you've already sent in the queue. You can still send media to the channel driver while it's paused; it just will get queued behind whatever was already in the queue.

`CONTINUE_MEDIA`: Continue media being sent to the Asterisk core

- If you've previously paused the media, this will cause the channel driver to stop playing silence and resume playing media from the queue from the point you paused it.

`FLUSH_MEDIA`: Flush the buffer

- Send this command to the channel driver if you've sent a large amount of media but want to discard any that is queued but not yet sent. Flushing the buffer automatically ends any bulk transfer in progress and also resets the paused state so there's no need to send `END_BULK_MEDIA` or `CONTINUE_MEDIA` commands. No `BULK_MEDIA_END` notification will be sent in this case. This command could be useful if an automated agent detects the caller is speaking and wants to interrupt a prompt it already replied with.

`GET_STATUS`: Get the current queue status

- This will cause the channel driver to send back a `STATUS` message (described below).

`REPORT_QUEUE_DRAINED`: Request a notification when the frame queue is empty

- This will cause the channel driver to send back a one-time `QUEUE_DRAINED` notification the next time it detects that there are no more frames to process in the queue.

///
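
Because the commands are case-sensitive and only `END_BULK_MEDIA` takes an argument, an app may want to build them through one small function rather than scattering string literals. The `build_command` helper below is a hypothetical sketch. The command names are the ones listed above.

```python
COMMANDS = frozenset({
    "ANSWER", "HANGUP", "START_BULK_MEDIA", "END_BULK_MEDIA",
    "PAUSE_MEDIA", "CONTINUE_MEDIA", "FLUSH_MEDIA",
    "GET_STATUS", "REPORT_QUEUE_DRAINED",
})


def build_command(name: str, optional_id=None) -> str:
    """Return the TEXT payload for a command, rejecting unknown names."""
    if name not in COMMANDS:  # exact match: commands are case-sensitive
        raise ValueError(f"unknown command: {name!r}")
    if optional_id is not None:
        if name != "END_BULK_MEDIA":
            raise ValueError(f"{name} does not take an id")
        return f"{name} {optional_id}"
    return name
```

The returned string must still be sent as a TEXT websocket message. For a barge-in, for example, the app would send the result of `build_command("FLUSH_MEDIA")`.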

#### Notifications

/// define

`MEDIA_XOFF`: Stop sending media

- The channel driver will send this notification to the app when the frame queue length reaches the high water mark. The app should then pause sending media. Any media sent after this has a high probability of being dropped.

`MEDIA_XON`: Start sending media again

- The channel driver will send this notification when the frame queue length drops below the low water mark. This indicates that it's safe for the app to start sending media again.

`STATUS`: Response from the `GET_STATUS` command

- The channel driver will send this notification in response to a `GET_STATUS` command.<br>Example: `STATUS queue_length:43 xon_level:800 xoff_level:900 queue_full:false bulk_media:true media_paused:false`

`BULK_MEDIA_END [ <optional_id> ]`: Indicates that a bulk media transfer has finished.

- The channel driver will send this message when bulk media has finished being framed, timed and sent to the Asterisk core. If an optional id was supplied on the `END_BULK_MEDIA` command, it will be returned in this message.

`QUEUE_DRAINED`: Response from `REPORT_QUEUE_DRAINED`

- The channel driver will send this when it's processed the last frame in the queue and you've asked to be notified with a `REPORT_QUEUE_DRAINED` command. If no media is received within the next 20ms, a silence frame will be sent to the core. This is a one-time notification. You must send additional `REPORT_QUEUE_DRAINED` commands to get more notifications.

///
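
A minimal sketch of decoding these notifications on the app side, assuming the `STATUS` fields are space-separated `key:value` pairs as in the example above and that the optional id on `BULK_MEDIA_END` follows a single space. The function name and the returned `(name, fields)` shape are choices made for this illustration.

```python
def parse_notification(message: str):
    """Split a TEXT notification into (name, fields)."""
    name, *args = message.split()
    if name == "STATUS":
        fields = {}
        for arg in args:
            key, _, value = arg.partition(":")
            if value in ("true", "false"):
                fields[key] = value == "true"   # boolean flags
            elif value.isdigit():
                fields[key] = int(value)        # counters and levels
            else:
                fields[key] = value             # anything else stays text
        return name, fields
    if name == "BULK_MEDIA_END":
        # The id is present only if one was given on END_BULK_MEDIA.
        return name, {"id": args[0] if args else None}
    return name, {}
```

Applied to the `STATUS` example above, this produces integers for `queue_length`, `xon_level` and `xoff_level` and booleans for the three flags.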

## Configuration

All configuration is done in the common [websocket_client.conf](/Latest_API/API_Documentation/Module_Configuration/res_websocket_client) file shared with ARI Outbound WebSockets. That file has detailed information for configuring websocket client connections. There are a few additional things to know though...

* You only need to configure a connection for outgoing websocket connections. Incoming connections (those with the special `INCOMING` connection id in the dial string) are handled by the internal http/websocket servers.

* chan_websocket can only use `per_call_config` connection types. `persistent` websocket connections aren't supported for media.

* Never try to use the same websocket connection for both ARI and Media. "Bad things will happen"®

## Creating the Channel

### Using the Dial() Dialplan App

The full dial string is as follows:

``` title="Dialstring Syntax"
Dial(WebSocket/<connection_id>/<options>[,<timeout>[,<dial_options>]])
```

* **WebSocket**: The channel technology.
* **&lt;connection_id&gt;**: For outgoing connections, this is the name of the pre-defined client connection from websocket_client.conf. For incoming connections, this must be the special `INCOMING` id.
* **&lt;options&gt;**:
    * `c(<codec>)`: If not specified, the first codec from the caller's channel will be used. Having said that, if your app is expecting a specific codec, you should specify it here or you may be getting audio in a format you don't expect.
    * `n`: Don't auto-answer the WebSocket channel upon successful connection. Set this if you wish to answer the channel yourself. You can then send an `ANSWER` TEXT message on the websocket when you're ready to answer the channel.

Examples:

``` title="Dial() Examples"
Dial(WebSocket/connection1/c(alaw)n)
Dial(WebSocket/INCOMING/c(slin16))
```
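
If your app assembles dial strings programmatically, a small helper along these lines keeps the option order consistent with the examples above. The function name is hypothetical. It covers only the `c()` and `n` options described in this section, and the form with no options at all has not been checked against Asterisk.

```python
def websocket_dialstring(connection_id: str, codec=None, no_answer: bool = False) -> str:
    """Build 'WebSocket/<connection_id>/<options>' for Dial() or ARI."""
    options = ""
    if codec:
        options += f"c({codec})"  # force a specific codec
    if no_answer:
        options += "n"            # suppress auto-answer
    return f"WebSocket/{connection_id}/{options}"
```

Calling `websocket_dialstring("connection1", "alaw", no_answer=True)` reproduces the first example above.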

### Using the ARI `/channels`, `/channels/<id>` or `/channels/create` REST APIs

You can also create a WebSocket channel using the normal channel API calls and setting the `endpoint` parameter to the same dial string syntax described in the previous section.

Examples:

``` title="ARI /channels Examples"
POST http://server:8088/ari/channels?endpoint="WebSocket/connection1/c(alaw)"
POST http://server:8088/ari/channels/create?endpoint="WebSocket/INCOMING/c(ulaw)n"
```

The first example will create and dial the channel then connect to your app using the "connection1" websocket_client configuration. The channel will auto-answer when the websocket connection is established. The second example will create the channel but not dial or auto-answer it. Instead the channel driver will wait for your app to connect to it. You can then dial and answer it yourself when appropriate. You can still omit the `n` to have incoming connections auto-answered.
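
The dial string contains slashes and parentheses, so when building these requests in code it is safer to URL-encode the `endpoint` value than to paste it in literally. The sketch below uses only Python's standard `urllib.parse.urlencode`. The helper name and base URL are assumptions, and any other parameters your deployment needs (credentials, `app`, and so on) are omitted, as in the examples above.

```python
from urllib.parse import urlencode


def ari_channel_url(base: str, endpoint: str, create_only: bool = False) -> str:
    """Build the URL for POST /channels or POST /channels/create."""
    path = "/channels/create" if create_only else "/channels"
    # urlencode percent-encodes '/', '(' and ')' in the dial string.
    return f"{base}{path}?{urlencode({'endpoint': endpoint})}"


url = ari_channel_url("http://server:8088/ari", "WebSocket/connection1/c(alaw)")
```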

### Using ARI External Media (`/channels/externalMedia`)

You can also create a channel using externalMedia with a transport of `websocket` and an encapsulation of `none`.

Example:

``` title="ARI External Media Examples"
POST http://server:8088/ari/channels/externalMedia?transport=websocket&encapsulation=none&external_host=media_connection1&format=ulaw
POST http://server:8088/ari/channels/externalMedia?transport=websocket&encapsulation=none&external_host=INCOMING&connection_type=server&format=ulaw
```

The first example will create an outbound websocket connection to your app using the "media_connection1" websocket_client configuration. The second example will wait for an incoming connection from your app. Both examples will automatically dial and answer the websocket channel. There's no option to suppress either. Use the normal channel creation APIs if you need to handle them yourself.
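
The two requests differ only in `external_host` and the extra `connection_type=server` parameter, which the following sketch makes explicit. The helper name and base URL are hypothetical, and only the query parameters shown in the examples above are included.

```python
from urllib.parse import urlencode


def external_media_url(base: str, external_host: str, fmt: str, incoming: bool = False) -> str:
    """Build the URL for POST /channels/externalMedia over a websocket."""
    params = {
        "transport": "websocket",
        "encapsulation": "none",
        "external_host": external_host,
    }
    if incoming:
        # Asterisk waits for the app to connect instead of dialing out.
        params["connection_type"] = "server"
    params["format"] = fmt
    return f"{base}/channels/externalMedia?{urlencode(params)}"
```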

10 changes: 10 additions & 0 deletions docs/Configuration/Interfaces/.pages
@@ -0,0 +1,10 @@
nav:
- Asterisk-Manager-Interface-AMI
- Asterisk-REST-Interface-ARI
- Asterisk-Gateway-Interface-AGI.md
- Asterisk-External-Application-Protocol-AEAP.md
- Back-end-Database-and-Realtime-Connectivity
- Distributed-Device-State
- Asterisk-Call-Files.md
- Asterisk-Calendaring
- Utilizing-the-StatsD-Dialplan-Application.md
@@ -112,6 +112,10 @@ Once the apps are registered, Asterisk will attempt to connect to to your applic

There are a few differences between persistent and per-call connections. When Asterisk starts, per-call connections only create the dialplan contexts named `stasis-<app name>` with the `Stasis(<app name>)` extension. Nothing else happens until a channel causes `Stasis(<app name>)` to be called. When it does, Stasis() checks the internal app registry and if it doesn't find an ARI/Stasis app registered with that name (which it won't in this case), it looks to see if an outbound-websocket "per_call" connection has been defined and if it finds one, it creates an ephemeral ARI/Stasis app with the name `<app name>:<channel name>` and that's the name that will appear in the initial `ApplicationRegistered` event your external application will see.

/// note
If you plan to create additional channels using this same websocket connection, ensure you specify the full `<app name>:<channel name>` in the REST call's `app` parameter or Asterisk will attempt to make a new outbound websocket connection instead of using the existing one.
///

Active per-call connections are never reconfigured so there'll be no further `Application*` messages sent.

If a per-call connection fails to (re-)connect after `reconnect_attempts` tries, the Stasis() application will set the `${STASISSTATUS}` variable to `FAILED` and return control to the dialplan.