diff --git a/public/website/blog.html b/public/website/blog.html index 2314acb4..13a6788c 100644 --- a/public/website/blog.html +++ b/public/website/blog.html @@ -67,8 +67,8 @@
+ Many AI voice systems, such as Retell + AI + and Vocode, are built on top of Twilio. + While Twilio is an excellent platform for rapid development, building an AI voice business solely on + top of it comes with significant risks, such as deplatforming and prohibitive costs. +
+ ++ Somleng addresses these risks by offering an open-source alternative, giving businesses the freedom to + self-host or work with lower-cost providers. Since Somleng offers full compatibility with the Twilio + API, AI-powered voice systems built on Twilio can + transition to Somleng without significant changes to their existing codebase, + architecture or Twilio-based workflows. +
+ ++ In this post, we'll walk you through the technical journey of integrating AI into Somleng's voice + platform—combining key architectural decisions, the challenges we faced and the solutions we developed + to overcome them. +
Twilio's <Stream> noun,
- used within the
<Connect>
verb, enables
+ used within the <Connect>
verb, is a set of TwiML™
+ instructions which enables
real-time streaming of voice data to external systems for AI processing. It allows live audio from a
call to be streamed over WebSockets to AI-driven platforms, where speech recognition, natural language
processing, or machine learning algorithms can analyze the conversation in real time. This makes it
ideal for building voice-powered AI applications like interactive voice response (IVR), real-time
transcription, sentiment analysis, and intelligent customer support bots, enhancing the call experience
- with AI capabilities.
+ with AI capabilities. Most existing AI voice systems, such as Retell
+ AI
+ and Vocode, return the following TwiML
+ instructions to initiate a connection with Twilio.
+
+
<?xml version="1.0" encoding="UTF-8"?>
+<Response>
+ <Connect>
+ <Stream url="wss://example.com/audiostream" />
+ </Connect>
+</Response>
+
+
+
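As a rough illustration, the TwiML document above could be generated by an AI voice system along these lines. This is a minimal sketch using only the Python standard library; the helper name `build_connect_stream_twiml` is hypothetical and is not part of any Twilio or Somleng SDK.

```python
import xml.etree.ElementTree as ET


def build_connect_stream_twiml(stream_url: str) -> str:
    """Return a TwiML document that connects the call to a WebSocket stream.

    Hypothetical helper for illustration only.
    """
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # The url attribute tells the platform where to open the WebSocket.
    ET.SubElement(connect, "Stream", {"url": stream_url})
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


twiml = build_connect_stream_twiml("wss://example.com/audiostream")
print(twiml)
```

Building the document with an XML library rather than string concatenation means the URL is escaped correctly if it ever contains characters such as `&`.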
+ After Twilio parses this TwiML document, it opens a Websockets connection to the AI voice system by
+ connecting to the
+ URL
+ provided in the
+ url
attribute of the <Stream>
noun. Audio is Base64 encoded, included
+ in a
+ Twilio-defined Websocket message
+ and sent
+ bi-directionally between Twilio and the AI voice system. Below is an example of a
+ Media message.
+
+
{
+ "event": "media",
+ "sequenceNumber": "3",
+ "media": {
+ "track": "outbound",
+ "chunk": "1",
+ "timestamp": "5",
+ "payload": "no+JhoaJjpz..."
+ },
+ "streamSid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
+}
+
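To make the message format concrete, here is a hedged sketch of how a receiver might parse a Media message into a typed structure. The `MediaFrame` class and `parse_media_frame` function are hypothetical names, and the sample payload is generated locally rather than taken from a real call. Note that the numeric fields arrive as JSON strings, as in the example above.

```python
import base64
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaFrame:
    """Parsed form of a media message (hypothetical structure)."""
    stream_sid: str
    sequence_number: int
    track: str
    chunk: int
    timestamp: int
    audio: bytes


def parse_media_frame(raw: str) -> MediaFrame:
    message = json.loads(raw)
    if message.get("event") != "media":
        raise ValueError(f"not a media message: {message.get('event')!r}")
    media = message["media"]
    return MediaFrame(
        stream_sid=message["streamSid"],
        # Numeric fields are transmitted as strings, so convert them.
        sequence_number=int(message["sequenceNumber"]),
        track=media["track"],
        chunk=int(media["chunk"]),
        timestamp=int(media["timestamp"]),
        # The audio payload is Base64 encoded.
        audio=base64.b64decode(media["payload"]),
    )


# Sample message with a locally generated payload (160 bytes of 0xFF).
sample = json.dumps({
    "event": "media",
    "sequenceNumber": "3",
    "media": {
        "track": "outbound",
        "chunk": "1",
        "timestamp": "5",
        "payload": base64.b64encode(b"\xff" * 160).decode("ascii"),
    },
    "streamSid": "MZ" + "X" * 30,
})
frame = parse_media_frame(sample)
```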
+ + Twilio receives websockets messages sent from the AI voice system over the websockets connection and + returns audio to + the caller, while the AI voice system + receives websockets messages from Twilio to interpret audio from the caller. See below: +
++ +
+ ++ When you make a call to Twilio a bunch of stuff happens behind the scenes which allows developers to + programmatically control the call using TwiML™. Twilio encapsulates this logic into a black box. + In this section we'll open up the black box and do a bit of a deep dive into how Somleng handles this + process. We'll then build on this knowledge to explain how we handle the <Connect> verb.
+ + + ++ The diagram above is a high-level overview of how incoming calls are handled by Somleng explained in + more detail below: +
<?xml version="1.0" encoding="UTF-8"?>
+<Response>
+ <Say>Hello World</Say>
+</Response>
+
+ + Now that we have a high-level overview of how Somleng handles incoming calls, let's take a look at + introducing the <Connect> verb. +
+ + + ++ The diagram above shows what happens when we introduce the <Connect> verb. + Note that steps 1-6 are the same as in the previous section and we have just + replaced the Customer App with the AI voice system. Let's explore what happens from step 7 in more + detail below: +
+ ++
<?xml version="1.0" encoding="UTF-8"?>
+<Response>
+ <Connect>
+ <Stream url="wss://openvoice.ai" />
+ </Connect>
+</Response>
+ wss://openvoice.ai
+ wss://openvoice.ai
and sends
+ the connected
message followed by the start
message.
+ {
+ "event": "connected",
+ "protocol": "Call",
+ "version": "1.0.0"
+}
+ {
+ "event": "start",
+ "sequenceNumber": "1",
+ "start": {
+ "accountSid": "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
+ "streamSid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
+ "callSid": "CAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
+ "tracks": [ "inbound" ],
+ "mediaFormat": {
+ "encoding": "audio/x-mulaw",
+ "sampleRate": 8000,
+ "channels": 1
+ },
+ "customParameters": {
+ "FirstName": "Jane",
+ "LastName": "Doe",
+ "RemoteParty": "Bob"
+ }
+ },
+ "streamSid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
+}
+ media
messages to the FreeSWITCH task via the Websockets
+ connection such as:
+ {
+ "event": "media",
+ "sequenceNumber": "3",
+ "media": {
+ "track": "outbound",
+ "chunk": "1",
+ "timestamp": "5",
+ "payload": "no+JhoaJjpz..."
+ },
+ "streamSid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
+}
+ media
message and sends it to the AI
+ voice
+ system over the Websockets connection.
+ {
+ "event": "media",
+ "sequenceNumber": "135",
+ "media": {
+ "track": "outbound",
+ "chunk": "1",
+ "timestamp": "10",
+ "payload": "no+JhoaJjpz..."
+ },
+ "streamSid": "MZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
+}
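The message sequence described in these steps (connected, then start, then a stream of media messages in both directions) can be sketched from the AI voice system's side as a small state holder. This is an illustrative sketch only: `StreamSession` is a hypothetical name, no real WebSocket is opened, and the shape of the outgoing message (event, streamSid and a Base64 payload) is an assumption about the minimum a receiver needs.

```python
import base64
import json


class StreamSession:
    """Tracks one <Stream> session as seen by the AI voice system (sketch)."""

    def __init__(self) -> None:
        self.connected = False
        self.stream_sid = None
        self.media_format = None
        self.inbound_audio = bytearray()

    def handle(self, raw: str) -> None:
        """Process one message received over the WebSocket connection."""
        message = json.loads(raw)
        event = message.get("event")
        if event == "connected":
            self.connected = True
        elif event == "start":
            self.stream_sid = message["start"]["streamSid"]
            self.media_format = message["start"]["mediaFormat"]
        elif event == "media":
            self.inbound_audio += base64.b64decode(message["media"]["payload"])
        # Any other event type is ignored in this sketch.

    def build_media_message(self, audio: bytes) -> str:
        """Wrap audio for sending back over the same connection."""
        if self.stream_sid is None:
            raise RuntimeError("start message not received yet")
        return json.dumps({
            "event": "media",
            "streamSid": self.stream_sid,
            "media": {"payload": base64.b64encode(audio).decode("ascii")},
        })


sid = "MZ" + "X" * 30
audio_chunk = b"\xff" * 160  # locally generated sample bytes
session = StreamSession()
session.handle(json.dumps({"event": "connected", "protocol": "Call", "version": "1.0.0"}))
session.handle(json.dumps({
    "event": "start",
    "sequenceNumber": "1",
    "start": {
        "streamSid": sid,
        "tracks": ["inbound"],
        "mediaFormat": {"encoding": "audio/x-mulaw", "sampleRate": 8000, "channels": 1},
    },
    "streamSid": sid,
}))
session.handle(json.dumps({
    "event": "media",
    "sequenceNumber": "2",
    "media": {"track": "inbound", "chunk": "1", "timestamp": "5",
              "payload": base64.b64encode(audio_chunk).decode("ascii")},
    "streamSid": sid,
}))
reply = session.build_media_message(audio_chunk)
```

Keeping the protocol handling separate from the transport makes it straightforward to test the message flow without a live call.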
+ + In order to achieve the above functionality, we needed to build the missing components. +
+ +
- In the example above, the responses will be sent back to 10.10.1.21
at the default port for
+ In the example above, the responses will be sent back to 10.10.1.21
at the default port
+ for
UDP 5060
.
- Here you can see a problem immediately with our infrastructure above. 10.10.1.21 is the IP address of - the FreeSWITCH ECS task and is not publicly routable from the Internet. Therefore any responses from the + Here you can see a problem immediately with our infrastructure above. 10.10.1.21 is the IP address + of + the FreeSWITCH ECS task and is not publicly routable from the Internet. Therefore any responses from + the UAS will not be received by the UAC. This is because FreeSWITCH (acting as the UAC) constructs the SIP - header and is only aware of its own IP address and port. When the request passes through the managed NAT + header and is only aware of its own IP address and port. When the request passes through the managed + NAT Gateway, it will do Port Address Translation (PAT) @@ -289,10 +570,14 @@
This parameter was designed to assist routing of responses - not only where NAT is involved but also where the upstream device has used a hostname rather than an IP - address in the Via it inserted. The default behavior for routing of SIP responses over UDP, as described - in RFC 3261, is to use address and port information embedded in the relevant Via header - the address is - taken from the "received" parameter value and the port is taken from the "sent-by" component. If there + not only where NAT is involved but also where the upstream device has used a hostname rather than an + IP + address in the Via it inserted. The default behavior for routing of SIP responses over UDP, as + described + in RFC 3261, is to use address and port information embedded in the relevant Via header - the + address is + taken from the "received" parameter value and the port is taken from the "sent-by" component. If + there is no port, it defaults to port 5060 for UDP and TCP, or port 5061 for TLS.
@@ -306,8 +591,10 @@The UAS has added the "received" parameter which correctly corresponds to the public IP of the NAT - gateway. However this alone is not enough. Responses from the UAS will still be sent to the port 5060 - (default for UDP). Since this port is not mapped by the NAT gateway, responses cannot be received by the + gateway. However this alone is not enough. Responses from the UAS will still be sent to the port + 5060 + (default for UDP). Since this port is not mapped by the NAT gateway, responses cannot be received by + the UAC.
@@ -332,7 +619,8 @@Fortunately FreeSWITCH automatically adds this empty "rport" parameter to the via parameter when it - detects it's behind a NAT. Here's how the Via header looks with the empty "rport" parameter when sent + detects it's behind a NAT. Here's how the Via header looks with the empty "rport" parameter when + sent via the UAC (FreeSWITCH):
@@ -349,9 +637,11 @@- The addition of an "rport" parameter alongside the "received" parameter means that SIP responses can be + The addition of an "rport" parameter alongside the "received" parameter means that SIP responses can + be passed back to the source using symmetric routing. Responses will now be sent from the UAS to - 13.250.230.15 on port 37058 which correspond to the IP address and port address mapping of the managed + 13.250.230.15 on port 37058 which correspond to the IP address and port address mapping of the + managed NAT Gateway. The NAT gateway will then map this port back to the FreeSWITCH task which is running on 10.10.1.21.
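The routing rule described above can be sketched as a small function that decides where a UAS sends a response, given the topmost Via header value. This is a simplified illustration, not a full SIP parser: the function name is hypothetical, the branch value is made up, and IPv6 literals in the sent-by field are not handled.

```python
# Default ports when the Via carries no explicit port (UDP/TCP 5060, TLS 5061).
DEFAULT_PORTS = {"UDP": 5060, "TCP": 5060, "TLS": 5061}


def response_destination(via: str) -> tuple[str, int]:
    """Return the (host, port) a response should be sent to (sketch)."""
    sent_protocol, rest = via.strip().split(None, 1)
    transport = sent_protocol.split("/")[-1].upper()
    sent_by, *raw_params = [part.strip() for part in rest.split(";")]

    params = {}
    for param in raw_params:
        name, _, value = param.partition("=")
        params[name.lower()] = value

    host, _, port_text = sent_by.partition(":")
    # Prefer the address the request was actually received from.
    host = params.get("received") or host
    if params.get("rport"):
        # A filled-in rport means: reply to the observed source port.
        port = int(params["rport"])
    elif port_text:
        port = int(port_text)
    else:
        port = DEFAULT_PORTS.get(transport, 5060)
    return host, port


# Only "received" present: the reply goes to the NAT's IP but the default port.
without_rport = response_destination(
    "SIP/2.0/UDP 10.10.1.21;received=13.250.230.15;branch=z9hG4bKexample"
)
# "received" and "rport" present: the reply follows the NAT mapping.
with_rport = response_destination(
    "SIP/2.0/UDP 10.10.1.21;rport=37058;received=13.250.230.15;branch=z9hG4bKexample"
)
```

With the values from this section, the first call yields port 5060, which the NAT gateway has not mapped, while the second yields port 37058, which it has.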
@@ -389,7 +679,8 @@ext-rtp-ip
parameter to the IP address of the NAT Gateway which causes
- FreeSWITCH to insert the correct value. Without this configuration FreeSWITCH would insert the private
+ FreeSWITCH to insert the correct value. Without this configuration FreeSWITCH would insert the
+ private
IP address of the ECS task's network interface into the SDP. However there's also an issue with the
following line:
@@ -400,16 +691,21 @@ - This line specifies the port in which the media should be sent and received from. FreeSWITCH opens this - port (25984) for sending media and inserts it into the SDP. By default FreeSWITCH will select a random + This line specifies the port on which media should be sent and received. FreeSWITCH opens + this + port (25984) for sending media and inserts it into the SDP. By default FreeSWITCH will select a + random port in the range 16384-32768.
- However when RTP media is sent out through the NAT gateway to the UAS, the port will be translated and - will no longer match the value in the SDP. This typically results in one-way audio issues because the - media server on the UAS side will try to send audio back to the port specified in the SDP which cannot + However when RTP media is sent out through the NAT gateway to the UAS, the port will be translated + and + will no longer match the value in the SDP. This typically results in one-way audio issues because + the + media server on the UAS side will try to send audio back to the port specified in the SDP which + cannot be reached as illustrated below:
@@ -419,7 +715,8 @@In order to work around this issue, we typically require UAS/Media Servers connecting to Somleng to enable Symmetric RTP. Symmetric RTP means that the IP address and port pair used by an - outbound RTP flow is reused for the inbound flow. The IP address and port are learned when the initial + outbound RTP flow is reused for the inbound flow. The IP address and port are learned when the + initial RTP flow is received on the UAS. The flow's source address and port are latched onto and used as the destination for the RTP sourced by the UAC. The IP address and port in the c line and m line respectively in the SDP message are ignored. This is illustrated below: @@ -429,21 +726,28 @@
- In some cases, our customer's devices do not support symmetric RTP, or they cannot enable it. In these - cases, we work around the issue by bypassing the Managed NAT Gateway and routing RTP through a 1-1 NAT + In some cases, our customers' devices do not support symmetric RTP, or they cannot enable it. In + these + cases, we work around the issue by bypassing the Managed NAT Gateway and routing RTP through a 1-1 + NAT instance.
- The 1-1 NAT instance does not automatically apply PAT to outgoing packets. Instead, it will try to keep - the outbound port of the client and only apply PAT if there is a conflicting port. Given that FreeSWITCH - generates a random port for RTP for each session in the range 16384-32768 it should be relatively rare + The 1-1 NAT instance does not automatically apply PAT to outgoing packets. Instead, it will try to + keep + the outbound port of the client and only apply PAT if there is a conflicting port. Given that + FreeSWITCH + generates a random port for RTP for each session in the range 16384-32768 it should be relatively + rare to see conflicts and PAT.
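The port-preserving behaviour described above can be modelled with a toy translation table. This is only a sketch of the idea, not how the NAT instance is actually implemented: the class name, the public IP (taken from the documentation address range) and the fallback port range are all assumptions.

```python
class PortPreservingNat:
    """Toy model of a NAT that keeps the client's source port when it can."""

    def __init__(self, public_ip: str, fallback_ports) -> None:
        self.public_ip = public_ip
        self._fallback = iter(fallback_ports)
        self._by_private = {}  # (private_ip, private_port) -> public_port
        self._by_public = {}   # public_port -> (private_ip, private_port)

    def translate_outbound(self, private_ip: str, private_port: int) -> tuple[str, int]:
        key = (private_ip, private_port)
        if key in self._by_private:
            return self.public_ip, self._by_private[key]
        if private_port not in self._by_public:
            # No conflict: keep the port, so it matches the SDP m line.
            public_port = private_port
        else:
            # Conflict: fall back to port translation (PAT).
            # Raises StopIteration if the fallback range is exhausted.
            public_port = next(p for p in self._fallback if p not in self._by_public)
        self._by_private[key] = public_port
        self._by_public[public_port] = key
        return self.public_ip, public_port

    def translate_inbound(self, public_port: int):
        """Map a public port back to the private endpoint, if known."""
        return self._by_public.get(public_port)


nat = PortPreservingNat("203.0.113.10", range(40000, 40010))
first = nat.translate_outbound("10.10.1.21", 25984)   # port preserved
second = nat.translate_outbound("10.10.1.22", 25984)  # conflict, port translated
```

In this model the first flow keeps port 25984, so a peer that sends RTP to the port advertised in the SDP still reaches the right task, while the second flow is the rare conflicting case.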
- Routing RTP through the 1-1 NAT instance means that the UAS will receive RTP packets from the port which - corresponds to the value in the (m) line of the SDP. The UAS media server can therefore send RTP back to + Routing RTP through the 1-1 NAT instance means that the UAS will receive RTP packets from the port + which + corresponds to the value in the (m) line of the SDP. The UAS media server can therefore send RTP + back to this port which will be routed back to the FreeSWITCH media server as illustrated below:
@@ -456,8 +760,10 @@- So far we have managed to get away with using off-the-shelf managed services such as an AWS Application - Load Balancer and an AWS NAT Gateway to horizontally scale FreeSWITCH when used as a User Agent Client + So far we have managed to get away with using off-the-shelf managed services such as an AWS + Application + Load Balancer and an AWS NAT Gateway to horizontally scale FreeSWITCH when used as a User Agent + Client (UAC). But we also want to use FreeSWITCH as a User Agent Server (UAS) to handle inbound calls. This poses a new set of challenges described in the following section.
@@ -470,8 +776,10 @@- In order to handle this we need a SIP-aware proxy that sits between the managed Network Load Balancer - and the FreeSWITCH UAS tasks. The proxy will be able to handle incoming requests and route them to the + In order to handle this we need a SIP-aware proxy that sits between the managed Network Load + Balancer + and the FreeSWITCH UAS tasks. The proxy will be able to handle incoming requests and route them to + the correct FreeSWITCH task. For this we use OpenSIPS. OpenSIPS comes with a load balancer module which @@ -496,7 +806,8 @@
- With an OpenSIPS proxy in place, new requests (SIP Invites) will be load balanced across the available + With an OpenSIPS proxy in place, new requests (SIP Invites) will be load balanced across the + available FreeSWITCH tasks. In order to illustrate this better let's look at a complete example.
@@ -505,13 +816,15 @@10.65.104.1
) sends SIP Invite to the UAC's SIP Proxy (175.100.32.29
)
+ UAC (10.65.104.1
) sends SIP Invite to the UAC's SIP Proxy
+ (175.100.32.29
)
52.74.4.205
)
10.10.1.162
)
+ Network Load Balancer forwards the request to the OpenSIPS proxy ECS task
+ (10.10.1.162
)
10.10.1.21
)
@@ -519,16 +832,19 @@ - When the SIP invite arrives at the OpenSIPS proxy it will add a new Record-Route header containing its + When the SIP invite arrives at the OpenSIPS proxy it will add a new Record-Route header containing + its own address above any existing Record-Route headers. In our setup, OpenSIPS is configured to add two Record-Route headers. This is known as double Record-Route headers and handles the special situation where the proxy receives a request on one network interface and sends it onwards using a different - interface. In our case the request is received on by Network Load Balancer but sent out via the private + interface. In our case the request is received via the Network Load Balancer but sent out via the + private IP address of the OpenSIPS ECS Task.
- The set of Record-Route headers describes the path through all proxy nodes. Combined with the Contact + The set of Record-Route headers describes the path through all proxy nodes. Combined with the + Contact header, this provides a complete description of the upstream path that leads back to the UAC.
@@ -551,11 +867,13 @@10.10.1.162
is the internal IP address of the OpenSIPS ECS task. This Record-Route header
+ 10.10.1.162
is the internal IP address of the OpenSIPS ECS task. This Record-Route
+ header
is added by
the OpenSIPS proxy before it sends the request to the UAS.
r2=on
in the Record-Route headers indicates double Record-Route headers.
+ The attribute r2=on
in the Record-Route headers indicates double Record-Route
+ headers.
175.100.32.29
is the public IP address of the SIP proxy associated with the UAC and is
+ 175.100.32.29
is the public IP address of the SIP proxy associated with the UAC and
+ is
added by the UAC's proxy.
- Now the UAS (FreeSWITCH ECS task) has a complete description of the path back to the UAC, but at this + Now the UAS (FreeSWITCH ECS task) has a complete description of the path back to the UAC, but at + this point the UAC doesn't have any knowledge of the path, the proxy nodes or even the address of the - UAS. It needs to know the path too, for example when it wants to send more requests to the UAS within + UAS. It needs to know the path too, for example when it wants to send more requests to the UAS + within the current dialogue. This information is exchanged by the UAS in the response, which includes a - complete copy of all the Record-Route headers. It also includes its own Contact header in the response. - The path that the response follows is defined by the Via headers - the Record-Route headers are present + complete copy of all the Record-Route headers. It also includes its own Contact header in the + response. + The path that the response follows is defined by the Via headers - the Record-Route headers are + present in the response but do not influence its transmission path.
@@ -591,18 +915,22 @@10.10.1.21
) sends the response to the OpenSIPS proxy's
- internal IP address (10.10.1.162
) obtained from the received parameter in the Via Header.
+ internal IP address (10.10.1.162
) obtained from the received parameter in the Via
+ Header.
175.100.32.29
)
+ The Network Load Balancer forwards the response to the UAC's SIP Proxy
+ (175.100.32.29
)
obtained from the Via Header.
10.65.104.1
) obtained from the Via
+ The UAC's SIP proxy forwards the response to the UAC (10.65.104.1
) obtained from the
+ Via
Header.
10.10.1.162
is the internal IP address of the OpenSIPS ECS task seen in the "received"
+ 10.10.1.162
is the internal IP address of the OpenSIPS ECS task seen in the
+ "received"
parameter of the first
Via Header. This "received" parameter is used to determine the first hop back to the OpenSIPS
Proxy in the
response, not the Record-Route header.
10.10.1.21
(in the contact header) is the internal IP address of the UAS (FreeSWITCH ECS
+ 10.10.1.21
(in the contact header) is the internal IP address of the UAS (FreeSWITCH
+ ECS
Task). This is set by the UAS in the response.
- So now, both the UAC and the UAS have a copy of the full set of Record-Route headers and is remembered - by both endpoints. Once the dialogue is established, the proxies should not insert any more Record-Route - headers after that initial transaction. Instead, all the sequential SIP requests should contain Route + So now, both the UAC and the UAS have a copy of the full set of Record-Route headers, which + both endpoints + remember. Once the dialogue is established, the proxies should not insert any more + Record-Route + headers after that initial transaction. Instead, all subsequent SIP requests should contain + Route headers.
- The Route Set is used to create a set of Route headers. The sequence of these headers is important - the + The Route Set is used to create a set of Route headers. The sequence of these headers is important - + the upstream server will invert the order, but a downstream server does not.
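The ordering rule can be sketched as follows, assuming that "upstream" here refers to the UAC side of the dialogue: the UAC reverses the Record-Route list it received in the response, while the UAS keeps the order it received in the request. The function name and the exact URI strings are illustrative; only the IP addresses come from the example in this post.

```python
def build_route_set(record_route: list[str], role: str) -> list[str]:
    """Derive a dialogue's Route Set from Record-Route headers (sketch).

    record_route is listed top to bottom, as the headers appear in the message.
    """
    if role == "uac":
        # The caller side reverses the list so its own proxy comes first.
        return list(reversed(record_route))
    if role == "uas":
        # The callee side keeps the order as received.
        return list(record_route)
    raise ValueError(f"unknown role: {role!r}")


# Illustrative headers, top to bottom, as seen by the UAS.
record_route = [
    "<sip:10.10.1.162;lr;r2=on>",   # OpenSIPS internal address
    "<sip:52.74.4.205;lr;r2=on>",   # public address in front of OpenSIPS
    "<sip:175.100.32.29;lr>",       # UAC's SIP proxy
]

uac_routes = build_route_set(record_route, "uac")
uas_routes = build_route_set(record_route, "uas")
```

In both cases the first entry of the resulting list is the nearest proxy to that endpoint, which is why the same set of headers has to be read in opposite directions.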
@@ -660,7 +995,8 @@175.100.32.29
).
@@ -668,7 +1004,8 @@ 52.74.4.205
).
10.10.1.21
)
+ The Request Line contains the private IP address of the FreeSWITCH ECS task
+ (10.10.1.21
)
which it got from the Contact Header of the Response (see above).
- So far we have discussed how we deploy FreeSWITCH behind both an Application Load Balancer (when acting + So far we have discussed how we deploy FreeSWITCH behind both an Application Load Balancer (when + acting as a UAC) and a SIP proxy (when acting as a UAS). This section discusses how we automatically scale FreeSWITCH tasks based on CPU and Session count.
@@ -730,15 +1070,19 @@- When a scaling policy adds a new task, a Lambda function is triggered which adds a load balancer target + When a scaling policy adds a new task, a Lambda function is triggered which adds a load balancer + target to the OpenSIPS load balancer table. OpenSIPS will then start load balancing SIP requests to the new - task. When FreeSWITCH is acting as a UAC, this happens automatically when ECS registers the task with + task. When FreeSWITCH is acting as a UAC, this happens automatically when ECS registers the task + with the Application Load Balancer.
- Similarly, when a task scales-in the Lambda function is triggered which removes the load balancer target - from the OpenSIPS load balancer table. OpenSIPS then stops sending new requests to this endpoint. When + Similarly, when a task scales-in the Lambda function is triggered which removes the load balancer + target + from the OpenSIPS load balancer table. OpenSIPS then stops sending new requests to this endpoint. + When FreeSWITCH is acting as a UAC, this happens automatically when ECS de-registers the task from the Application Load Balancer. After a specified timeout period, the task is terminated.
@@ -764,7 +1108,8 @@