Jump to deployment steps
Businesses often need to help customers at many points: before a purchase of a product or service, during an existing support case, or throughout their core workflow. For example, a healthcare company may want to engage users on a weight loss program, or a fintech company might want to verify identity information or explain loan application steps - customer interaction happens in many ways. The key is being ready to help your customers when and how they need it. As a business, you may need to ensure a few things, including:
- Understanding your customer and their journey - where are they coming from, why are they here, and what is the best way to solve their problems?
- Understanding outcomes and business impact - how well you served them, how few contacts it took to resolve the issue, what the ROI was for your business, and how much future wallet share it could bring.
- Understanding the opportunities - how do we use all the data to continuously improve customer experience and margins?
- Understanding compliance needs - are there specific regulations, such as GDPR or HIPAA, that we need to adhere to?
Many organizations currently rely on a manual quality control process for customer interactions. This involves randomly sampling a limited number of conversations for analysis. For regulated industries like healthcare and fintech, more rigorous analysis of customer conversations may be required for conformity and protocol adherence. Additionally, there is often a failure to quickly identify emerging themes and training opportunities across conversations.
Organizations interact with customers across multiple channels - phone, video, chat etc. A robust "conversation intelligence" solution that combines AI/ML technologies is needed to enable processing of entire customer conversations, regardless of channel.
This sample solution leverages Amazon Web Services (AWS) AI/ML services and open-source models to extract insights from audio calls and chat conversations, and can be extended to video as well. Key capabilities include:
- Sentiment analysis
- Identifying call drivers and emerging trends
- Agent coaching opportunities
- Compliance monitoring and adherence
By processing all customer interactions with AI/ML, this sample solution provides comprehensive coverage and actionable insights. The open source foundations and AWS services make it easy to configure for your needs.
At AWS, there are many ways to build conversation intelligence. This project is inspired by our other sample solutions listed below:
- Post Call Analytics using Amazon Transcribe - https://github.com/aws-samples/amazon-transcribe-post-call-analytics
- Live Call Analytics - https://github.com/aws-samples/amazon-transcribe-live-call-analytics
This sample solution is for organizations that need to use open-source or custom models built specifically for them. It does most of the heavy lifting of providing an end-to-end workflow that can process call recordings and chat transcripts from your existing workflow. It provides actionable insights to spot emerging trends, identify agent coaching opportunities, and assess the general sentiment of calls.
This sample is modular and can be plugged into your existing analytics workflows to analyze conversations. Refer to the architecture below.
The solution supports MP3 and WAV, and can easily be customized to support other audio types. If you have videos, you can use AWS Elemental MediaConvert to extract the audio and process it through this solution. Please take a look at the process flow below.
The first step is to identify speakers. To improve accuracy and attribute speech to the right speaker in the conversation, we first perform diarization. Speaker diarization is the process of determining "who spoke when?". Models like pyannote.audio or NVIDIA's NeMo can diarize input audio files. We use pyannote.audio, an open-source toolkit written in Python for speaker diarization. It is based on the PyTorch machine learning framework and comes with state-of-the-art pretrained models and pipelines. It can be further fine-tuned on your own data for even better performance.
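For illustration, a minimal pyannote.audio diarization call looks roughly like this (the audio file name and token value are placeholders; the solution's own container code may differ):

```python
from pyannote.audio import Pipeline

# Load the pretrained diarization pipeline (requires a HuggingFace token, see the prerequisites below)
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0",
    use_auth_token="hf_xxxx",
)

# Run diarization on an input recording and print "who spoke when"
diarization = pipeline("call.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s -> {turn.end:.1f}s")
```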
After diarization, we split the original input file into audio clips based on the speaker diarization data. The transcription model then takes each audio clip as input and produces text output.
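As a sketch of that splitting step (pydub is used here purely for illustration; the actual scripts under `ml_stack` may use a different audio library):

```python
from pydub import AudioSegment

audio = AudioSegment.from_file("call.wav")

# Example: cut out one diarized turn, e.g. 12.4s to 19.8s (times come from the diarization output)
start_ms, end_ms = 12_400, 19_800
clip = audio[start_ms:end_ms]
clip.export("turn_000.wav", format="wav")
```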
In this solution, we use faster-whisper with CTranslate2. faster-whisper is a reimplementation of the Whisper model using CTranslate2, a fast inference engine for Transformer models. This implementation claims up to 4x faster performance than the original Whisper for the same accuracy, while using less memory.
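A minimal faster-whisper call looks roughly like this (model size, device, and file name are illustrative):

```python
from faster_whisper import WhisperModel

# Load Whisper via CTranslate2; smaller models (e.g. "medium") also run on CPU with int8
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

# task="transcribe" keeps the source language; task="translate" makes Whisper emit English instead
segments, info = model.transcribe("turn_000.wav", task="transcribe")
print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")
```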
This solution is model agnostic, and you can use any model of your choice by changing the container scripts under `ml_stack` with minimal code change.
After transcription, we perform translation using the same Whisper model, again taking the audio as input.
The solution uses Amazon Comprehend, a natural language processing (NLP) service that uses machine learning to uncover information in unstructured data. We extract sentiment and entities using Comprehend, and this can easily be extended to detect and mask PII if needed.
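As an illustration, the Comprehend calls involved look roughly like this (region and sample text are placeholders):

```python
import boto3

comprehend = boto3.client("comprehend", region_name="us-west-2")
text = "I have been waiting two weeks for my refund and nobody has called me back."

# Sentiment and entities, as used by the solution
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
entities = comprehend.detect_entities(Text=text, LanguageCode="en")

# Optional extension: detect PII entities so they can be masked downstream
pii = comprehend.detect_pii_entities(Text=text, LanguageCode="en")

print(sentiment["Sentiment"])
print([(e["Type"], e["Text"]) for e in entities["Entities"]])
```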
Finally, using the generated transcripts, the solution uses generative AI models such as Anthropic Claude v2 on Amazon Bedrock to summarize the conversation. We've built various prompts to extract actions, issues, call backs, and other KPIs as needed. For example, you can build a prompt to check whether the agent greeted the customer properly and ended the call asking for feedback. You can also cover more complex needs, such as extracting the answers to security questions and comparing them with the database to ensure the caller was legitimate.
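For illustration, a single summarization prompt against Claude v2 on Amazon Bedrock could be invoked like this (the prompt text and transcript are placeholders; the solution keeps its actual prompts in the Workflows Administration module):

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")

transcript = "spk_0: Thank you for calling ...\nspk_1: Hi, I am calling about ..."  # generated transcript
prompt = (
    "\n\nHuman: Summarize the following contact center call, then list any action items "
    "and state whether the agent promised a call back.\n\n"
    f"<transcript>{transcript}</transcript>"
    "\n\nAssistant:"
)

response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",
    body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 500, "temperature": 0}),
)
print(json.loads(response["body"].read())["completion"])
```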
Please note that the models used in this sample solution need significant storage and network bandwidth. We recommend using the AWS Cloud9 IDE to build and deploy. Also, since this build packages ML model containers and deploys them to Amazon SageMaker Inference Endpoints, it is better to run these steps on Cloud9. Follow the instructions in Setting up Cloud9 and make sure you have enough storage to build the models (100 GB is recommended). After provisioning, ensure you have increased the disk capacity to 100 GB by following the steps in Resize Environment Storage.
This project is set up like a standard Python based CDK project. The initialization process also creates a virtualenv
within this project, stored under the .venv directory. To create the virtualenv it assumes that there is a python3
executable in your path with access to the venv
package. If for any reason the automatic creation of the virtualenv
fails, you can create the virtualenv manually once the init process completes.
To manually create a virtualenv on MacOS and Linux:
python3 -m venv .venv
After the init process completes and the virtualenv is created, you can use the following step to activate your virtualenv.
source .venv/bin/activate
If you are on a Windows platform, you would activate the virtualenv like this:
% .venv\Scripts\activate.bat
Once the virtualenv is activated, you can install the required dependencies.
pip install -r requirements.txt
To add additional dependencies, for example other CDK libraries, just add to your requirements.txt file and rerun
the pip install -r requirements.txt
command.
We need to configure access credentials in the cloud environment before deploying the CDK stack. For that, run the `aws configure` command for the first time to configure your account.
aws configure
Then ensure you have configured the right region and modified `cfg.py`. It's ideal to pick a region that has access to Amazon Bedrock.
REGION = "us-west-2"
The web application is based on Cloudscape. The source code is within `web_app/ci-portal`. We need to install and build it using npm.
cd web_app/ci-portal
npm install
npm run build
Amazon Bedrock users need to request access to models before they are available for use. Model access can be managed only in the Amazon Bedrock console. To request access to models, select the Model access link in the left navigation panel of the Amazon Bedrock console.
As a prerequisite, you need to enable model access in Amazon Bedrock in the region where you are deploying this solution.
We are using `pyannote/speaker-diarization-3.0` and `pyannote/segmentation-3.0`, which are pretrained models from HuggingFace. Please ensure you add your HuggingFace access token to `cfg.py`, or else the solution will fail during execution.
To create an access token, go to your settings in the HuggingFace portal, then click on the Access Tokens tab. Also, make sure the token has access to both models by visiting the respective model pages, pyannote/speaker-diarization-3.0 and pyannote/segmentation-3.0, and checking that you can access them using the token.
HF_TOKEN = 'hf_xxxx'
If you are setting up CDK environment for the first time in the region, then run
cdk bootstrap
Finally, you can run the following command to deploy all stacks.
cdk deploy --all
You can optionally run `cdk deploy --all --require-approval never`; the `--require-approval never` flag skips confirming any changes to the stack deployment and continues with the deployment.
Congratulations! 🎉 You have completed all the steps for setting up the conversation intelligence sample solution using AI/ML on AWS.
The CDK stack will deploy two `g5.2xlarge` instances for SageMaker Inference Endpoints, which will be running all the time. We recommend adjusting the application scaling policies in `diarization_stack` and `transcription_stack` based on your usage pattern; a sketch of one possible policy follows.
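For reference, a scaling policy for a SageMaker endpoint in CDK can be sketched as follows (construct IDs, endpoint/variant names, and thresholds are hypothetical and should be matched to the endpoints your stacks create):

```python
from aws_cdk import Duration
from aws_cdk import aws_applicationautoscaling as appscaling

# Inside the stack that owns the endpoint (e.g. transcription_stack); "self" is the Stack.
target = appscaling.ScalableTarget(
    self, "TranscriptionEndpointScaling",
    service_namespace=appscaling.ServiceNamespace.SAGEMAKER,
    resource_id="endpoint/<your-endpoint-name>/variant/AllTraffic",
    scalable_dimension="sagemaker:variant:DesiredInstanceCount",
    min_capacity=1,
    max_capacity=2,
)

# Track invocations per instance and scale between 1 and 2 instances
target.scale_to_track_metric(
    "InvocationsPerInstance",
    target_value=60,
    predefined_metric=appscaling.PredefinedMetric.SAGEMAKER_VARIANT_INVOCATIONS_PER_INSTANCE,
    scale_in_cooldown=Duration.minutes(10),
    scale_out_cooldown=Duration.minutes(5),
)
```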
Useful commands:

- `cdk ls` - list all stacks in the app
- `cdk synth` - emits the synthesized CloudFormation template
- `cdk deploy` - deploy this stack to your default AWS account/region
- `cdk diff` - compare deployed stack with current state
- `cdk docs` - open CDK documentation
There are three stacks in this solution:

- `ml_stack` - stack with the speaker diarization and transcription models and their respective resources. The stack uses Amazon SageMaker to deploy inference endpoints.
- `server` - stack with all the functions and workflows, using AWS Lambda and AWS Step Functions.
- `web_app` - stack with the dashboard application and APIs, using Amazon API Gateway, AWS Lambda, and AWS Amplify.
- Check the `cdk` output messages for the Conversation Bucket name and the CloudFront URL.
- Go to S3 and open the bucket named `ci-process-conversationsxxxxxx-xxx`, as mentioned in the CDK output.
- Create a prefix named `input` and upload sample audio files inside the prefix (see the snippet after this list for a scripted upload). If you don't have samples, you can use files from https://github.com/aws-samples/amazon-transcribe-post-call-analytics/tree/develop/pca-samples/src/samples
- Once audio files are uploaded, you can check the workflow status by opening the Step Function `ciworkflowxxxx`.
- Then, log in to the "Conversation Intelligence" dashboard using the CloudFront URL.
- Create a user and log in to the dashboard.
- While creating the user, provide your email and password; you will receive a verification code. Enter the verification code to complete registration.
- Once you log in, there are three major modules: 1. Conversation List, 2. Call Details, and 3. Workflows Administration.
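If you prefer to script the upload instead of using the S3 console, a minimal boto3 snippet looks like this (replace the bucket name with the one from your CDK output):

```python
import boto3

s3 = boto3.client("s3")

# Bucket name comes from the CDK output (ci-process-conversationsxxxxxx-xxx)
bucket = "<conversation-bucket-from-cdk-output>"
s3.upload_file("sample-call.wav", bucket, "input/sample-call.wav")
```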
After login, you will land on the conversation list page. This is a paginated list of all conversations processed by the solution.
You also have an advanced search/filter feature that allows you to filter the list by job name (audio file), language code, duration, and other criteria as shown.
You can click a job name to open the conversation details page.
The Conversation Details page has multiple sections: 1. Call Metadata, 2. Sentiment and Speaker Insights, 3. Entities, 4. Insights by Generative AI, 5. Analysis powered by Generative AI, and 6. Turn-by-turn transcript and translated transcript.
The Call Metadata section has file and process metadata including upload time, call duration, language, customer and agent sentiment, and file format.
The Sentiment and Speaker Insights section has a few charts displaying three insights:
- Sentiment progression across the four quarters of the call
- Speaker time: the total time each speaker talked across the call
- Turn-by-turn sentiment
This section displays all the entities detected by Amazon Comprehend throughout the call, grouped into various categories.
This section displays the summary, action items, topic, product, resolution, politeness, and call back, all extracted by generative AI. The transcript is given to the LLM and, based on the defined prompts, the values are extracted. These prompts are customizable under the "Workflows Administration" module.
This section allows you to run ad-hoc prompts to extract additional insights by deep diving into the selected conversation. It also lists additional workflow prompts that can extract insights based on specific functions (e.g. sales, product support). These prompts help obtain targeted data and insights.
This section displays the entire conversation as turn-by-turn messages along with the sentiment detected for each turn. This section also has an option to switch to the translated version if the original language is not English. It also has an audio player widget to play the entire audio, which highlights the conversation turn based on the time cue.
You can open Workflows administration by choosing the option from navigation bar.
It will have a `Default Workflow` that contains all the prompts required to extract the data needed to give insights about the call. These prompts are customizable by selecting the `Default Workflow` card. You can edit or add new workflows. Please note that you won't be able to delete the Default Workflow, but you can edit the prompts inside it.
Each workflow can have up to 10 prompts (a soft limit that can be modified in the code). By default, the solution leverages Anthropic Claude v2 using the Amazon Bedrock APIs. This can easily be modified in the respective Lambdas.
Upon clicking the edit icon next to each prompt, you can modify the prompt as shown in the image below. The prompts have to be in the specific format that the chosen Large Language Model (LLM) understands, or it will throw an error during execution.
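For example, Claude v2 expects the Human/Assistant turn format; a hypothetical compliance-check prompt might look like this:

```python
# Hypothetical prompt illustrating the Human:/Assistant: format Claude v2 expects
prompt_template = (
    "\n\nHuman: You are reviewing a contact center call. Using the transcript below, "
    "answer YES or NO: did the agent greet the customer and state their name?\n\n"
    "<transcript>{transcript}</transcript>"
    "\n\nAssistant:"
)
```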
We are working on code enhancements and additional features to make speakers configurable.
This conversation intelligence solution using AI/ML offers a scalable and cost-effective approach to extracting summaries and insights from agent-customer conversations. It uses Amazon SageMaker, Amazon Bedrock, Amazon Comprehend, and other AI/ML services, along with a dashboard that helps contact center quality teams. This solution is published as open source, and we expect it to be a starting point for your own solution. It can be plugged into your existing flows and customized to fit your specific needs. Help us make this sample solution better by contributing improvements and fixes via GitHub pull requests. For expert assistance, AWS Professional Services and other AWS Partners are here to help.
Congratulations! 🎉 You have completed all the steps for setting up your conversation intelligence sample solution using AIML on AWS.
To make sure you are not charged for any unwanted services, you can clean up by deleting the stack created in the Deploy section and its resources.
When you’re finished experimenting with this sample solution, clean up your resources by using the AWS CloudFormation
console to delete the ci-*
stacks that you deployed. This deletes resources that were created by deploying the
solution. The recording S3 buckets, the DynamoDB table and CloudWatch Log groups are retained after the stack is deleted
to avoid deleting your data.
Or, to delete the stacks, run the following CDK command.
$ cdk destroy --all
Your contributions are always welcome! Please have a look at the contribution guidelines first. :tada:
See CONTRIBUTING for more information.
This sample code is made available under the MIT-0 license. See the LICENSE file.