Merge branch 'develop' v0.7.6
rstrahan committed Feb 16, 2024
2 parents aa0dcc0 + 3c69eb3 commit 0faade2
Showing 29 changed files with 839 additions and 358 deletions.
17 changes: 15 additions & 2 deletions CHANGELOG.md
@@ -6,9 +6,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [0.7.6] - 2024-02-16
### Added
- PCA UI now shows the status of the step function workflow for each call.
- Added support for the new Transcribe Call Analytics generative call summarization option.

### Fixed
- Dependabot updates for PCA
- #234 Fix exception for files that contain no speech segments.
- Fix input bucket trigger to not create a DynamoDB record for metadata files.
- Updated NodeJS to v16.
- Fix bug when deploying with Amazon Titan Text Express.

## [0.7.5] - 2024-01-17
### Added
- Support for larger prompts by storing LLMPromptSummaryTemplate in S3 rather than SSM. By default, the CF templtae will migrate existing SSM prompts to DynamoDB.
- Support for larger prompts by storing LLMPromptSummaryTemplate in S3 rather than SSM. By default, the CF template will migrate existing SSM prompts to DynamoDB.

### Fixed
- #125 Updated the pca-aws-sf-bulk-queue-space.py function to correctly count jobs based on IN_PROGRESS as well as QUEUED
@@ -160,7 +172,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- Initial release

[Unreleased]: https://github.com/aws-samples/amazon-transcribe-post-call-analytics/compare/v0.7.5...develop
[Unreleased]: https://github.com/aws-samples/amazon-transcribe-post-call-analytics/compare/v0.7.6...develop
[0.7.6]: https://github.com/aws-samples/amazon-transcribe-post-call-analytics/releases/tag/v0.7.6
[0.7.5]: https://github.com/aws-samples/amazon-transcribe-post-call-analytics/releases/tag/v0.7.5
[0.7.4]: https://github.com/aws-samples/amazon-transcribe-post-call-analytics/releases/tag/v0.7.4
[0.7.3]: https://github.com/aws-samples/amazon-transcribe-post-call-analytics/releases/tag/v0.7.3
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.7.5
0.7.6
6 changes: 3 additions & 3 deletions aws-kendra-transcribe-media-search/package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 9 additions & 3 deletions docs/generative_ai.md
@@ -3,17 +3,23 @@
Post-Call Analytics has an optional step in the step function workflow to generate insights with generative AI.
PCA supports [Amazon Bedrock](https://aws.amazon.com/bedrock/) (Titan or Anthropic models) and [Anthropic](https://www.anthropic.com/) (3rd party) foundational models (FMs). Customers may also write a Lambda function and provide PCA the ARN, and use any FM of their choice. The prompts below are based on Anthropic's prompt formats. Learn more about prompt design at Anthropic's [Introduction to Prompt Design].(https://docs.anthropic.com/claude/docs/introduction-to-prompt-design).

For Amazon Bedrock models, you must [request model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) for the models selected.
For Amazon Bedrock models, you must [request model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) for the models selected.

PCA also supports 'Generative AI Queries' - which simply means you can ask questions about a specific call. These queries appear in a chat-like window from within the call details page.

*All the prompts below were tested with Amazon Titan and Anthropic FMs.*

**Note:** If you choose to call Anthropic directly, data will leave your AWS account! Also, the Anthropic API key will be stored in [AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html), under the key `{StackName}-ThirdPartyApiKey`, where `{StackName}` is replaced with your PCA CloudFormation stack's name.

## Generative AI Insights
## How to enable generative AI Summarization and Insights

When enabled, PCA can run one or more FM inferences against Amazon Bedrock or Anthropic APIs. The prompt used to generate the insights is stored in DynamoDB. The name of the table contains the string `LLMPromptConfigure`, and the table partition key is `LLMPromptTemplateId`. There are two items in the table, one with the partition key value of `LLMPromptSummaryTemplate` and the other with the partition key value of `LLMPromptQueryTemplate`.
To enable generative AI summarization and insights, update the **CallSummarization** CloudFormation parameter with one of the following values: `BEDROCK`, `BEDROCK+TCA`, `ANTHROPIC`, `TCA-ONLY`.

To use Transcribe Call Analytics (TCA) [generative call summarization](https://docs.aws.amazon.com/transcribe/latest/dg/call-analytics-batch.html#tca-summarization-batch), use `BEDROCK+TCA` or `TCA-ONLY` as the value.

**Note:** If you enable TCA, the summarization templates below will skip the 'Summary' prompt for files that are analyzed with TCA. For audio files that are analyzed with Transcribe `standard` mode, such as mono audio files, the 'Summary' prompt will be executed.

When summarization is enabled, PCA can run one or more FM inferences against Amazon Bedrock or Anthropic APIs. The prompt used to generate the insights is stored in DynamoDB. The name of the table contains the string `LLMPromptConfigure`, and the table partition key is `LLMPromptTemplateId`. There are two items in the table, one with the partition key value of `LLMPromptSummaryTemplate` and the other with the partition key value of `LLMPromptQueryTemplate`.
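
The lookup described above (a table whose name contains `LLMPromptConfigure`, keyed by `LLMPromptTemplateId`) could be scripted roughly as follows. This is a minimal sketch, not code from the PCA repository: the helper names `find_prompt_table`, `prompt_key`, and `fetch_prompt_item` are hypothetical, and the last one assumes `boto3` is installed and AWS credentials are available when it is called.

```python
from typing import Iterable, Optional

# Values taken from the documentation text above.
TABLE_NAME_MARKER = "LLMPromptConfigure"
PARTITION_KEY = "LLMPromptTemplateId"
TEMPLATE_IDS = ("LLMPromptSummaryTemplate", "LLMPromptQueryTemplate")


def find_prompt_table(table_names: Iterable[str]) -> Optional[str]:
    """Return the first table name containing the marker string, if any."""
    for name in table_names:
        if TABLE_NAME_MARKER in name:
            return name
    return None


def prompt_key(template_id: str) -> dict:
    """Build the primary-key dict for one of the two prompt items."""
    if template_id not in TEMPLATE_IDS:
        raise ValueError(f"unknown template id: {template_id}")
    return {PARTITION_KEY: template_id}


def fetch_prompt_item(table_name: str, template_id: str) -> Optional[dict]:
    """Read one prompt item (hypothetical helper; needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    table = boto3.resource("dynamodb").Table(table_name)
    response = table.get_item(Key=prompt_key(template_id))
    return response.get("Item")
```

The table name still has to be discovered from the deployed stack, since only a substring of it is fixed.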

### Generative AI interactive queries

4 changes: 3 additions & 1 deletion docs/output_json_structure.md
@@ -36,7 +36,8 @@ Contains header-level information around the analytics that have been generated,
"OutcomesDetected": [ ],
"Telephony": [ ],
"SourceInformation": [ ],
"Summary": { }
"Summary": { },
"ContactSummary": { }
}
```

@@ -62,6 +63,7 @@ Contains header-level information around the analytics that have been generated,
| Telephony | - | *[Optional]* A list of telephony-specific metadata fields extract from the CTR files (only present if the chosen telephony CTR parser chooses to write this information out) |
| SourceInformation | - | Source-specific details for the conversation. Contains just one of any of the possible supported sources |
| Summary | - | Key value pairs that define summary topics and values. These will be rendered inside the GenAI Call Summary panel in the user interface. The key will be rendered as the title, and the value is the body. |
| ContactSummary | - | [Generative call summarization](https://docs.aws.amazon.com/transcribe/latest/dg/call-analytics-batch.html#tca-summarization-batch) output from Transcribe Call Analytics. This is a nested structure. See the structure [here](https://docs.aws.amazon.com/transcribe/latest/dg/tca-output-batch.html#tca-output-summarization-batch). |
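
As an illustration of how a consumer might read the two summary fields in the table above, here is a small sketch. It assumes the caller has already parsed the PCA output JSON and passes in the header-level object that holds `Summary` and `ContactSummary`; the function names are hypothetical, and `ContactSummary` is treated as opaque because its nested layout is defined by the Transcribe documentation linked above.

```python
from typing import Any, Dict, List, Tuple


def summary_panels(analytics: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (title, body) pairs from the 'Summary' key/value block."""
    summary = analytics.get("Summary") or {}
    return [(str(title), str(body)) for title, body in summary.items()]


def has_contact_summary(analytics: Dict[str, Any]) -> bool:
    """True when a non-empty 'ContactSummary' block is present."""
    return bool(analytics.get("ContactSummary"))


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = {"Summary": {"Topic": "Billing"}, "ContactSummary": {}}
    print(summary_panels(sample))
    print(has_contact_summary(sample))
```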

###### SpeakerLabels

13 changes: 10 additions & 3 deletions pca-boto3-bedrock/template.yaml
@@ -3,6 +3,11 @@ Description: >
PCA Bedrock Boto3 Lambda Layer. This will create an S3 bucket, download
the Boto3 WHL file, and create a Lambda layer for use.
Parameters:
Boto3Version:
Type: String
Default: "1.34.40"

Resources:

BedrockBoto3Bucket:
@@ -56,6 +61,7 @@ Resources:
Environment:
Variables:
BOTO3_BUCKET: !Ref BedrockBoto3Bucket
BOTO3_VERSION: !Ref Boto3Version
Code:
ZipFile: |
import os
@@ -70,6 +76,7 @@ Resources:
from datetime import datetime
import cfnresponse
boto3_bucket = os.environ['BOTO3_BUCKET']
boto3_version = os.environ['BOTO3_VERSION']
def upload_file_to_s3(file_path, bucket, key):
s3 = boto3.client('s3')
@@ -112,8 +119,8 @@ Resources:
try:
if event['RequestType'] != 'Delete':
os.chdir('/tmp')
print(f"running pip install boto3==1.28.57")
subprocess.check_call([sys.executable, "-m", "pip", "install", "boto3==1.28.57", "-t", "python" ])
print(f"running pip install boto3=={boto3_version}")
subprocess.check_call([sys.executable, "-m", "pip", "install", f"boto3=={boto3_version}", "-t", "python" ])
boto3_zip_name = make_zip_filename()
zipdir("python",boto3_zip_name)
print(f"uploading {boto3_zip_name} to s3 bucket {boto3_bucket}")
@@ -144,7 +151,7 @@ Resources:
ServiceToken: !GetAtt BedrockBoto3ZipFunction.Arn
# Rerun BedrockBoto3ZipFunction if any of the following parameters change
BOTO3_BUCKET: !Ref BedrockBoto3Bucket
VERSION: 1
VERSION: !Ref Boto3Version
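
The change in this template replaces the hard-coded `boto3==1.28.57` pin with the `Boto3Version` parameter, passed to the function as `BOTO3_VERSION`. A standalone sketch of the same idea follows; the function name `pip_install_command` and the empty-version check are illustrative additions, not part of the commit.

```python
import sys
from typing import List


def pip_install_command(boto3_version: str, target_dir: str = "python") -> List[str]:
    """Build the pip argument list for a pinned boto3 install into target_dir."""
    if not boto3_version.strip():
        # Added guard (assumption): refuse to build an unpinned 'boto3==' spec.
        raise ValueError("boto3 version must not be empty")
    return [
        sys.executable, "-m", "pip", "install",
        f"boto3=={boto3_version}", "-t", target_dir,
    ]
```

The resulting list can be handed to `subprocess.check_call`, as the Lambda code in the diff does. Tying the custom resource's `VERSION` property to the same parameter means a version change reruns the function.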

BedrockBoto3Layer:
Type: "AWS::Lambda::LayerVersion"
25 changes: 20 additions & 5 deletions pca-main-nokendra.template
@@ -1,6 +1,6 @@
AWSTemplateFormatVersion: "2010-09-09"

Description: Amazon Transcribe Post Call Analytics - PCA (v0.7.5) (uksb-1sn29lk73)
Description: Amazon Transcribe Post Call Analytics - PCA (v0.7.6) (uksb-1sn29lk73)

Parameters:

@@ -376,18 +376,22 @@ Parameters:
Type: String
AllowedValues:
- 'DISABLED'
- 'BEDROCK+TCA'
- 'BEDROCK'
- 'TCA-ONLY'
- 'SAGEMAKER'
- 'ANTHROPIC'
- 'LAMBDA'
Description: >
Set to enable call summarization by a Large Language Model.
The BEDROCK+TCA will use Transcribe Call Analytics for summarization and Bedrock for other analytics, and is available only for English.
The BEDROCK option requires you to choose one of the supported model IDs from the provided list (SummarizationBedrockModelId).
You must also accept access to that model in the Amazon Bedrock > Model Access console.
The TCA-ONLY option will not use Bedrock, but will only use Transcribe Call Analytics summarization, and is available only for English.
The SAGEMAKER option uses a SageMaker endpoint with the pretrained bart-large-cnn-samsum model with a ml.m5.xlarge instance type.
The LAMBDA option requires you to provide a function ARN below.
The ANTHROPIC option is a third party service, and you must enter your Anthropic API key in the Third Party LLM API Key section.

SummarizationBedrockModelId:
Type: String
Default: anthropic.claude-instant-v1
@@ -537,11 +541,22 @@ Conditions:
ShouldCreateKendraIndexDeveloperEdition: !Equals [!Ref EnableTranscriptKendraSearch, 'Yes, create new Kendra Index (Developer Edition)']
ShouldDeployPcaDashboards: !Equals [!Ref EnablePcaDashboards, 'Yes']
ShouldLoadSampleFiles: !Equals [!Ref loadSampleAudioFiles, 'true']
ShouldDeployBedrockBoto3Layer: !Or [!Equals [!Ref CallSummarization, 'BEDROCK'], !Equals [!Ref GenAIQuery, 'BEDROCK'],]
ShouldDeployBedrockBoto3Layer: !Or [
!Equals [!Ref CallSummarization, 'BEDROCK'],
!Equals [!Ref CallSummarization, 'BEDROCK+TCA'],
!Equals [!Ref CallSummarization, 'TCA-ONLY'],
!Equals [!Ref GenAIQuery, 'BEDROCK'],
]
ShouldDeployLLMThirdPartyApiKey: !And [!Not [!Equals [!Ref SummarizationLLMThirdPartyApiKey, '']], !Not [!Equals [!Ref SummarizationLLMThirdPartyApiKey, undefined]]]
ShouldTestBedrockModelId: !Or [!Equals [!Ref CallSummarization, 'BEDROCK'], !Equals [!Ref GenAIQuery, 'BEDROCK'],]
ShouldTestBedrockModelId: !Or [
!Equals [!Ref CallSummarization, 'BEDROCK'],
!Equals [!Ref CallSummarization, "BEDROCK+TCA"],
!Equals [!Ref GenAIQuery, 'BEDROCK'],]
ShouldTestGenAIQueryBedrockModelId: !Equals [!Ref GenAIQuery, 'BEDROCK']
ShouldTestSummarizationBedrockModelId: !Equals [!Ref CallSummarization, 'BEDROCK']
ShouldTestSummarizationBedrockModelId: !Or [
!Equals [!Ref CallSummarization, 'BEDROCK'],
!Equals [!Ref CallSummarization, "BEDROCK+TCA"],
]
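
To make the updated conditions easier to check by hand, the following sketch restates the new `!Or` / `!Equals` expressions as plain Python. It mirrors only the four Bedrock-related conditions shown in this hunk; the function name is hypothetical, and the `"DISABLED"` value used for `GenAIQuery` in the example is an assumption, since that parameter's allowed values are not shown here.

```python
from typing import Dict


def bedrock_conditions(call_summarization: str, gen_ai_query: str) -> Dict[str, bool]:
    """Evaluate the Bedrock-related template conditions for given parameter values."""
    summarization_uses_bedrock = call_summarization in ("BEDROCK", "BEDROCK+TCA")
    query_uses_bedrock = gen_ai_query == "BEDROCK"
    return {
        # TCA-ONLY also deploys the layer, even though no Bedrock model is tested.
        "ShouldDeployBedrockBoto3Layer": (
            summarization_uses_bedrock
            or call_summarization == "TCA-ONLY"
            or query_uses_bedrock
        ),
        "ShouldTestBedrockModelId": summarization_uses_bedrock or query_uses_bedrock,
        "ShouldTestGenAIQueryBedrockModelId": query_uses_bedrock,
        "ShouldTestSummarizationBedrockModelId": summarization_uses_bedrock,
    }


if __name__ == "__main__":
    # "DISABLED" for GenAIQuery is an assumed value, used for illustration.
    for name, value in bedrock_conditions("TCA-ONLY", "DISABLED").items():
        print(f"{name}: {value}")
```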

Resources:
########################################################
28 changes: 22 additions & 6 deletions pca-main.template
@@ -1,6 +1,6 @@
AWSTemplateFormatVersion: "2010-09-09"

Description: Amazon Transcribe Post Call Analytics - PCA (v0.7.5) (uksb-1sn29lk73)
Description: Amazon Transcribe Post Call Analytics - PCA (v0.7.6) (uksb-1sn29lk73)

Parameters:

@@ -378,14 +378,18 @@ Parameters:
Type: String
AllowedValues:
- 'DISABLED'
- 'BEDROCK+TCA'
- 'BEDROCK'
- 'TCA-ONLY'
- 'SAGEMAKER'
- 'ANTHROPIC'
- 'LAMBDA'
Description: >
Set to enable call summarization by a Large Language Model.
The BEDROCK+TCA will use Transcribe Call Analytics for summarization and Bedrock for other analytics, and is available only for English.
The BEDROCK option requires you to choose one of the supported model IDs from the provided list (SummarizationBedrockModelId).
You must also accept access to that model in the Amazon Bedrock > Model Access console.
The TCA-ONLY option will not use Bedrock, but will only use Transcribe Call Analytics summarization, and is available only for English.
The SAGEMAKER option uses a SageMaker endpoint with the pretrained bart-large-cnn-samsum model with a ml.m5.xlarge instance type.
The LAMBDA option requires you to provide a function ARN below.
The ANTHROPIC option is a third party service, and you must enter your Anthropic API key in the Third Party LLM API Key section.
@@ -539,12 +543,22 @@ Conditions:
ShouldCreateKendraIndexDeveloperEdition: !Equals [!Ref EnableTranscriptKendraSearch, 'Yes, create new Kendra Index (Developer Edition)']
ShouldDeployPcaDashboards: !Equals [!Ref EnablePcaDashboards, 'Yes']
ShouldLoadSampleFiles: !Equals [!Ref loadSampleAudioFiles, 'true']
ShouldDeployBedrockBoto3Layer: !Or [!Equals [!Ref CallSummarization, 'BEDROCK'], !Equals [!Ref GenAIQuery, 'BEDROCK'],]
ShouldDeployBedrockBoto3Layer: !Or [
!Equals [!Ref CallSummarization, 'BEDROCK'],
!Equals [!Ref CallSummarization, 'BEDROCK+TCA'],
!Equals [!Ref CallSummarization, 'TCA-ONLY'],
!Equals [!Ref GenAIQuery, 'BEDROCK'],
]
ShouldDeployLLMThirdPartyApiKey: !And [!Not [!Equals [!Ref SummarizationLLMThirdPartyApiKey, '']], !Not [!Equals [!Ref SummarizationLLMThirdPartyApiKey, undefined]]]
ShouldTestBedrockModelId: !Or [!Equals [!Ref CallSummarization, 'BEDROCK'], !Equals [!Ref GenAIQuery, 'BEDROCK'],]
ShouldTestBedrockModelId: !Or [
!Equals [!Ref CallSummarization, 'BEDROCK'],
!Equals [!Ref CallSummarization, "BEDROCK+TCA"],
!Equals [!Ref GenAIQuery, 'BEDROCK'],]
ShouldTestGenAIQueryBedrockModelId: !Equals [!Ref GenAIQuery, 'BEDROCK']
ShouldTestSummarizationBedrockModelId: !Equals [!Ref CallSummarization, 'BEDROCK']

ShouldTestSummarizationBedrockModelId: !Or [
!Equals [!Ref CallSummarization, 'BEDROCK'],
!Equals [!Ref CallSummarization, "BEDROCK+TCA"],
]

Resources:
########################################################
@@ -911,6 +925,7 @@ Resources:
- !Ref BulkUploadBucketName
BulkUploadMaxDripRate: !Ref BulkUploadMaxDripRate
BulkUploadMaxTranscribeJobs: !Ref BulkUploadMaxTranscribeJobs
CallSummarization: !Ref CallSummarization
ComprehendLanguages: !Ref ComprehendLanguages
ContentRedactionLanguages: !Ref ContentRedactionLanguages
ConversationLocation: !Ref ConversationLocation
@@ -977,7 +992,7 @@ Resources:

BedrockBoto3Layer:
Type: AWS::CloudFormation::Stack
Condition: ShouldDeployBedrockBoto3Layer
# Condition: ShouldDeployBedrockBoto3Layer
Properties:
TemplateURL: pca-boto3-bedrock/template.yaml

@@ -1041,6 +1056,7 @@ Resources:
GenAIQueryBedrockModelId: !Ref GenAIQueryBedrockModelId
FetchTranscriptArn: !GetAtt PCAServer.Outputs.FetchTranscriptArn
SummarizerArn: !GetAtt PCAServer.Outputs.SummarizerArn
StepFunctionName: !Ref StepFunctionName
LLMThirdPartyApiKey: !If
- ShouldDeployLLMThirdPartyApiKey
- !Ref LLMThirdPartyApiKeySecret
1 change: 1 addition & 0 deletions pca-server/cfn/lib/llm.template
@@ -5,6 +5,7 @@ Description: Amazon Transcribe Post Call Analytics - PCA Server - S3 Trigger
Transform: AWS::Serverless-2016-10-31

Parameters:

LLMPromptSummaryTemplate:
Type: String
Description: >-
2 changes: 1 addition & 1 deletion pca-server/cfn/lib/trigger.template
@@ -116,7 +116,7 @@ Resources:
Properties:
Code: ../../src/trigger
Handler: index.handler
Runtime: nodejs14.x
Runtime: nodejs16.x
Role: !GetAtt ConfigureBucketRole.Arn
Environment:
Variables:
24 changes: 17 additions & 7 deletions pca-server/cfn/pca-server.template
@@ -9,18 +9,25 @@ Parameters:
Description: URL for ffmpeg binary distribution tar file download - see https://www.johnvansickle.com/ffmpeg/

CallSummarization:
Default: 'DISABLED'
Default: 'BEDROCK'
Type: String
AllowedValues:
- 'DISABLED'
- 'SAGEMAKER'
- 'BEDROCK+TCA'
- 'BEDROCK'
- 'LAMBDA'
- 'TCA-ONLY'
- 'SAGEMAKER'
- 'ANTHROPIC'
- 'LAMBDA'
Description: >
Set to enable call summarization by a Large Language Model. The SAGEMAKER option uses a SageMaker endpoint with
the pretrained bart-large-cnn-samsum model with a ml.m5.xlarge instance type. The LAMBDA option requires you
to provide a function ARN below. The ANTHROPIC option is a third party service, and you must enter your Anthropic API key below.
Set to enable call summarization by a Large Language Model.
The BEDROCK+TCA will use Transcribe Call Analytics for summarization and Bedrock for other analytics.
The BEDROCK option requires you to choose one of the supported model IDs from the provided list (SummarizationBedrockModelId).
You must also accept access to that model in the Amazon Bedrock > Model Access console.
The TCA-ONLY option will not use Bedrock, but will only use Transcribe Call Analytics summarization.
The SAGEMAKER option uses a SageMaker endpoint with the pretrained bart-large-cnn-samsum model with a ml.m5.xlarge instance type.
The LAMBDA option requires you to provide a function ARN below.
The ANTHROPIC option is a third party service, and you must enter your Anthropic API key in the Third Party LLM API Key section.

SummarizationBedrockModelId:
Type: String
@@ -68,7 +75,10 @@ Parameters:

Conditions:
ShouldCreateBoto3Layer: !Equals [!Ref Boto3LayerArn, '']
ShouldDeployBedrockSummarizer: !Equals [!Ref CallSummarization, "BEDROCK"]
ShouldDeployBedrockSummarizer: !Or [
!Equals [!Ref CallSummarization, "BEDROCK"],
!Equals [!Ref CallSummarization, "BEDROCK+TCA"],
]
ShouldDeploySageMakerSummarizer: !Equals [!Ref CallSummarization, "SAGEMAKER"]
ShouldEnableAnthropicSummarizer: !Equals [!Ref CallSummarization, "ANTHROPIC"]
ShouldEnableEndOfCallLambdaHookFunction: !Equals [!Ref CallSummarization, "LAMBDA"]
9 changes: 9 additions & 0 deletions pca-server/src/pca/pca-aws-sf-process-turn-by-turn.py
@@ -79,6 +79,12 @@ def __init__(self, min_sentiment_pos, min_sentiment_neg, custom_entity_endpoint)
self.simpleEntityMatchingUsed = (self.customEntityEndpointARN == "") and \
(cf.appConfig[cf.CONF_ENTITY_FILE] != "")

def process_tca_summary(self):
if self.api_mode == cf.API_ANALYTICS and \
self.asr_output["ConversationCharacteristics"] and \
"ContactSummary" in self.asr_output["ConversationCharacteristics"]:
self.analytics.contact_summary = self.asr_output["ConversationCharacteristics"]["ContactSummary"]

def generate_sentiment_trend(self, speaker, speaker_num):
"""
Generates an entry for the "SentimentTrends" block for the given speaker, which is the overall speaker
@@ -1149,6 +1155,9 @@ def parse_transcribe_file(self, sf_event):
# Update our results data structures, generate JSON results and save them to S3
self.push_turn_by_turn_results()

# Update summary structures
self.process_tca_summary()

# Write out the JSON data back to our interim S3 location
json_output, output_filename = self.pca_results.write_results_to_s3(bucket=output_bucket,
object_key=sf_event["interimResultsFile"])
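
For readers who want to try the `process_tca_summary` logic outside the class, here is a standalone sketch of the same extraction. It is not the committed code: the function name and the boolean `is_call_analytics` flag (standing in for the `self.api_mode == cf.API_ANALYTICS` check) are assumptions. It also uses `dict.get`, whereas the committed method indexes `"ConversationCharacteristics"` directly, which would raise `KeyError` if that key were ever absent.

```python
from typing import Any, Dict, Optional


def extract_contact_summary(
    asr_output: Dict[str, Any], is_call_analytics: bool
) -> Optional[Any]:
    """Return the TCA 'ContactSummary' block, or None when it is not available."""
    if not is_call_analytics:
        # Standard-mode transcripts carry no Call Analytics summary.
        return None
    characteristics = asr_output.get("ConversationCharacteristics") or {}
    return characteristics.get("ContactSummary")
```

A caller could assign the returned value to the results object only when it is not `None`, matching the conditional assignment in the diff above.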