Merge branch 'develop' v0.7.2
rstrahan committed Oct 3, 2023
2 parents 032fa5f + e2c1467 commit 645eb89
Showing 16 changed files with 534 additions and 188 deletions.
13 changes: 11 additions & 2 deletions CHANGELOG.md
@@ -4,6 +4,14 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.7.2] - 2023-10-03
### Fixed
- Enable Bedrock GA by default for call summarization and chat/generative query
- Prompt updates for Bedrock GA release (formatting, multiple prompts per call)
- Updated GenerativeAI README and main README with model access details
- Links to the LLM Parameter Store Prompts from the CloudFormation Output
- Adaptive retries for SSM GetParameter and InvokeModel to prevent throttling errors
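  As a rough illustration of the retry behavior described above (the client names and settings here are assumptions, not the exact code in this release), boto3's `adaptive` retry mode can be enabled like this:

  ```
  from botocore.config import Config
  import boto3

  # "adaptive" retry mode adds client-side rate limiting on top of retries,
  # which smooths out throttling-error bursts
  retry_config = Config(retries={"max_attempts": 10, "mode": "adaptive"})

  ssm = boto3.client("ssm", config=retry_config)
  bedrock = boto3.client("bedrock-runtime", config=retry_config)
  ```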

## [0.7.1] - 2023-09-05
### Fixed
- Stack deploy failure (unable to create secret in SecretsManager) when SummarizationLLMThirdPartyApiKey is left empty. Changed default value to 'undefined'.
@@ -127,8 +135,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
### Added
- Initial release

[Unreleased]: https://github.com/aws-samples/amazon-transcribe-post-call-analytics/compare/v0.7.1...develop
[0.7.1]: https://github.com/aws-samples/amazon-transcribe-post-call-analytics/releases/tag/v0.7.0
[Unreleased]: https://github.com/aws-samples/amazon-transcribe-post-call-analytics/compare/v0.7.2...develop
[0.7.2]: https://github.com/aws-samples/amazon-transcribe-post-call-analytics/releases/tag/v0.7.2
[0.7.1]: https://github.com/aws-samples/amazon-transcribe-post-call-analytics/releases/tag/v0.7.1
[0.7.0]: https://github.com/aws-samples/amazon-transcribe-post-call-analytics/releases/tag/v0.7.0
[0.6.0]: https://github.com/aws-samples/amazon-transcribe-post-call-analytics/releases/tag/v0.6.0
[0.5.2]: https://github.com/aws-samples/amazon-transcribe-post-call-analytics/releases/tag/v0.5.2
6 changes: 3 additions & 3 deletions README.md
@@ -34,7 +34,7 @@ PCA currently supports the following features:
* Detects when caller and agent interrupt each other
* Speaker loudness
* **Generative AI**
* Abstractive call summarization using [HuggingFace bart-large-cnn-samsum](https://huggingface.co/philschmid/bart-large-cnn-samsum) deployed on Sagemaker, [Anthropic Claude](https://www.anthropic.com/index/introducing-claude) (which is coming to [Amazon Bedrock](https://aws.amazon.com/bedrock/)), or a user-defined custom AWS Lambda function.
* Abstractive call summarization using [Amazon Bedrock](https://aws.amazon.com/bedrock/), [HuggingFace bart-large-cnn-samsum](https://huggingface.co/philschmid/bart-large-cnn-samsum) deployed on SageMaker, [Anthropic Claude](https://www.anthropic.com/index/introducing-claude) (third-party API), or a user-defined custom AWS Lambda function.
* **Search**
* Search on call attributes such as time range, sentiment, or entities
* Search transcriptions
@@ -84,7 +84,7 @@ Once standard PCA processing is complete the telephony-specific CTR handler will

## (optional) Generative AI Call Summarization

PCA contains a new step in the step functions that (if enabled) will generate a call summary. There are 4 choices for call summarization - Sagemaker Endpoint with HuggingFace bart-large-cnn-samsum, Amazon Bedrock (preview access only) Anthropic Claude, or a custom AWS Lambda function.
PCA contains a new step in the Step Functions workflow that (if enabled) will generate a call summary. There are four choices for call summarization: Amazon Bedrock, a SageMaker endpoint with HuggingFace bart-large-cnn-samsum, Anthropic Claude, or a custom AWS Lambda function.

Learn more about the features in the [Generative AI readme](./docs/generative_ai.md).

@@ -94,7 +94,7 @@ When deploying PCA, the CloudFormation parameter `CallSummarization` value defines

If `DISABLED` is chosen, the PCA step function will bypass the summarization step.

If `BEDROCK` is chosen, you must have access to the Amazon Bedrock service, currently in private preview. Also select the Bedrock model `SummarizationBedrockModelId` parameter.
If `BEDROCK` is chosen, you must also select the Bedrock model via the `SummarizationBedrockModelId` parameter, and you must [request model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) for the selected model.

If `SAGEMAKER` is chosen, PCA will be deployed with the [HuggingFace bart-large-cnn-samsum](https://huggingface.co/philschmid/bart-large-cnn-samsum) model on an `ml.m5.xlarge` instance type. By default, it is deployed with a single instance, as defined by the `SummarizationSageMakerInitialInstanceCount` parameter. If `SummarizationSageMakerInitialInstanceCount` is set to `0`, the endpoint will be deployed as a [SageMaker Serverless Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html) endpoint.
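As a sketch of what calling such an endpoint looks like (the endpoint name and payload shape are assumptions based on the HuggingFace summarization container, not PCA's exact code):

```
import json
import boto3

smr = boto3.client("sagemaker-runtime")

# HuggingFace inference containers accept {"inputs": ...} and, for
# summarization models, return [{"summary_text": ...}]
response = smr.invoke_endpoint(
    EndpointName="pca-summarization-endpoint",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({"inputs": "Agent: Thank you for calling. How can I help? ..."}),
)
summary = json.loads(response["Body"].read())[0]["summary_text"]
print(summary)
```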

2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.7.1
0.7.2
91 changes: 56 additions & 35 deletions docs/generative_ai.md
@@ -1,7 +1,9 @@
# PCA and Generative AI

Post-Call Analytics has an optional step in the Step Functions workflow to generate insights with generative AI.
PCA supports [Amazon Bedrock](https://aws.amazon.com/bedrock/) (Titan or Anthropic models) and [Anthropic](https://www.anthropic.com/) (3rd party) foundational models (FMs). Customers may also write a Lambda function and provide PCA the ARN, and use any FM of their choice.
PCA supports [Amazon Bedrock](https://aws.amazon.com/bedrock/) (Titan or Anthropic models) and [Anthropic](https://www.anthropic.com/) (3rd party) foundation models (FMs). Customers may also write a Lambda function, provide PCA its ARN, and use any FM of their choice. The prompts below are based on Anthropic's prompt formats. Learn more about prompt design in Anthropic's [Introduction to Prompt Design](https://docs.anthropic.com/claude/docs/introduction-to-prompt-design).

For Amazon Bedrock models, you must [request model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) for the models selected.

PCA also supports 'Generative AI Queries', which means you can ask questions about a specific call. These queries appear in a chat-like window within the call details page.

@@ -11,34 +13,27 @@

## Generative AI Insights

When enabled, PCA can run one or more FM inferences against Bedrock or Anthropic APIs. The prompt used to generate the insights is configured in a [AWS Systems Manager Parameter Store](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html). The name of the parameter is `LLMPromptSummaryTemplate`.
When enabled, PCA can run one or more FM inferences against the Amazon Bedrock or Anthropic APIs. The prompts used to generate the insights are configured in an [AWS Systems Manager Parameter Store](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html) parameter named `LLMPromptSummaryTemplate`.
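For reference, a minimal sketch of reading this parameter with boto3 (assuming default credentials and region):

```
import boto3

ssm = boto3.client("ssm")
# LLMPromptSummaryTemplate holds the prompt template(s) described below
template = ssm.get_parameter(Name="LLMPromptSummaryTemplate")["Parameter"]["Value"]
```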

### Single FM Inference
### Multiple inferences per call

The default value for the prompt parameter provides one single prompt:
The default value for `LLMPromptSummaryTemplate` is a JSON object of key/value pairs, where each key is a label and each value is a prompt. During the `Summarize` step, PCA iterates over the keys and runs each prompt, replacing `<br>` tags with newlines and `{transcript}` with the call transcript. Each key is used as the header for its value in the "generated insights" section of the PCA UI.
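A minimal sketch of that loop (function and variable names here are hypothetical; `invoke_llm` stands in for the Bedrock or Anthropic call):

```
import json

def run_insight_prompts(template_json, transcript, invoke_llm):
    prompts = json.loads(template_json)
    results = {}
    for label, prompt in prompts.items():
        # <br> becomes a newline; {transcript} is filled with the call transcript
        rendered = prompt.replace("<br>", "\n").replace("{transcript}", transcript)
        results[label] = invoke_llm(rendered).strip()
    # the combined result is stored as a single JSON string (see the example below)
    return json.dumps(results)
```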

Below is the default value of `LLMPromptSummaryTemplate`.

```
Human: Answer all the questions below as a json object with key value pairs, the key is provided, and answer as the value, based on the transcript. Only return json.
<br><questions>
<br>Summary: Summarize the call.
<br>Topic: Topic of the call. Choose from one of these or make one up (iphone issue, billing issue, cancellation)
<br>Product: What product did the customer call about? (internet, broadband, mobile phone, mobile plans)
<br>Resolved: Did the agent resolve the customer's questions? (yes or no)
<br>Callback: Was this a callback? (yes or no)
<br>Politeness: Was the agent polite and professional? (yes or no)
<br>Actions: What actions did the Agent take?
<br></questions>
<br><transcript>
<br>{transcript}
<br></transcript>
<br>Assistant: Here is the JSON object with the answers to the questions:
{
"Summary":"<br><br>Human: Answer the questions below, defined in <question></question> based on the transcript defined in <transcript></transcript>. If you cannot answer the question, reply with 'n/a'. Use gender neutral pronouns. When you reply, only respond with the answer.<br><br><question>What is a summary of the transcript?</question><br><br><transcript><br>{transcript}<br></transcript><br><br>Assistant:",
"Topic":"<br><br>Human: Answer the questions below, defined in <question></question> based on the transcript defined in <transcript></transcript>. If you cannot answer the question, reply with 'n/a'. Use gender neutral pronouns. When you reply, only respond with the answer.<br><br><question>What is the topic of the call? For example, iphone issue, billing issue, cancellation. Only reply with the topic, nothing more.</question><br><br><transcript><br>{transcript}<br></transcript><br><br>Assistant:",
"Product":"<br><br>Human: Answer the questions below, defined in <question></question> based on the transcript defined in <transcript></transcript>. If you cannot answer the question, reply with 'n/a'. Use gender neutral pronouns. When you reply, only respond with the answer.<br><br><question>What product did the customer call about? For example, internet, broadband, mobile phone, mobile plans. Only reply with the product, nothing more.</question><br><br><transcript><br>{transcript}<br></transcript><br><br>Assistant:",
"Resolved":"<br><br>Human: Answer the questions below, defined in <question></question> based on the transcript defined in <transcript></transcript>. If you cannot answer the question, reply with 'n/a'. Use gender neutral pronouns. When you reply, only respond with the answer.<br><br><question>Did the agent resolve the customer's questions? Only reply with yes or no, nothing more. </question><br><br><transcript><br>{transcript}<br></transcript><br><br>Assistant:",
"Callback":"<br><br>Human: Answer the questions below, defined in <question></question> based on the transcript defined in <transcript></transcript>. If you cannot answer the question, reply with 'n/a'. Use gender neutral pronouns. When you reply, only respond with the answer.<br><br><question>Was this a callback? (yes or no) Only reply with yes or no, nothing more.</question><br><br><transcript><br>{transcript}<br></transcript><br><br>Assistant:",
"Politeness":"<br><br>Human: Answer the question below, defined in <question></question> based on the transcript defined in <transcript></transcript>. If you cannot answer the question, reply with 'n/a'. Use gender neutral pronouns. When you reply, only respond with the answer.<br><br><question>Was the agent polite and professional? (yes or no) Only reply with yes or no, nothing more.</question><br><br><transcript><br>{transcript}<br></transcript><br><br>Assistant:",
"Actions":"<br><br>Human: Answer the question below, defined in <question></question> based on the transcript defined in <transcript></transcript>. If you cannot answer the question, reply with 'n/a'. Use gender neutral pronouns. When you reply, only respond with the answer.<br><br><question>What actions did the Agent take? </question><br><br><transcript><br>{transcript}<br></transcript><br><br>Assistant:"
}
```

The `<br>` tags are replaced with newlines, and `{transcript}` is replaced with the call transcript.

**Note:** This prompt generates 7 insights in a single inference - summary, topic, product, resolved, callback, agent politeness, and actions.

The expected output of the inference should be a single JSON object with key-value pairs, similar to the below:
The expected output after the summarize step is a single JSON object, serialized as a string, that contains all the key/value pairs. For example:

```
{
@@ -52,18 +47,37 @@
}
```

### Multiple inferences per call

If you would like to run individual inferences to generate the summary (for example, if you are using a fine-tuned FM for a specific inference, or your FM does not generate proper JSON), then you can change the prompt parameter input to be a JSON with key value pairs. The key will be the title in the generated insights section, and the value will be the prompt used to generate the value. Don't forget to add `{transcript}` to each prompt!
### Single FM Inference

Some LLMs may be able to generate the JSON with one inference, rather than several. Below is an example that we've seen work, but with mixed results.

```
{
"Summary":"Human: Summarize the following transcript:<br><transcript><br>{transcript}<br></transcript><br>Assistant:",
"Agent Politeness":"Human: Based on the following transcript, reply 'yes' if the agent was polite, or provide details if they were not polite.<br><transcript><br>{transcript}<br></transcript><br>Assistant:"
}
<br>
<br>Human: Answer all the questions below, based on the contents of <transcript></transcript>, as a json object with key value pairs. Use the text before the colon as the key, and the answer as the value. If you cannot answer the question, reply with 'n/a'. Only return json. Use gender neutral pronouns. Skip the preamble; go straight into the json.
<br>
<br><questions>
<br>Summary: Summarize the transcript in no more than 5 sentences. Were the caller's needs met during the call?
<br>Topic: Topic of the call. Choose from one of these or make one up (iphone issue, billing issue, cancellation)
<br>Product: What product did the customer call about? (internet, broadband, mobile phone, mobile plans)
<br>Resolved: Did the agent resolve the customer's questions? (yes or no)
<br>Callback: Was this a callback? (yes or no)
<br>Politeness: Was the agent polite and professional? (yes or no)
<br>Actions: What actions did the Agent take?
<br></questions>
<br>
<br><transcript>
<br>{transcript}
<br></transcript>
<br>
<br>Assistant:
```

The expected output from the LLM is a single string that contains the value/answer. The key from the prompt definition will be used as the header in the UI.
The `<br>` tags are replaced with newlines, and `{transcript}` is replaced with the call transcript.

**Note:** This prompt generates 7 insights in a single inference - summary, topic, product, resolved, callback, agent politeness, and actions.

The expected output of the inference should be a single JSON object with key-value pairs, similar to above.
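A hypothetical sketch of running that single prompt against Amazon Bedrock with the Claude text-completions API (the model ID and inference parameters are assumptions, not PCA's exact configuration):

```
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def summarize_call(rendered_prompt):
    # Claude text-completions request body: the prompt plus sampling parameters
    body = json.dumps({
        "prompt": rendered_prompt,  # the "Human: ... Assistant:" text above
        "max_tokens_to_sample": 512,
        "temperature": 0,
    })
    response = bedrock.invoke_model(
        modelId="anthropic.claude-instant-v1",  # assumed model choice
        body=body,
    )
    completion = json.loads(response["body"].read())["completion"]
    return json.loads(completion)  # the prompt asks the model to return only JSON
```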

### Call list default columns

@@ -76,11 +90,18 @@ For interactive queries from within PCA, it uses a different parameter, named `LLMPromptQueryTemplate`.
The default value is:

```
Human: You are an AI chatbot. Carefully read the following transcript and then provide a short answer to the question. If the answer cannot be determined from the transcript or the context, then reply saying Sorry, I don't know.
<br><question>{question}</question>
<br><transcript>
<br>{transcript}
<br></transcript>
<br>
<br>Human: You are an AI chatbot. Carefully read the following transcript within <transcript></transcript>
and then provide a short answer to the question. If the answer cannot be determined from the transcript or
the context, then reply saying Sorry, I don't know. Use gender neutral pronouns. Skip the preamble; when you reply, only
respond with the answer.
<br>
<br><question>{question}</question>
<br>
<br><transcript>
<br>{transcript}
<br></transcript>
<br>
<br>Assistant:
```
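The same substitutions apply here as for summarization, with the extra `{question}` placeholder filled from the chat window. A minimal sketch (a hypothetical helper, not PCA's actual code):

```
def render_query_prompt(query_template, question, transcript):
    # <br> becomes a newline; {question} and {transcript} are placeholders
    return (query_template
            .replace("<br>", "\n")
            .replace("{question}", question)
            .replace("{transcript}", transcript))
```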

52 changes: 2 additions & 50 deletions pca-boto3-bedrock/template.yaml
@@ -3,13 +3,6 @@ Description: >
PCA Bedrock Boto3 Lambda Layer. This will create an S3 bucket, download
the Boto3 WHL file, and create a Lambda layer for use.
Parameters:

  BedrockPreviewSdkUrl:
    Type: String
    Default: https://d2eo22ngex1n9g.cloudfront.net/Documentation/SDK/bedrock-python-sdk.zip
    Description: URL for the Bedrock SDK zip file (Bedrock preview access only)

Resources:

  BedrockBoto3Bucket:
@@ -62,7 +55,6 @@ Resources:
      MemorySize: 512
      Environment:
        Variables:
          SDK_DOWNLOAD_URL: !Ref BedrockPreviewSdkUrl
          BOTO3_BUCKET: !Ref BedrockBoto3Bucket
      Code:
        ZipFile: |
@@ -76,40 +68,13 @@ Resources:
          import urllib3
          from datetime import datetime
          import cfnresponse

          bedrock_sdk_url = os.environ['SDK_DOWNLOAD_URL']
          boto3_bucket = os.environ['BOTO3_BUCKET']

          def download_file_from_url(url, local_path):
              """Download a file from a URL to a local save path."""
              http = urllib3.PoolManager()
              response = http.request('GET', url)
              if response.status == 200:
                  with open(local_path, 'wb') as file:
                      file.write(response.data)
                  print("File downloaded successfully.")
              else:
                  print("Failed to download the file.", response)

          def upload_file_to_s3(file_path, bucket, key):
              s3 = boto3.client('s3')
              s3.upload_file(file_path, bucket, key)
              print(f"Upload successful. {file_path} uploaded to {bucket}/{key}")

          def extract_file_from_zip(zip_file_path, file_name):
              with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
                  zip_ref.extract(file_name)
              print(f"Successfully extracted {file_name} from {zip_file_path}")

          def find_boto_wheels(zipname):
              zipf = zipfile.ZipFile(zipname, 'r')
              zip_files = zipf.namelist()
              b = re.compile('boto3(.*)\.whl')
              bc = re.compile('botocore(.*)\.whl')
              boto3_whl_file = [s for s in zip_files if b.match(s)][0]
              botocore_whl_file = [s for s in zip_files if bc.match(s)][0]
              return boto3_whl_file, botocore_whl_file

          def make_zip_filename():
              now = datetime.now()
              timestamp = now.strftime('%Y%m%d_%H%M%S')
@@ -141,20 +106,8 @@ Resources:
              try:
                  if event['RequestType'] != 'Delete':
                      os.chdir('/tmp')
                      # download Bedrock SDK
                      zip_file_name='bedrock-python-sdk.zip'
                      print(f"downloading from {bedrock_sdk_url} to {zip_file_name}")
                      download_file_from_url(bedrock_sdk_url, zip_file_name)
                      boto3_whl_file, botocore_whl_file = find_boto_wheels(zip_file_name)
                      extract_file_from_zip(zip_file_name, botocore_whl_file)
                      extract_file_from_zip(zip_file_name, boto3_whl_file)
                      if os.path.exists("python"):
                          shutil.rmtree("python")
                      os.mkdir("python")
                      print(f"running pip install botocore")
                      subprocess.check_call([sys.executable, "-m", "pip", "install", botocore_whl_file, "-t", "python"])
                      print(f"running pip install boto3")
                      subprocess.check_call([sys.executable, "-m", "pip", "install", boto3_whl_file, "-t", "python"])
                      # GA boto3 (1.28.57+) ships Bedrock support, replacing the preview SDK download above
                      print(f"running pip install boto3==1.28.57")
                      subprocess.check_call([sys.executable, "-m", "pip", "install", "boto3==1.28.57", "-t", "python"])
                      boto3_zip_name = make_zip_filename()
                      zipdir("python", boto3_zip_name)
                      print(f"uploading {boto3_zip_name} to s3 bucket {boto3_bucket}")
@@ -181,7 +134,6 @@ Resources:
    Properties:
      ServiceToken: !GetAtt BedrockBoto3ZipFunction.Arn
      # Rerun BedrockBoto3ZipFunction if any of the following parameters change
      SDK_DOWNLOAD_URL: !Ref BedrockPreviewSdkUrl
      BOTO3_BUCKET: !Ref BedrockBoto3Bucket

  BedrockBoto3Layer: