Create SimplyE Analytics Testbed Manually

These are the steps required to create a testbed manually:

Creating an S3 bucket

Creating a bucket

Sign in to the AWS Management Console, open the Amazon S3 console at https://console.aws.amazon.com/s3/ and click the Create bucket button:
In the dialog that opens, enter a new bucket name:

ℹ️ Bucket names must be globally unique, so you may want to append a GUID to the name.

Scroll to the end of the page and click on Create bucket:
After the bucket is created, you'll be redirected to the list of all available buckets. Find the newly created bucket in the list and click on its name:
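
If you prefer to script this step, the following is a minimal sketch using boto3. The helper names, the `cm-analytics` prefix, and the default region are assumptions made for illustration, not part of the original setup.

```python
import uuid


def unique_bucket_name(prefix: str) -> str:
    """Append a random GUID so the bucket name is unlikely to collide."""
    # S3 bucket names must be lowercase and at most 63 characters long.
    name = f"{prefix}-{uuid.uuid4()}".lower()
    if len(name) > 63:
        raise ValueError(f"bucket name too long ({len(name)} > 63): {name}")
    return name


def create_bucket(name: str, region: str = "us-east-1") -> None:
    """Create the bucket; needs boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 does not accept an explicit LocationConstraint.
        s3.create_bucket(Bucket=name)
    else:
        s3.create_bucket(
            Bucket=name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
```

For example, `create_bucket(unique_bucket_name("cm-analytics"))` would create a bucket with a GUID suffix in N. Virginia.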

Creating required folders

In the new window showing the bucket's settings, click the Create folder button:
In the folder settings window, enter the name json-input. This folder will store the JSON files containing Circulation Manager analytics events:

Repeat the previous two steps to create the following folder structure:

|- athena
|- glue
   |- scripts
   |- temporary
|- json-input
|- parquet-output
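
The folder tree above can also be created from a script. This is a sketch only: the S3 console represents a folder as a zero-byte object whose key ends with `/`, and the code below does the same. `FOLDERS` and `create_folders` are hypothetical names.

```python
# Keys for the folder tree shown above; each one ends with "/".
FOLDERS = [
    "athena/",
    "glue/",
    "glue/scripts/",
    "glue/temporary/",
    "json-input/",
    "parquet-output/",
]


def create_folders(bucket: str, folders=FOLDERS) -> None:
    """Create one empty placeholder object per folder key."""
    import boto3  # needs configured AWS credentials

    s3 = boto3.client("s3")
    for key in folders:
        s3.put_object(Bucket=bucket, Key=key)
```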

After creating all the folders, the bucket's folder structure should look as shown in the screenshot below:

Uploading test data to the bucket

Now upload the test files to the bucket. Go to the json-input folder and click the Upload button:
Drag and drop the files from the test-data folder:
After adding all the files, scroll down to the end of the page and click the Upload button:
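
As an alternative to drag and drop, the upload could be scripted roughly as follows. The function names are hypothetical, and the sketch assumes the test files live in a local `test-data` directory and should keep their relative paths under `json-input/`.

```python
from pathlib import Path


def plan_uploads(local_dir: str, prefix: str = "json-input/") -> list:
    """Pair every file under local_dir with its destination S3 key."""
    root = Path(local_dir)
    return sorted(
        (str(path), prefix + path.relative_to(root).as_posix())
        for path in root.rglob("*")
        if path.is_file()
    )


def upload_test_data(bucket: str, local_dir: str = "test-data") -> None:
    """Upload the planned files; needs boto3 and AWS credentials."""
    import boto3

    s3 = boto3.client("s3")
    for filename, key in plan_uploads(local_dir):
        s3.upload_file(filename, bucket, key)
```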

Creating a Glue crawler for json-input folder

Creating a new crawler

Open the AWS Glue console at https://console.aws.amazon.com/glue/, choose Crawlers in the navigation pane and then click Add crawler:
Enter the name of the new crawler and click on Next:
Select Crawl new folders and click on Next:
Select S3 data store and choose json-input folder as Input path:
Let Glue generate a new IAM role, specify its name, and click on Next:
Choose Run on demand as Frequency and click on Next:
Click on Add database:
Enter the name of the database and click on Create:
Check Update all new and existing partitions with metadata from the table:
Click on Next until the last page of the wizard and then click on Finish.

Running the crawler

Select the newly created crawler in the list and click on Run to trigger it. When the run finishes, you should see a message saying that it completed and that a new table was created:
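
For reference, the wizard choices above map approximately onto the Glue API as sketched below. This is an illustration under assumptions: the role must already exist (the wizard creates one for you), the helper names are made up, and the partition-metadata checkbox is not reproduced. Omitting `Schedule` leaves the crawler on demand; to our understanding, crawling new folders only requires the schema change policy to be `LOG`.

```python
def crawler_params(name, role, database, bucket, folder="json-input"):
    """Build the arguments for glue.create_crawler."""
    return {
        "Name": name,
        "Role": role,  # assumption: an existing IAM role usable by Glue
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": f"s3://{bucket}/{folder}/"}]},
        # Corresponds to the "Crawl new folders" option in the wizard.
        "RecrawlPolicy": {"RecrawlBehavior": "CRAWL_NEW_FOLDERS_ONLY"},
        "SchemaChangePolicy": {
            "UpdateBehavior": "LOG",
            "DeleteBehavior": "LOG",
        },
        # No "Schedule" key: the crawler runs on demand.
    }


def create_and_run_crawler(params) -> None:
    """Create the database and crawler, then trigger a run."""
    import boto3  # needs configured AWS credentials

    glue = boto3.client("glue")
    # Fails with AlreadyExistsException if the database already exists.
    glue.create_database(DatabaseInput={"Name": params["DatabaseName"]})
    glue.create_crawler(**params)
    glue.start_crawler(Name=params["Name"])
```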

Updating the schema created by the crawler

Select Tables on the left, choose json_input and click on Edit schema:
Go through the columns and change the data type to timestamp for the following ones:

issued
end
availability_time
start
published

After finishing, scroll down to the end of the page and click on Save.
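
The same schema change can be sketched in code. The column names come from the list above; the helper names and the selection of `TableInput` fields to copy are assumptions.

```python
TIMESTAMP_COLUMNS = {"issued", "end", "availability_time", "start", "published"}


def with_timestamp_columns(columns, names=TIMESTAMP_COLUMNS):
    """Return a copy of a Glue column list with the named columns typed as timestamp."""
    return [
        {**col, "Type": "timestamp"} if col["Name"] in names else dict(col)
        for col in columns
    ]


def update_table_schema(database: str, table: str = "json_input") -> None:
    """Fetch the table, retype the columns and write the table back."""
    import boto3  # needs configured AWS credentials

    glue = boto3.client("glue")
    current = glue.get_table(DatabaseName=database, Name=table)["Table"]
    # update_table accepts only TableInput fields, so copy the ones we need
    # and leave out read-only fields such as CreateTime.
    allowed = ("Name", "Description", "Owner", "Retention",
               "StorageDescriptor", "PartitionKeys", "TableType", "Parameters")
    table_input = {key: current[key] for key in allowed if key in current}
    descriptor = table_input["StorageDescriptor"]
    descriptor["Columns"] = with_timestamp_columns(descriptor["Columns"])
    glue.update_table(DatabaseName=database, TableInput=table_input)
```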

Creating a Glue job for converting json-input data to the Apache Parquet format

Creating an IAM policy for the Parquet converter

Open the IAM console at https://console.aws.amazon.com/iam/, in the navigation pane on the left choose Policies:
Click on Create policy:
Switch to JSON tab and insert the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::<cm-analytics-bucket>/json-input*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::<cm-analytics-bucket>/glue*",
                "arn:aws:s3:::<cm-analytics-bucket>/parquet-output*"
            ]
        }
    ]
}

Replace <cm-analytics-bucket> with the name of the bucket created in Creating a bucket.

Enter the new policy's name on the review page and click on Finish:
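
To avoid editing the bucket name by hand, the policy document above could be generated from the bucket name. This is a sketch; `converter_policy` and `create_policy` are hypothetical helper names.

```python
import json


def converter_policy(bucket: str) -> dict:
    """Build the policy document shown above for the given bucket."""
    def arn(prefix: str) -> str:
        return f"arn:aws:s3:::{bucket}/{prefix}*"

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [arn("json-input")],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [arn("glue"), arn("parquet-output")],
            },
        ],
    }


def create_policy(name: str, bucket: str) -> str:
    """Create the managed policy and return its ARN; needs boto3."""
    import boto3

    iam = boto3.client("iam")
    response = iam.create_policy(
        PolicyName=name,
        PolicyDocument=json.dumps(converter_policy(bucket)),
    )
    return response["Policy"]["Arn"]
```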

Creating an IAM role for the Parquet converter

In the navigation pane on the left choose Roles and click on Create role:
Choose Glue as a trusted entity and go to the next page:
Choose the following policies:

AWSGlueServiceRole
AWSGlueServiceRole-CMAnalyticsParquetConverter, the policy created in Creating an IAM policy for the Parquet converter:

Enter the new role's name and click on Create role:
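
Choosing Glue as the trusted entity corresponds to a trust policy that lets `glue.amazonaws.com` assume the role. A rough scripted equivalent, with hypothetical helper and argument names, might look like this:

```python
import json

# Trust policy that allows the Glue service to assume the role.
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# ARN of the AWS managed policy AWSGlueServiceRole.
AWS_GLUE_SERVICE_ROLE_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def create_converter_role(role_name: str, converter_policy_arn: str) -> None:
    """Create the role and attach both policies; needs boto3."""
    import boto3

    iam = boto3.client("iam")
    iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(GLUE_TRUST_POLICY),
    )
    for policy_arn in (AWS_GLUE_SERVICE_ROLE_ARN, converter_policy_arn):
        iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
```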

Creating a Glue job

Open the AWS Glue console at https://console.aws.amazon.com/glue/, choose Jobs in the navigation pane and then click Add job:
Enter the name of the new job and select the role created in Creating an IAM role for the Parquet converter:
Then scroll down to Advanced properties, enable Job bookmark, and go to the next page of the wizard.
Choose json-input as a data source and click on Next:
Leave the transform type as is and click on Next:
Choose Parquet and parquet-output as the target type and target path respectively:
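
The job settings above could be expressed through the Glue API roughly as follows. Unlike the wizard, the API does not generate the ETL script, so this sketch assumes a script has already been uploaded to `glue/scripts/`. The script name, the Glue version, and the helper names are assumptions.

```python
def job_params(name, role, bucket, script="parquet_converter.py"):
    """Build the arguments for glue.create_job."""
    return {
        "Name": name,
        "Role": role,
        "GlueVersion": "4.0",  # assumption: any version that supports Python 3
        "Command": {
            "Name": "glueetl",
            # Assumption: the script was uploaded here beforehand.
            "ScriptLocation": f"s3://{bucket}/glue/scripts/{script}",
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # Equivalent of enabling Job bookmark in Advanced properties.
            "--job-bookmark-option": "job-bookmark-enable",
            "--TempDir": f"s3://{bucket}/glue/temporary/",
        },
    }


def create_job(params) -> None:
    """Create the job; needs boto3 and configured AWS credentials."""
    import boto3

    boto3.client("glue").create_job(**params)
```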

Creating a Glue crawler for parquet-output

Creating a new Glue crawler

In the navigation bar on the left select Crawlers again and click on Add crawler:
Enter the name and click on Next:
Specify source type and click on Next:
Specify parquet-output as a target data source:
Let Glue create a new IAM role:
Select cm-analytics as the database where the crawler will store the output table:
Run the newly created crawler:

Setting up Amazon Athena

Open the Athena console at https://console.aws.amazon.com/athena/ and start setting it up:
Set athena folder as query result location and click on Save:
Run the query to ensure that Athena has been set up correctly:
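
The original query is not reproduced here, so the following is only a sketch of a possible sanity check. It assumes the crawler named the table `parquet_output` (hyphens become underscores, as with `json_input`), that the database is `cm-analytics`, and that results go to the `athena` folder.

```python
import time


def sanity_query(table: str = "parquet_output", limit: int = 10) -> str:
    """Return a small SELECT that confirms the table is readable."""
    return f'SELECT * FROM "{table}" LIMIT {int(limit)}'


def run_sanity_query(bucket: str, database: str = "cm-analytics",
                     region: str = "us-east-1") -> str:
    """Run the query and return its final state; needs boto3."""
    import boto3

    athena = boto3.client("athena", region_name=region)
    query_id = athena.start_query_execution(
        QueryString=sanity_query(),
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": f"s3://{bucket}/athena/"},
    )["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(1)  # poll until the query reaches a final state
```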

Setting up QuickSight

Create a new analysis in QuickSight:
Create a new dataset:
Set up a new Athena dataset and select cm-analytics database:
Select parquet-output table:
Don't use SPICE; choose to query the data directly:
Switch to the N. Virginia region and click on Manage QuickSight:
Click on Security & permissions:
Under QuickSight access to AWS services click on Add or remove:
Scroll down to Amazon S3 and click on Select buckets:
Select the bucket created in Creating an S3 bucket and click on Finish:
Click on Update:
Try to create a new dashboard:
