We recommand that using Cloud9 environment to deploy, or you must ensure you had installed following requirements in local before starting
git clone https://github.com/gavinjwl/clickstream-on-aws
cd clickstream-on-aws
poetry install
source .venv/bin/activate
Deploy all stacks
cdk deploy --all \
--parameters CoreStack:WriteKey='<define-your-write-key>' \
--parameters CoreStack:RedshiftServerlessSubnetIds='<assign-subnets-to-redshift>' \
--parameters CoreStack:RedshiftServerlessSecurityGroupIds='assign-security-groups-for-redshift'
Or, deploy ONLY CoreStack by
cdk deploy CoreStack \
--parameters CoreStack:WriteKey='<define-your-write-key>' \
--parameters CoreStack:RedshiftServerlessSubnetIds='<assign-subnets-to-redshift>' \
--parameters CoreStack:RedshiftServerlessSecurityGroupIds='assign-security-groups-for-redshift'
Or, deploy ONLY CoreStack with Dashboard by
cdk deploy CoreStack Dashboard \
--parameters CoreStack:WriteKey='<define-your-write-key>' \
--parameters CoreStack:RedshiftServerlessSubnetIds='<assign-subnets-to-redshift>' \
--parameters CoreStack:RedshiftServerlessSecurityGroupIds='assign-security-groups-for-redshift'
Or, deploy ONLY CoreStack with Scheduled Refresh feature by
cdk deploy CoreStack ScheduledRefreshStack \
--parameters CoreStack:WriteKey='<define-your-write-key>' \
--parameters CoreStack:RedshiftServerlessSubnetIds='<assign-subnets-to-redshift>' \
--parameters CoreStack:RedshiftServerlessSecurityGroupIds='assign-security-groups-for-redshift'
After CDK deployments complete, we need to change Redshift Serverless Namespace password, so that we can connect.
Use Username-Password to Connect to Redshift Serverless Namespace. Following show how to connect with QueryEditorV2
Create an external schema for Kinesis Stream
-- Create external schema for kinesis
CREATE EXTERNAL SCHEMA IF NOT EXISTS kinesis FROM KINESIS IAM_ROLE default;
Create clickstream schema
-- Create schema for clickstream
CREATE SCHEMA IF NOT EXISTS clickstream;
Create user and grant permissions
-- Create clickstream user and grant required permissions
-- Please do not change `IAMR:ClickstreamRedshiftRole`
CREATE USER "IAMR:ClickstreamRedshiftRole" PASSWORD DISABLE;
GRANT ALL ON SCHEMA kinesis TO "IAMR:ClickstreamRedshiftRole";
GRANT ALL ON SCHEMA clickstream TO "IAMR:ClickstreamRedshiftRole";
GRANT ALL ON ALL TABLES IN SCHEMA clickstream TO "IAMR:ClickstreamRedshiftRole";
Create a materialized view to consume the stream data
SET enable_case_sensitive_identifier TO true;
CREATE MATERIALIZED VIEW clickstream.mv_kinesisSource
AS
SELECT
ApproximateArrivalTimestamp AS approximateArrivalTimestamp,
PartitionKey AS partitionKey,
ShardId AS shardId,
SequenceNumber AS sequenceNumber,
-- JSON_PARSE(from_varbyte(Data, 'utf-8')) as data,
json_extract_path_text(from_varbyte(data, 'utf-8'), 'messageId')::VARCHAR AS messageId,
json_extract_path_text(from_varbyte(data, 'utf-8'), 'timestamp')::VARCHAR AS event_timestamp,
json_extract_path_text(from_varbyte(data, 'utf-8'), 'type')::VARCHAR AS type,
-- Common
json_extract_path_text(from_varbyte(data, 'utf-8'), 'userId')::VARCHAR AS userId,
json_extract_path_text(from_varbyte(data, 'utf-8'), 'anonymousId')::VARCHAR AS anonymousId,
json_extract_path_text(from_varbyte(data, 'utf-8'), 'context')::SUPER AS context,
json_extract_path_text(from_varbyte(data, 'utf-8'), 'integrations')::SUPER AS integrations,
-- Identify
json_extract_path_text(from_varbyte(data, 'utf-8'), 'traits')::SUPER AS traits,
-- Track
json_extract_path_text(from_varbyte(data, 'utf-8'), 'event')::VARCHAR AS event,
json_extract_path_text(from_varbyte(data, 'utf-8'), 'properties')::SUPER AS properties,
-- Alias
json_extract_path_text(from_varbyte(data, 'utf-8'), 'previousId')::VARCHAR AS previousId,
-- Group
json_extract_path_text(from_varbyte(data, 'utf-8'), 'groupId')::VARCHAR AS groupId,
-- Page
json_extract_path_text(from_varbyte(data, 'utf-8'), 'category')::VARCHAR AS category,
json_extract_path_text(from_varbyte(data, 'utf-8'), 'name')::VARCHAR AS name
FROM kinesis."ClickstreamKinesisStream"
WHERE is_utf8(Data) AND is_valid_json(from_varbyte(Data, 'utf-8'));
Change materialized view owner to IAMR:ClickstreamRedshiftRole
so that ScheduledRefreshStack can work.
SET enable_case_sensitive_identifier TO true;
ALTER TABLE clickstream.mv_kinesisSource OWNER TO "IAMR:ClickstreamRedshiftRole";
-
The easiest way to simulate is doing follow command, for more detail
# Enable your python venv, if not source .venv/bin/activate # Execute simulator python3 simulator.py --host <API Gateway URL> --writeKey <Your Write Key>
-
If you want to simulate more users, you can leverage Locust.
Note You need to change
HOST = '<API Gateway URL>'
andWRITE_KEY = '<Your Write Key>'
in main.py first.# Enable your python venv, if not source .venv/bin/activate # Start locust locust -f benchmark/main.py \ --web-port 8089 # Open your browser and input <API Gateway URL> and how many users you want.
SET enable_case_sensitive_identifier TO true;
SELECT *
FROM clickstream.mv_kinesisSource
LIMIT 10
;
Note
You need to change HOST
to your API Gateway url and WRITE_KEY
to the value you defined in CDK deployment in any SDK.