- In movie_review, score_film is a random integer from 1 to 10. If score_film > 7, the review counts as a good review and is assigned the value 1
- amount_spent = quantity * unit_price
- review_score is the number of reviews with score_film greater than 7
- review_count is the number of reviews per customer
- insert_date is the date on which the invoice_purchase happened
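The field definitions above can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's actual code: the `customer_id` key, the tuple layouts, and the per-customer summing of `amount_spent` are assumptions made for the example.

```python
from collections import defaultdict


def positive_review(score_film):
    """Return 1 for a good review (score_film > 7), otherwise 0."""
    return 1 if score_film > 7 else 0


def user_behavior(reviews, purchases):
    """Aggregate the metrics per customer.

    reviews:   iterable of (customer_id, score_film)            # assumed layout
    purchases: iterable of (customer_id, quantity, unit_price)  # assumed layout
    """
    metrics = defaultdict(
        lambda: {"amount_spent": 0.0, "review_score": 0, "review_count": 0}
    )
    for customer_id, quantity, unit_price in purchases:
        # amount_spent = quantity * unit_price, summed per customer (assumption)
        metrics[customer_id]["amount_spent"] += quantity * unit_price
    for customer_id, score_film in reviews:
        # review_score counts good reviews, review_count counts all reviews
        metrics[customer_id]["review_score"] += positive_review(score_film)
        metrics[customer_id]["review_count"] += 1
    return dict(metrics)
```

For example, a customer with two reviews scored 8 and 5 and one purchase of 2 items at 3.0 each would end up with `review_score` 1, `review_count` 2, and `amount_spent` 6.0.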
- The Python script generates data for movie_review.csv, which it pushes to S3, and for the invoice_purchase table in the database
- Airflow then runs to push the data to S3, create the external table, and insert the data into public.user_behavior in Redshift
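The data generation step might look roughly like the sketch below. It only builds the CSV text in memory; the upload to S3 and the database insert are left out. The column names `customer_id` and `positive_review`, and the function names, are hypothetical and chosen for illustration.

```python
import csv
import io
import random

FIELDS = ["customer_id", "score_film", "positive_review"]  # assumed columns


def generate_movie_reviews(n_rows, n_customers=5, seed=None):
    """Create n_rows fake reviews with score_film drawn from 1..10."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        score_film = rng.randint(1, 10)  # randint includes both endpoints
        rows.append(
            {
                "customer_id": rng.randint(1, n_customers),
                "score_film": score_film,
                "positive_review": 1 if score_film > 7 else 0,
            }
        )
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The returned text could then be written to movie_review.csv and uploaded to the S3 bucket with whatever client the script already uses.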
- An AWS account to set up the infrastructure
- Docker to run Airflow
- DBeaver to connect to Redshift and check the results
- Download the code
- Create an S3 bucket to store the data
- Run the Python script to generate the data (remember to grant permission to push the CSV file to S3 and to adjust the code in the script for your setup)
- Download and configure Airflow to run with Docker Compose: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html
- Open the Airflow webserver container and add connections to the Postgres DB and AWS Redshift
- Create a Redshift cluster and attach a role to the cluster so that it can use S3
- Use DBeaver to connect to Redshift and check the result: https://www.kodyaz.com/aws/connect-to-amazon-redshift-using-dbeaver-database-management-tool.aspx
- Data in spectrum.invoice_purchase
- user_behavior_metric