This is an ETL pipeline that pulls crypto exchange data from the CoinCap API and loads it into our data warehouse.
Note: We use Python to pull, transform, and load data. Our warehouse is Postgres. We also spin up a Metabase instance as our presentation layer.
Note: All of the components run as Docker containers.
- Docker Compose v2: The original Python project, called `docker-compose` (aka v1 of the docker/compose repo), has been deprecated and development has moved over to v2. To install the v2 `docker compose` as a CLI plugin on Linux, supported distributions can now install the `docker-compose-plugin` package. E.g., on Debian, I run `apt-get install docker-compose-plugin` (a quick install-and-verify sketch follows this list).
- AWS Configuration and credential file settings: Read more at credential file settings & configuration settings and precedence (a minimal credentials sketch also follows this list).
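For example, on a Debian-based machine the Compose v2 install and a quick sanity check look like this (standard packages per the Docker docs; adjust for your distribution):

sudo apt-get update
sudo apt-get install docker-compose-plugin
docker compose version   # should print a v2.x version string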
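For the AWS side, the quickest setup is `aws configure`, which writes the two standard files; the values below are placeholders:

aws configure   # prompts for access key id, secret key, region, output format
# writes ~/.aws/credentials:
#   [default]
#   aws_access_key_id = <your access key id>
#   aws_secret_access_key = <your secret access key>
# and ~/.aws/config:
#   [default]
#   region = <your region, e.g. us-east-1>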
# Clone the code as shown below.
git clone https://github.com/maciejbrasewicz/real_time_monitor.git
cd real_time_monitor
Replace content in the following files (a scripted sketch follows this list):

- CODEOWNERS: change the user id from `@maciejbrasewicz` to your GitHub user id.
- cd.yml: change the `real_time_monitor` part of the `TARGET` to your repository name. If you haven't changed the name, leave it as is.
- variable.tf: change the default value of the `alert_email_id` variable to your email address. If you are going to use a different EC2 instance type than t2.micro, you can also change that in this file.
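If you prefer to script these edits, sed one-liners along these lines would do it. The patterns and paths are illustrative, not taken from the repo (cd.yml usually lives under .github/workflows/, and the current default email will differ), so check them against your checkout first:

sed -i 's/@maciejbrasewicz/@your-github-id/' CODEOWNERS
sed -i 's/real_time_monitor/your-repo-name/g' .github/workflows/cd.yml
sed -i 's/current-default@example.com/you@example.com/' terraform/variable.tf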
Note: as for the question "which purchasing option is right for me?", I will answer it in one of my blog posts. There is always a "better option".
- main.tf: make sure the `Clone git repo to EC2` param is customized for your needs (line 112).
- main.tf: this file defines all the services we need. In our main.tf, we create an EC2 instance, a security group where we configure access, and a cost alert. For instance, the security group for access to EC2:
# Create security group for access to EC2 from anywhere
resource "aws_security_group" "sde_security_group" {
  name        = "sde_security_group"
  description = "Security group to allow inbound SCP & outbound 8080 (Airflow) connections"

  ingress {
    description = "Inbound SCP"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
...
The following commands will allow us to configure the project:
make up # starts docker containers & runs migrations under ./migrations
make ci # Runs auto formatting, lint checks, & all the test files under ./tests
# Create AWS services with Terraform
make tf-init
make infra-up # type in yes after verifying the changes TF will make or `terraform apply -auto-approve` to avoid interactive prompt in the future.
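If you'd rather invoke Terraform directly than through make (an assumption that these targets wrap the standard CLI, in line with the `terraform -chdir` commands used later for outputs), the rough equivalents are:

terraform -chdir=./terraform init                  # ~ make tf-init
terraform -chdir=./terraform apply                 # review the plan, type yes
terraform -chdir=./terraform apply -auto-approve   # non-interactive variant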
Wait until the EC2 instance is initialized; you can check this via the AWS UI. See "Status check" on the EC2 console; it should read "2/2 checks passed" before proceeding.
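You can also poll the same checks from the CLI instead of the console; the instance id below is a placeholder, so substitute your own:

aws ec2 describe-instance-status --instance-ids i-0123456789abcdef0 \
  --query 'InstanceStatuses[0].[InstanceStatus.Status,SystemStatus.Status]'
# both values read "ok" once the 2/2 checks have passed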
In the main.tf file we have created a security group for access to EC2. Now, just create an inbound rule for traffic on port 3000 to open Metabase at http://your_ec2_public_dns:3000. You can customize it to accept traffic from a particular IP, a particular IP range, or open it to the public.
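One way to add that rule from the CLI, as a sketch assuming the `sde_security_group` name from main.tf and open-to-public access (use `--group-id` instead of `--group-name` if the instance is in a non-default VPC):

aws ec2 authorize-security-group-ingress \
  --group-name sde_security_group \
  --protocol tcp --port 3000 \
  --cidr 0.0.0.0/0   # narrow this, e.g. 203.0.113.7/32 for a single IP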
make cloud-metabase # this command will let you open Metabase at `http://your_ec2_public_dns:3000`
You can connect Metabase to the warehouse with the configs in the env file.
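For reference, the warehouse connection settings in that env file look something like the following; the variable names and values here are placeholders, so use whatever your env file actually defines:

# placeholder keys/values -- check the repo's env file for the real ones
WAREHOUSE_USER=sde_user
WAREHOUSE_PASSWORD=<your password>
WAREHOUSE_DB=warehouse
WAREHOUSE_HOST=warehouse   # the postgres container's service name
WAREHOUSE_PORT=5432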
make db-migration # enter a description, e.g., create some schema
make warehouse-migration # to run the new migration on your warehouse
Set up the infrastructure with Terraform, and define the following repository secrets. You can set up the repository secrets by going to Settings > Secrets > Actions > New repository secret.
- SERVER_SSH_KEY: We can get this by running `terraform -chdir=./terraform output -raw private_key` in the project directory; paste the entire content into a new Action secret called SERVER_SSH_KEY.
- REMOTE_HOST: Get this by running `terraform -chdir=./terraform output -raw ec2_public_dns` in the project directory.
- REMOTE_USER: The value for this is ubuntu.
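Put together, the commands that produce each secret value (straight from the Terraform outputs above):

terraform -chdir=./terraform output -raw private_key     # -> SERVER_SSH_KEY
terraform -chdir=./terraform output -raw ec2_public_dns  # -> REMOTE_HOST
# REMOTE_USER needs no command; its value is simply: ubuntu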
When you are done, make sure to destroy your cloud infrastructure:
make down # Stop docker containers on your computer
make infra-down # type in yes after verifying the changes TF will make
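If you ever need to bypass make here, the direct call is presumably the standard destroy (an assumption about what `make infra-down` wraps):

terraform -chdir=./terraform destroy   # review the plan, then type yes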