This project is deployed only on prod.
A Dockerized Python script that exports all raw events from the Mixpanel API and uploads them to an AWS S3 bucket.
Requirements:
- Python 3 and pip
- Docker (optional)
- Python packages listed in requirements.txt
All parameters are expected as environment variables:
- AWS_REGION: your AWS region, for example us-east-1
- AWS_ACCESS_KEY_ID: your AWS IAM access key ID
- AWS_SECRET_ACCESS_KEY: your AWS IAM secret access key
- S3_BUCKET: the name of your target S3 bucket
- S3_PATH: the base path inside your S3 bucket, without a leading / (for example my/mixpanel/data)
- MIXPANEL_API_SECRET: your Mixpanel API secret
- START_DATE: (optional) the date from which to start exporting events, in ISO format YYYY-MM-DD (for example 2018-11-01)
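The configuration step can be sketched in Python as below. This is a minimal sketch and not the script's actual code. `load_config` and `DEFAULT_LOOKBACK_DAYS` are hypothetical names. Treating the default start date as today minus 5 days is an assumption based on the behavior described further down.

```python
import os
from datetime import date, timedelta

REQUIRED_VARS = (
    "AWS_REGION",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET",
    "S3_PATH",
    "MIXPANEL_API_SECRET",
)

# Hypothetical constant: assumed default look-back window when START_DATE is unset.
DEFAULT_LOOKBACK_DAYS = 5


def load_config(environ=os.environ, today=None):
    """Hypothetical helper: read and validate the script's environment variables."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))

    # The README asks for S3_PATH without a leading '/'.
    if environ["S3_PATH"].startswith("/"):
        raise ValueError("S3_PATH must not start with '/'")

    today = today or date.today()
    start = environ.get("START_DATE")
    # date.fromisoformat parses YYYY-MM-DD strings (Python 3.7+).
    if start:
        start_date = date.fromisoformat(start)
    else:
        start_date = today - timedelta(days=DEFAULT_LOOKBACK_DAYS)

    config = {name: environ[name] for name in REQUIRED_VARS}
    config["START_DATE"] = start_date
    return config
```

Passing `environ` and `today` explicitly keeps the function easy to test without touching the real process environment or the clock.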
To run with Docker:
- Edit the .env file and set the proper value for each environment variable.
- Build the Docker image with:
    docker build --rm -f "Dockerfile" -t mixpanel-to-s3:latest .
  Note: when building locally on an Apple M1 MacBook, build for the linux/amd64 platform instead so that the image runs on ECS Fargate:
    docker buildx build --rm -f "Dockerfile" --platform=linux/amd64 -t mixpanel-to-s3:latest .
- Run the Docker image with:
    docker run --rm -it --env-file .env mixpanel-to-s3:latest
To run without Docker:
- Set every environment variable listed in the .env file to your own value, using export VAR=VALUE for each.
- Install the Python package requirements (only needed once) with:
    pip install -r requirements.txt
- Run the script with:
    python3 mixpanel-to-s3.py
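When running without Docker, the same .env file could be loaded into the process environment instead of exporting each variable by hand. The sketch below is hypothetical: `load_env_file` is not part of this project. It assumes plain KEY=VALUE lines with `#` comments and does no quote handling or variable expansion.

```python
import os


def load_env_file(path=".env", environ=os.environ):
    """Hypothetical helper: copy KEY=VALUE lines from an env file into environ.

    Blank lines and lines starting with '#' are skipped. Lines without '='
    are ignored. Values are taken literally: no quotes are stripped and no
    variables are expanded.
    """
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            key, sep, value = line.partition("=")
            if sep:
                environ[key.strip()] = value
    return environ
```

For example, a small wrapper could call `load_env_file()` before reading its configuration, which makes the values visible through `os.environ`.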
The script will:
- Starting on START_DATE or, by default, covering the last 5 days, fetch the Mixpanel raw events in JSON format into one gzip-compressed file per day, named rawEvents_{isodate}.json.gz.
- Upload each file, using S3 multipart upload, to your specified S3_BUCKET/S3_PATH under a folder with the structure year=YYYY/month=MM/day=DD.
Example: for the date 2018-11-01, the final S3 file will be:
    s3://S3_BUCKET/S3_PATH/year=2018/month=11/day=01/rawEvents_2018-11-01.json.gz
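The per-day output step can be sketched as follows. This illustrates the naming and layout only and is not the project's implementation. The helper names are hypothetical. Events are assumed to be written as newline-delimited JSON, one object per line. The S3 upload itself is omitted.

```python
import gzip
import json
import posixpath
from datetime import date


def day_file_name(day):
    """Hypothetical helper: file name for one day, e.g. rawEvents_2018-11-01.json.gz."""
    return f"rawEvents_{day.isoformat()}.json.gz"


def s3_key(s3_path, day):
    """Hypothetical helper: partitioned object key for one day's file under S3_PATH."""
    partition = f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}"
    # Strip stray '/' so the key never starts with a slash or contains '//'.
    return posixpath.join(s3_path.strip("/"), partition, day_file_name(day))


def write_day_file(events, path):
    """Hypothetical helper: write one day's events to a gzip file, one JSON object per line."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for event in events:
            fh.write(json.dumps(event) + "\n")
```

For the example date above, `s3_key("my/mixpanel/data", date(2018, 11, 1))` produces `my/mixpanel/data/year=2018/month=11/day=01/rawEvents_2018-11-01.json.gz`.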
This folder naming convention makes the data easier to query with tools like Hive or AWS Glue, because it is partitioned by year, month and day.
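To show why this layout helps, the hypothetical helper below recovers the partition values from an object key. It reads `name=value` path segments in the way Hive-style partitioning encodes them. It is a sketch only. Hive and AWS Glue do their own partition discovery.

```python
def partition_values(key):
    """Hypothetical helper: extract Hive-style name=value segments from an S3 key."""
    values = {}
    # The last segment is the file name, not a partition.
    for segment in key.split("/")[:-1]:
        name, sep, value = segment.partition("=")
        if sep:
            values[name] = value
    return values
```

Given the example key, this returns `{"year": "2018", "month": "11", "day": "01"}`. Those are the values a query engine can filter on to read only the matching prefixes.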
This needs to be updated.