This repo contains some of the utilities that were used for the Redshift to Snowflake migration at Faire. They are described in more detail in the blog post here.
Some of the internal Python libraries that aren't released as part of this repo start with faire.internal.*
and would need to be replaced in order to correctly use the content of this repo.
Utilities to translate the Redshift SQL dialect to Snowflake are located under parser/snowflake_parser.py,
which contains an exhaustive list of the patterns that we encountered while migrating ETLs at Faire.
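
To give a feel for what pattern-based translation looks like, here is a minimal sketch. The rules, the REWRITE_RULES name and the redshift_to_snowflake function are hypothetical and chosen for illustration only; the actual patterns in parser/snowflake_parser.py may differ. It assumes that VARCHAR(MAX) and the table-level DISTSTYLE, DISTKEY and SORTKEY clauses are Redshift-specific and need to be rewritten or dropped.

```python
import re

# Hypothetical rewrite rules, for illustration only. The real list of
# patterns lives in parser/snowflake_parser.py and may look different.
REWRITE_RULES = [
    # Assumption: VARCHAR(MAX) is Redshift-specific; plain VARCHAR is used instead.
    (re.compile(r"\bVARCHAR\s*\(\s*MAX\s*\)", re.IGNORECASE), "VARCHAR"),
    # Assumption: table-level distribution and sort clauses are dropped.
    (re.compile(r"\s*\bDISTSTYLE\s+\w+", re.IGNORECASE), ""),
    (re.compile(r"\s*\bDISTKEY\s*\([^)]*\)", re.IGNORECASE), ""),
    (
        re.compile(
            r"\s*\b(?:COMPOUND\s+|INTERLEAVED\s+)?SORTKEY\s*\([^)]*\)",
            re.IGNORECASE,
        ),
        "",
    ),
]


def redshift_to_snowflake(sql: str) -> str:
    """Apply each rewrite rule in order and return the translated SQL."""
    for pattern, replacement in REWRITE_RULES:
        sql = pattern.sub(replacement, sql)
    return sql


if __name__ == "__main__":
    ddl = "CREATE TABLE t (a VARCHAR(MAX)) DISTSTYLE KEY DISTKEY(a) SORTKEY(a);"
    print(redshift_to_snowflake(ddl))
```

Regex rules like these are easy to extend one pattern at a time, but they do not understand SQL structure, so each rule should be checked against real queries before it is trusted.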
mode_utils
contains the helper script that was reverse-engineered with the help of Chrome DevTools to convert and migrate Mode reports from Redshift to Snowflake using the Mode API.
table_validation
contains the implementation of a YAML-based framework that we used to perform automated data parity checks between Redshift and Snowflake tables using Datafold.
The framework does the following:
- Copy the Redshift table to S3
- Create a watermarked Snowflake table from the dumped S3 file
- Create a watermarked copy of the ETL Snowflake table
- Run a data parity check between the two copied tables in Snowflake and save the result in a Mode table, including the diff URL generated by Datafold
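
The first three steps can be sketched as plain SQL generation. This is only a rough illustration under several assumptions, not the framework's actual code: the ParityCheck fields, the build_statements function and the _redshift_copy / _snowflake_copy table suffixes are hypothetical, "watermarked" is assumed to mean restricted to rows at or before a cutoff timestamp, the dump is assumed to be Parquet, and a Snowflake external stage is assumed to point at the same S3 prefix. The Datafold diff and the write to the Mode table are not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ParityCheck:
    """Hypothetical settings for one table; field names are illustrative."""
    redshift_table: str
    snowflake_table: str
    s3_prefix: str         # where Redshift unloads to
    stage_path: str        # assumed Snowflake stage pointing at s3_prefix
    iam_role: str
    watermark_column: str
    watermark: str         # cutoff timestamp applied to both sides


def build_statements(check: ParityCheck) -> List[Tuple[str, str]]:
    """Return (step, SQL) pairs for the copy steps, in execution order.

    Values are interpolated directly into the SQL, so this is only
    suitable for trusted configuration, not for user input.
    """
    where = f"{check.watermark_column} <= '{check.watermark}'"
    rs_copy = f"{check.snowflake_table}_redshift_copy"
    sf_copy = f"{check.snowflake_table}_snowflake_copy"

    # Inside UNLOAD's quoted query, single quotes must be doubled.
    inner = f"SELECT * FROM {check.redshift_table} WHERE {where}".replace("'", "''")
    unload = (
        f"UNLOAD ('{inner}') TO '{check.s3_prefix}' "
        f"IAM_ROLE '{check.iam_role}' FORMAT AS PARQUET"
    )
    # Assumption: the Redshift and Snowflake tables share the same columns.
    create_rs_copy = f"CREATE OR REPLACE TABLE {rs_copy} LIKE {check.snowflake_table}"
    load_rs_copy = (
        f"COPY INTO {rs_copy} FROM {check.stage_path} "
        "FILE_FORMAT = (TYPE = PARQUET) MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"
    )
    create_sf_copy = (
        f"CREATE OR REPLACE TABLE {sf_copy} AS "
        f"SELECT * FROM {check.snowflake_table} WHERE {where}"
    )
    return [
        ("redshift: unload to S3", unload),
        ("snowflake: create table for the dump", create_rs_copy),
        ("snowflake: load the dump", load_rs_copy),
        ("snowflake: copy the ETL table", create_sf_copy),
    ]


if __name__ == "__main__":
    example = ParityCheck(
        redshift_table="public.orders",
        snowflake_table="analytics.orders",
        s3_prefix="s3://example-bucket/parity/orders/",
        stage_path="@parity_stage/orders/",
        iam_role="arn:aws:iam::123456789012:role/example",
        watermark_column="updated_at",
        watermark="2022-01-01 00:00:00",
    )
    for step, sql in build_statements(example):
        print(f"-- {step}\n{sql};\n")
```

Applying the same cutoff to both copies keeps rows that arrive during the check from showing up as false differences in the diff.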