Describe the task
The objective is to integrate Great Expectations into our Python ETL pipeline to ensure data quality. The task involves researching the available integration methods, documenting the findings, and implementing the most suitable one. The integration should validate the data processed by the pipeline, checking its accuracy and consistency and catching problems before they reach production.
Acceptance Criteria
- Research different options for integrating Great Expectations with a Python ETL pipeline and, if necessary, our Postgres database.
- Summarize the pros and cons of each option.
- Select the most suitable integration method and explain why it was chosen.
- Implement the chosen method to validate the outputs of the ETL pipeline (a rough sketch of what this could look like follows this list).
- Demonstrate the successful integration by showing validation results from the ETL pipeline.
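As a starting point for the implementation item above, here is a minimal sketch of a validation gate inside an ETL step. This assumes the GX 1.x "fluent" API; the `etl_output`/`orders` names and the `amount` column are placeholders, not our actual schema:

```python
import great_expectations as gx
import pandas as pd

def validate_etl_output(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical gate for one ETL step: raise if the frame fails its checks."""
    context = gx.get_context()  # ephemeral, in-memory context

    # Register the transformed frame as an in-memory pandas Data Asset
    batch = (
        context.data_sources.add_pandas(name="etl_output")
        .add_dataframe_asset(name="orders")
        .add_batch_definition_whole_dataframe("latest_run")
        .get_batch(batch_parameters={"dataframe": df})
    )

    # Placeholder expectation -- the real suite would cover our actual schema
    result = batch.validate(
        gx.expectations.ExpectColumnValuesToNotBeNull(column="amount")
    )
    if not result.success:
        raise ValueError(f"Data quality check failed:\n{result}")
    return df
```

Raising on failure stops bad data from flowing downstream; a softer design would log the result and flag the run instead of aborting it.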
Additional context
Currently, I can see two options: using Great Expectations directly in our Python pipeline, or connecting it to our Postgres database for data QC. Here are two resources that might help:
Great Expectations Postgres Tutorial: This tutorial provides a step-by-step guide on setting up a local PostgreSQL database, creating tables, inserting data, and performing data validations using Great Expectations. It serves as a practical example of integrating Great Expectations with a database.
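Going by that tutorial, a rough sketch of the database route might look like the following (again assuming the GX 1.x fluent API; the connection string and the `warehouse`/`orders` names are placeholders):

```python
import great_expectations as gx

context = gx.get_context()

# Point GX at Postgres (placeholder credentials; needs sqlalchemy + psycopg2)
pg = context.data_sources.add_postgres(
    name="warehouse",
    connection_string="postgresql+psycopg2://user:password@localhost:5432/etl_db",
)

# Expose one table as a Data Asset and treat the whole table as a single Batch
asset = pg.add_table_asset(name="orders", table_name="orders")
batch = asset.add_batch_definition_whole_table("all_rows").get_batch()

# Run a check directly against the table, inside the database
result = batch.validate(
    gx.expectations.ExpectColumnValuesToBeBetween(column="amount", min_value=0)
)
print(result.success)
```

The appeal of this route is that checks run where the data already sits, after the load step; the trade-off is that problems only surface once data has reached the warehouse.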
Connect to in-memory Data Assets: This guide explains how to connect Great Expectations to in-memory pandas or Spark DataFrames. It introduces the concepts of Data Assets (data in its original format) and Data Sources (storage locations for Data Assets), which are crucial for configuring data validation.
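To make those two concepts concrete, a minimal in-memory example might look like this (GX 1.x fluent API assumed; all names and columns here are made up for illustration):

```python
import great_expectations as gx
import pandas as pd

context = gx.get_context()

# Data Source: where the data is stored -- here, in-memory pandas
source = context.data_sources.add_pandas(name="in_memory")

# Data Asset: the data in its original format -- a single DataFrame
asset = source.add_dataframe_asset(name="staging_users")

# A Batch Definition slices the asset into Batches; one whole-frame Batch here
batch_def = asset.add_batch_definition_whole_dataframe("whole_frame")

df = pd.DataFrame({"id": [1, 2, 3], "email": ["a@x.io", None, "c@x.io"]})
batch = batch_def.get_batch(batch_parameters={"dataframe": df})

# Group related checks into an Expectation Suite and validate the Batch
suite = gx.ExpectationSuite(name="staging_users_suite")
suite.add_expectation(gx.expectations.ExpectColumnValuesToNotBeNull(column="id"))
suite.add_expectation(gx.expectations.ExpectColumnValuesToBeUnique(column="id"))
print(batch.validate(suite).success)
```

The same Data Source / Data Asset structure applies to Spark DataFrames; only the `add_pandas` call would change.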