BFD-3854: IDR pipeline POC #2548

Status: Open. Wants to merge 8 commits into base branch master.

aschey-forpeople
Contributor

JIRA Ticket:
BFD-3854

What Does This PR Do?

Creates the initial POC for the IDR pipeline. The Snowflake connector is not fully set up yet since we don't have access.
In the meantime, we can mock the IDR schema directly in Postgres for testing purposes. I've currently included a small subset of the data and simplified queries so we can quickly make changes without dealing with the large number of columns that we will eventually need to handle.

The current approach is to stream data out of IDR in batches and load each batch into Postgres using COPY. This approach may change once we are able to do performance testing with real data.
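For illustration, the load half of that approach could look roughly like the sketch below. It only shows how one batch of rows might be serialized into CSV text that Postgres can read via COPY ... FROM STDIN. The table name, column names, and the to_copy_buffer helper are hypothetical and not taken from this PR, and the sketch uses only the standard library so it runs without a database.

```python
import csv
import io
from typing import Iterable, Sequence


def to_copy_buffer(batch: Iterable[Sequence[object]]) -> io.StringIO:
    """Serialize one batch of rows as CSV text for COPY ... FROM STDIN."""
    buf = io.StringIO()
    # csv.writer emits None as an empty field, which COPY's CSV format
    # reads as NULL by default when the field is unquoted.
    csv.writer(buf, lineterminator="\n").writerows(batch)
    buf.seek(0)
    return buf


# Hypothetical target table and columns, for illustration only.
COPY_SQL = "COPY idr_beneficiary (bene_sk, bene_mbi_id) FROM STDIN WITH (FORMAT csv)"

# Two batches as they might arrive from the extract step (made-up values).
batches = [[(1, "MBI-A"), (2, "MBI-B")], [(3, None)]]
payloads = [to_copy_buffer(batch).getvalue() for batch in batches]
```

Each buffer would then be handed to the Postgres driver's COPY support together with COPY_SQL, for example copy_expert in psycopg2 or cursor.copy in psycopg 3. Which driver the pipeline uses is not stated here, so treat that part as an assumption.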

There is a simple integration test included to verify the data load is working on a basic level.

What Should Reviewers Watch For?

If you're reviewing this PR, please check for these things in particular:

What Security Implications Does This PR Have?

Please indicate if this PR does any of the following:

  • Adds any new software dependencies

  • Modifies any security controls

  • Adds new transmission or storage of data

  • Any other changes that could possibly affect security?

  • I have considered the above security implications as it relates to this PR. (If one or more of the above apply, it cannot be merged without the ISSO or team security engineer's (@sb-benohe) approval.)

Validation

Have you fully verified and tested these changes? Is the acceptance criteria met? Please provide reproducible testing instructions, code snippets, or screenshots as applicable.

Tests included.

from pydantic import BaseModel


class IdrBeneficiary(BaseModel):
Contributor Author

pydantic's model classes make it easy to work with typed data coming out of the database. They do introduce some overhead from runtime type checking, though, so we may want to revisit this in the future if the performance hit turns out to be large.
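To make the tradeoff concrete, here is a minimal sketch assuming pydantic v2. The two fields are hypothetical stand-ins, since the real IdrBeneficiary columns are not shown in this excerpt. Normal construction validates and coerces every value at runtime, while model_construct skips validation and stores the values as given, which is one possible tweak if validation cost becomes a problem for trusted data.

```python
from pydantic import BaseModel, ValidationError


class IdrBeneficiary(BaseModel):
    # Hypothetical fields; the model's real columns are not shown in this excerpt.
    bene_sk: int
    bene_mbi_id: str


row = {"bene_sk": "42", "bene_mbi_id": "MBI-A"}

# Normal construction validates and coerces at runtime: "42" becomes the int 42.
validated = IdrBeneficiary(**row)

# model_construct (pydantic v2) skips validation, so "42" stays a str.
unchecked = IdrBeneficiary.model_construct(**row)

# Values that cannot be coerced are rejected by the validating path.
try:
    IdrBeneficiary(bene_sk="not a number", bene_mbi_id="MBI-A")
    rejected = False
except ValidationError:
    rejected = True
```

Skipping validation gives up the type guarantees that motivated using the models in the first place, so it would only make sense where the source data is already trusted.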

class SnowflakeExtractor(Extractor):
    def __init__(self, batch_size: int):
        super().__init__()
        self.conn = snowflake.connector.connect(user="", password="", account="")
Contributor Author

This is untested for now because we don't have Snowflake access. It will be used for prod data, while the PostgresExtractor above can be used for test data.
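As a rough sketch of why swapping extractors should be cheap: both Snowflake's connector and the common Postgres drivers expose Python DB-API cursors, so the batch-streaming loop can be written once against fetchmany. The extract method name and the DbApiExtractor class below are assumptions for illustration, not the PR's actual interface, and sqlite3 stands in for the source database so the example runs anywhere.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Iterator


class Extractor(ABC):
    """Hypothetical interface: yield query results one batch at a time."""

    @abstractmethod
    def extract(self, sql: str) -> Iterator[list[tuple]]:
        ...


class DbApiExtractor(Extractor):
    """Streams batches from any DB-API connection using cursor.fetchmany."""

    def __init__(self, conn, batch_size: int):
        self.conn = conn
        self.batch_size = batch_size

    def extract(self, sql: str) -> Iterator[list[tuple]]:
        cur = self.conn.cursor()
        try:
            cur.execute(sql)
            # fetchmany returns an empty list once the result set is exhausted.
            while rows := cur.fetchmany(self.batch_size):
                yield rows
        finally:
            cur.close()


# sqlite3 stands in for the real source; table and values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bene (bene_sk INTEGER, bene_mbi_id TEXT)")
conn.executemany(
    "INSERT INTO bene VALUES (?, ?)", [(1, "MBI-A"), (2, "MBI-B"), (3, "MBI-C")]
)
extractor = DbApiExtractor(conn, batch_size=2)
batches = list(extractor.extract("SELECT bene_sk, bene_mbi_id FROM bene ORDER BY bene_sk"))
```

With batch_size=2 and three rows, the extractor yields one full batch and one partial batch, so memory use is bounded by the batch size rather than the size of the result set.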

aschey-forpeople enabled auto-merge (squash) on February 12, 2025 17:53