-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Dagster Openmetadata Ingestion Pipeline #1484
base: main
Are you sure you want to change the base?
Conversation
Added the new openmetadata definition to workspaces. Added the trino configuration details as a dictionary object under resources. Added the new pipeline definition for openmetadata ingestion using Trino as the initial data sources as an asset and using dated partitions.
Removing the yaml load logic. Removing unnecessary asset_helper functions.
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
The pull request introduces the initial scaffolding for an OpenMetadata ingestion pipeline, which is a valuable addition to the platform. The code is generally well-structured, but the hardcoded credentials in trino.py
must be addressed before merging. Additionally, the dependency version changes in uv.lock
should be carefully reviewed to ensure they are intentional and don't introduce any issues.
Summary of Findings
- Security Risk: Hardcoded Credentials: The
trino.py
file contains hardcoded credentials, which is a significant security vulnerability. These should be managed securely using Vault or a similar secrets management system. - Dependency Version Changes: Several dependency versions in
uv.lock
have been downgraded or changed. It's important to verify that these changes are intentional and don't introduce compatibility issues or regressions.
Assessment
The pull request introduces a new Dagster pipeline for ingesting Trino table metadata into OpenMetadata. The scaffolding appears well-structured, incorporating necessary dependencies, configurations, and asset definitions. However, there are some areas that require attention before merging. Specifically, the hardcoded credentials pose a security risk and should be addressed. Additionally, some dependency version changes in uv.lock
seem unusual and warrant further investigation. Users should have others review and approve this code before merging.
"password": "<retrieve this from https://vault-production.odl.mit.edu/ui/vault/secrets/platform-secrets/kv/starburst-galaxy/details?version=4>" | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Storing credentials directly in the code is a security risk. Consider using a more secure method, such as retrieving the password from Vault or another secrets management system. This is a critical security concern that needs to be addressed before merging.
"password": vault.read_secret("path/to/secret")["password"] # Retrieve from Vault
"jwtToken": "<retrieve this from https://open-metadata-ci.ol.mit.edu/bots/ingestion-bot>" | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the relevant tickets?
#1355
Description (What does it do?)
First pass at building out the scaffolding for our openmetadata ingestion pipeline.
This PR adds pipeline code to ingest Trino table metadata into our openmetadata instance.
Updated dependencies in toml and lock files.
Added the new openmetadata definition to workspaces.
Added the trino configuration details as a dictionary object under resources.
Added the new pipeline definition for openmetadata ingestion using Trino as the initial data sources as an asset and using dated partitions.
This is still WIP.
How can this be tested?
Update the credentials in lines 12 and 34 in src/ol_orchestrate/resources/openmetadata/trino.py by using the referenced urls: Trino bot_metadata password from vault
openmetadata JWT token
Export your github token and launch docker compose.
export GITHUB_TOKEN=<YOUR_GITHUB_TOKEN> docker compose up