feat(ingestion/glue): Implement unified AWS Glue table representation #12335

@svdimchenko svdimchenko commented Jan 14, 2025

Problem statement

We currently use the following tech stack:

  • Kafka
  • S3 + external tables defined in the Glue catalog
  • dbt
  • Airflow
  • Tableau

We want full data lineage, but it currently breaks at the Glue tables -> dbt stage.
The Glue ingestor ignores Athena's catalog name and stores entities in the format database.table, whereas with platform='athena' the actual representation should be catalog.database.table.

Digging deeper, I found that the catalog entity is currently missing for the Glue ingestion type.
Adding the catalog at the database level and the database at DataHub's schema level therefore seems logical to me. However, this introduces breaking changes to the current implementation.
What do you think about this issue? Once we finalise the vision, I can improve the current PR.
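
To make the naming mismatch concrete, here is a minimal sketch of how the two naming schemes lead to different dataset URNs. It does not use the DataHub library: `qualified_table_name` and `dataset_urn` are hypothetical helpers written for illustration, and the catalog, database, and table values are made-up examples. Only the general URN layout `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` is taken from DataHub.

```python
from typing import Optional


def qualified_table_name(database: str, table: str, catalog: Optional[str] = None) -> str:
    """Join the name parts with dots, prepending the catalog only when given.

    Hypothetical helper, not part of the DataHub API.
    """
    parts = [database, table] if catalog is None else [catalog, database, table]
    return ".".join(parts)


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a DataHub-style dataset URN string by hand (illustration only)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Two-part name, as described above for the Glue ingestor (example values are assumptions).
two_part = dataset_urn("glue", qualified_table_name("sales", "orders"))

# Three-part name, as described above for platform='athena' (example values are assumptions).
three_part = dataset_urn(
    "athena", qualified_table_name("sales", "orders", catalog="awsdatacatalog")
)

if __name__ == "__main__":
    print(two_part)
    print(three_part)
    # The names differ, so the two URNs cannot refer to the same entity.
    print(two_part == three_part)
```

Because the name portion differs (`sales.orders` versus `awsdatacatalog.sales.orders`), the two URNs identify different entities, which is consistent with the lineage break described above.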

@github-actions github-actions bot added the ingestion and community-contribution labels Jan 14, 2025

codecov bot commented Jan 14, 2025

Codecov Report

Attention: Patch coverage is 83.78378% with 6 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
...ingestion/src/datahub/ingestion/source/aws/glue.py | 84.84% | 5 Missing ⚠️
...ion/src/datahub/ingestion/source/aws/aws_common.py | 75.00% | 1 Missing ⚠️

Files with missing lines | Coverage Δ
...ion/src/datahub/ingestion/source/aws/aws_common.py | 66.19% <75.00%> (ø)
...ingestion/src/datahub/ingestion/source/aws/glue.py | 87.82% <84.84%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ef36837...5382a60.

@svdimchenko svdimchenko marked this pull request as ready for review January 14, 2025 21:04
@datahub-cyborg datahub-cyborg bot added the needs-review label Jan 14, 2025
@svdimchenko svdimchenko marked this pull request as draft January 15, 2025 08:44
@svdimchenko svdimchenko changed the title feat: Implement unified AWS Glue table representation feat(ingestion/glue): Implement unified AWS Glue table representation Jan 15, 2025
@svdimchenko

Closing this PR because more complex reengineering is required; see #12410.
