Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(): Introducing support for DataHub Events Source #142

Merged
merged 10 commits into from
Jan 22, 2025

Conversation

jjoyce0510
Copy link
Contributor

As of v0.3.7 of DataHub Cloud, connecting DataHub actions is supported.

Summary

  • Add new DataHubEventsSource for reading events from DataHub Cloud versions > v0.3.7.
  • Add unit tests
  • Add documentation

QA & Validation

Verified that the integration is working correctly with longtailcompanions.acryl.io:

Hello world! Received event:
{
    "event_type": "EntityChangeEvent_v1",
    "event": {
        "entityType": "assertion",
        "entityUrn": "urn:li:assertion:aed3ec48-5dbb-40d5-8942-e44a2ff3fa78",
        "category": "RUN",
        "operation": "COMPLETED",
        "auditStamp": {
            "time": 1737418020285,
            "actor": "urn:li:corpuser:__datahub_system"
        },
        "version": 0,
        "parameters": {
            "asserteeUrn": "urn:li:dataset:(urn:li:dataPlatform:snowflake,test_db.public.purchase_event,PROD)",
            "runId": "urn:li:assertion:aed3ec48-5dbb-40d5-8942-e44a2ff3fa78-1737418020282",
            "runResult": "COMPLETE",
            "assertionResult": "SUCCESS"
        }
    },
    "meta": {
        "batch_id": 1,
        "msg_id": 2
    }
}

Copy link

github-actions bot commented Jan 21, 2025

Unit Test Results (build & test)

94 tests  +31   94 ✅ +31   14s ⏱️ +10s
 1 suites ± 0    0 💤 ± 0 
 1 files   ± 0    0 ❌ ± 0 

Results for commit 226a305. ± Comparison against base commit c0fc442.

♻️ This comment has been updated with latest results.


# Converts a DataHub Events Message to an EntityChangeEvent.
def build_entity_change_event(payload: GenericPayloadClass) -> EntityChangeEvent:
return EntityChangeEvent.from_json(payload.get("value"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need any try/catch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no - this should raise an exception if it fails to parse. all i would do here is wrap and raise the exception. i can do that to add context but this indicates a fundamental issue

if pipeline_name.startswith("urn:li:dataHubAction:"):
return pipeline_name
else:
return f"urn:li:dataHubAction:{pipeline_name}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer urn_builder here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty rarely constructured urn from python, so we don't have a utility for it: https://github.com/datahub-project/datahub/blob/8ac35fa9ee6229ab7b8c0f4b2a0e669113574cd1/metadata-ingestion/src/datahub/emitter/mce_builder.py#L125 (unless you're referring to another set of utils)

### Privileges

By default, users do not have access to the Events API of DataHub Cloud. In order to access the API, the user or service account
associated with the access token used to configure this events source _must_ have the `Get Platform Events` platform privilege, which
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not grant this to all Admins? Admins already have the ability to read all metadata. This seems like it will be a constant source of frustration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the root user does have it but they are the only ones. We can grant to admins once we validate that everything is working. I agree it could be frustrating

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. the Admin user

@jjoyce0510 jjoyce0510 merged commit 2fee322 into main Jan 22, 2025
3 checks passed
@jjoyce0510 jjoyce0510 deleted the jj--add-datahub-events-source branch January 22, 2025 16:57
@@ -127,6 +127,7 @@ def get_long_description():
"jsonpickle",
"build",
"twine",
"tenacity",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this was added to dev deps, not framework_common

hsheth2 added a commit that referenced this pull request Jan 22, 2025
Fixing a regression from #142
jjoyce0510 pushed a commit that referenced this pull request Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants