feat(docs) assertion execution behavior (datahub-project#12484)
jayacryl authored and eagle-25 committed Feb 5, 2025
1 parent 88f433f commit 2d0b912
Showing 1 changed file with 18 additions and 0 deletions.
18 changes: 18 additions & 0 deletions docs/managed-datahub/observe/assertions.md
@@ -46,3 +46,21 @@ With Acryl Observe, you can get the Assertion Change event by getting API events
## Cost

We provide several ways to run your assertions, so that you can choose the cheapest or the most accurate option for your use case. For example, for Freshness (SLA) assertions, it is relatively cheap to run freshness checks against the source system's Audit Log or Information Schema, and we support both of those as well as Last Modified Column, High Watermark Column, and DataHub Operation ([see the docs for more details](/docs/managed-datahub/observe/freshness-assertions.md#3-change-source)).

## Execution details - Where and How

There are a few ways DataHub Cloud assertions can be executed:
1. Directly query the source system:
   - `Information Schema` tables are used by default to power cheap, fast checks on a table's freshness or row count.
   - `Audit log` or `Operation log` tables can be used to granularly monitor table operations.
   - The table itself can also be queried directly. This is useful for freshness checks referencing `last_updated` columns, row count checks targeting a subset of the data, and column value checks. We offer several optimizations to reduce query costs for these checks.
2. Reference DataHub profiling information:
   - `Operation`s that are reported via ingestion or our SDKs can power monitoring table freshness.
   - `DatasetProfile` and `SchemaFieldProfile` ingested or reported via SDKs can power monitoring table metrics and column metrics.
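
The two execution paths above can be summarized in a small sketch. This is illustrative only: `EvaluationSource` and `queries_source_system` are hypothetical names made up for this example, not part of the DataHub SDK or API. It simply encodes which options issue queries against your source system and which rely on metadata already reported to DataHub.

```python
from enum import Enum


class EvaluationSource(Enum):
    """Hypothetical labels for the execution options listed above."""

    INFORMATION_SCHEMA = "information_schema"
    AUDIT_LOG = "audit_log"
    TABLE_QUERY = "table_query"
    DATAHUB_OPERATION = "datahub_operation"
    DATAHUB_DATASET_PROFILE = "datahub_dataset_profile"


# Assumption for illustration: these options query the source system
# directly (item 1 in the list above).
_SOURCE_SYSTEM_OPTIONS = frozenset(
    {
        EvaluationSource.INFORMATION_SCHEMA,
        EvaluationSource.AUDIT_LOG,
        EvaluationSource.TABLE_QUERY,
    }
)


def queries_source_system(source: EvaluationSource) -> bool:
    """Return True if the option runs a query against the source system,
    False if it only reads metadata already reported to DataHub."""
    return source in _SOURCE_SYSTEM_OPTIONS
```

For example, `queries_source_system(EvaluationSource.DATAHUB_OPERATION)` is `False`, reflecting that this option adds no query load on the warehouse.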

### Privacy: Execute In-Network, avoid exposing data externally
As a part of DataHub Cloud, we offer a [Remote Executor](/docs/managed-datahub/operator-guide/setting-up-remote-ingestion-executor.md) deployment model. If this model is used, assertions will execute within your network, and only the results will be sent back to DataHub Cloud. Neither your actual credentials nor your source data will leave your network.

### Source system selection
Assertions will execute queries using the same source system that was used to initially ingest the table.
In some scenarios, customers may have multiple ingestion sources for the same asset, e.g. a single BigQuery table. In this case, by default the executor will use the ingestion source that was used to ingest the table's `DatasetProperties`. This behavior can be modified by your customer success representative.
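
As a rough sketch of the default rule described above (not the executor's actual implementation), the selection could look like the following. `AspectProvenance`, `pick_ingestion_source`, and the `override_source_id` parameter are hypothetical names introduced for illustration; the override stands in for whatever change your customer success representative configures.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AspectProvenance:
    """Hypothetical record: which ingestion source wrote which aspect of a table."""

    aspect_name: str
    ingestion_source_id: str


def pick_ingestion_source(
    provenance: Sequence[AspectProvenance],
    override_source_id: Optional[str] = None,
) -> Optional[str]:
    """Pick the ingestion source whose connection is used to run assertions.

    Default: the source that ingested the table's DatasetProperties.
    An explicit override (an assumed mechanism) takes precedence.
    """
    if override_source_id is not None:
        return override_source_id
    for record in provenance:
        if record.aspect_name == "DatasetProperties":
            return record.ingestion_source_id
    # No source ingested DatasetProperties; nothing to select by default.
    return None
```

With two sources for one table, where only the second ingested `DatasetProperties`, the function returns the second source's ID unless an override is supplied.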