(#4884) Add support for ratio metrics #5027
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
Dropped some notes inline in the PR. Excited to discuss!
core/dbt/contracts/graph/manifest.py (outdated)

```python
# Skip if the metric is disabled!
return metric

# TODO: Figure this part out...
```
Do we allow other yaml-only resources (like sources) to be disabled? If so, we should implement this part
Sources can be disabled; exposures cannot (but we should support this). In the past, it's only been possible to disable a source via the `sources:` block in `dbt_project.yml`.
In v1.1, we'll be adding the ability to disable a source inline with its definition (#5008):
```yaml
sources:
  - name: my_source
    config:
      enabled: False
```
```python
self.package_name = package_name


class ResolvedMetricReference(MetricReference):
```
This was my take on a class analogous to the `Relation` object. I figured that this kind of object could be returned from the `metric()` function. It would help us avoid the need to find the metric in the graph manually, like we do here.
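To make the idea concrete, here is a minimal sketch of what such a wrapper might look like. This is illustrative only, not dbt-core's actual implementation; the attribute names are assumptions:

```python
# Illustrative sketch: a resolved metric reference wraps the parsed metric
# node from the manifest, so callers never have to search the graph themselves.
class MetricReference:
    def __init__(self, metric_name, package_name=None):
        self.metric_name = metric_name
        self.package_name = package_name


class ResolvedMetricReference(MetricReference):
    def __init__(self, node, manifest=None):
        super().__init__(node.name, node.package_name)
        self.node = node          # the parsed metric node
        self.manifest = manifest  # kept around for resolving related nodes

    @property
    def model(self):
        # For a concrete metric, expose the relation it is defined on
        return self.node.model
```

The `metric()` context function could then return one of these, analogous to how `ref()` returns a `Relation`-like proxy.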
This looks reasonable to me.
Let's remove `.metric_value` based on the conversation here.
core/dbt/contracts/graph/metrics.py (outdated)

```python
        return f'{namespace}.{self.metric_name}'

    def __str__(self):
        return self.node.name
```
Should this return the metric name? Or the sql expression? Or something else? Not 100% sure yet...
When would the ResolvedMetricReference get stringified?
I think this would happen when you interpolate the results of the `metric()` function, so:

```sql
select
  {{ metric('revenue') }}
from {{ metric('revenue').model }}
```

would evaluate to

```sql
select
  revenue
from analytics.fct_revenue
```

So - returning the name of the metric is kind of useless, but I am unsure if we want to return anything more sophisticated than that? I guess we could return `self.node.sql` instead (which feels reasonable), but this doesn't extend well to the ratio metric type, which does not have a `.sql` value.

So, uh, short answer: you probably wouldn't stringify it in practice?
Maybe we shouldn't even have a `__str__` method then? With the idea that it's better to fail noisily than to fail silently.
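A sketch of that "fail noisily" option, purely illustrative (the error message and attribute names are assumptions): define `__str__` to raise, so accidental interpolation becomes an explicit parse/render error instead of silently emitting something half-useful.

```python
# Sketch: make accidental stringification of a resolved metric an explicit
# error rather than silently interpolating the node name.
class ResolvedMetricReference:
    def __init__(self, node):
        self.node = node

    def __str__(self):
        raise TypeError(
            f"Cannot interpolate metric '{self.node.name}' directly; "
            "access an attribute like .model instead."
        )
```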
core/dbt/contracts/graph/parsed.py (outdated)

```python
    created_at: float = field(default_factory=lambda: time.time())

    def resolve_metric_references(self, context):
        # TODO: Obviously don't do this...
```
This is where I implemented "patching" for the metric node. Big idea is that we have code like:

```yaml
- name: arpc
  label: Average Revenue per Customer
  type: ratio
  ratio_terms:
    numerator: metric('revenue')
    denominator: metric('customers')
```

When we parse the `arpc` metric, we don't necessarily know anything about `revenue` or `customers` yet. I think this is sort of akin to how exposures work, though I wasn't able to find the way that we "patch" exposures when I was looking through the code. Wondering if this is different because metrics can depend on other metrics, whereas exposures can only depend on models/sources/snapshots, so we can't bypass this problem with parsing order.

Feedback/guidance very welcomed here!
Yes, exposures don't refer to themselves, so exposures refs, sources, etc get resolved in the various parser.manifest 'process_XXX' methods. I wouldn't call what the metrics need a "patch" step, because when we did that we saved a bunch of patch dictionaries (with config, etc) until the end of parsing and applied them then. So far metrics just need a resolution phase, as you've implemented.
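The resolution phase described above could be sketched roughly like this (a simplified illustration, not the actual dbt-core code; the dict shape is an assumption): parse every metric first, then walk metric-to-metric references in a second pass, so that definition order in the project does not matter.

```python
# Sketch of a two-pass resolution phase for metric-to-metric references.
# metrics: dict of metric name -> {"metrics": [referenced names], ...}
def resolve_metric_references(metrics):
    for name, metric in metrics.items():
        for referenced in metric.get("metrics", []):
            if referenced not in metrics:
                raise ValueError(
                    f"Metric '{name}' references unknown metric '{referenced}'"
                )
            # Record the DAG edge, akin to how refs land in depends_on.nodes
            metric.setdefault("depends_on", []).append(f"metric.{referenced}")
    return metrics
```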
core/dbt/main.py (outdated)

```diff
@@ -142,6 +142,8 @@ def main(args=None):
        exit_code = e.code

    except BaseException as e:
        import traceback
```
just here for debugging - will remove before PRing this
```diff
@@ -66,6 +66,9 @@ def should_render_keypath(self, keypath: Keypath) -> bool:
            return False
        elif self._is_norender_key(keypath[0:]):
            return False
        elif self.key == "metrics":
```
I actually don't think this part is necessary... will remove. This came from an initial idea of implementing ratios via the `sql` config, eg:

```yaml
sql: "{{ metric('revenue') }} / {{ metric('customers') }}"
```

Decided against doing it that way (instead: specify numerator and denominator separately) in order to make metric queries leveraging these ratio metrics more tractable to implement.
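One way to see why structured terms are more tractable: given separate numerator and denominator expressions, the compiled SQL for a ratio can be assembled mechanically, which a free-form `sql` string would not allow. A hypothetical helper (not dbt-core code; the function name is invented):

```python
# Sketch: assemble ratio SQL from structured numerator/denominator terms.
def ratio_sql(numerator_sql, denominator_sql):
    # nullif guards against divide-by-zero in most SQL dialects
    return f"({numerator_sql}) / nullif(({denominator_sql}), 0)"
```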
core/dbt/parser/schemas.py (outdated)

```diff
@@ -1042,8 +1046,42 @@ def parse_metric(self, unparsed: UnparsedMetric) -> ParsedMetric:
            self.schema_parser.manifest,
            package_name,
        )
        model_ref = "{{ " + unparsed.model + " }}"
        get_rendered(model_ref, ctx, parsed, capture_macros=True)
        if unparsed.type == 'ratio':
```
I don't love the idea of teaching the metric parser about metric subtypes. For now, the only special metric is the "ratio" type, though I suppose there could be other special types in the future. Happy to discuss if anyone feels strongly about us doing this part differently
Heard. If we want the validation to happen in Python code (and it should), I think we'll need to teach some of it to dbt-core (or a dbt-core dependency). I don't mind `ratio` being a special type for now.

Is there a more general solution that looks like "super"/"sub" metric types? Like super/subclasses for Python objects. Or maybe we don't want to build the semantic layer as OOP :)
It feels wrong to me to have this code be in the parser. I'd be more comfortable if we could move some of it into the UnparsedMetric or ParsedMetric or a subclass based on type... Having to render it does complicate things though. Is it actually necessary to render the numerator and denominator at parse time? Can they actually contain macros? Because if they can't contain macros, I would think the get_rendered is a no-op.
@gshank I think we do need to render the numerator and denominator at parse-time, because we need to capture the `metric()` call in order to build the edges between the two metrics. In this way, `metric()` is kind of like `ref()`, but for metrics :)

I'd be ok with moving this logic into the ParsedMetric (or similar)... is it ok if we call `get_rendered` in there? I don't want to explode the number of places that we render jinja code, but I definitely hear what you're saying.
If it's only the 'metrics' call that we need to capture, this feels like something that we might want to do with static parsing eventually. Wouldn't need to do that in the first pass though.
I think we're already calling get_rendered in ParsedMetric in "resolve_metric_references". And it does seem out of place there too. Hmm... I'm not exactly sure what the best way of factoring this is. Separation of concerns would be nice, but the concerns seem kind of inherently tangled.
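The static-parsing idea could look something like this: pull `metric('...')` calls out of the raw sql/jinja string without rendering it at all. This is purely illustrative (dbt's real static parser is jinja-AST-based and more involved), but it shows why capture-only rendering might be avoidable when macros aren't in play:

```python
import re

# Sketch: statically extract metric('...') calls from raw sql/jinja,
# without invoking the jinja renderer. Handles single or double quotes.
METRIC_CALL = re.compile(r"metric\(\s*['\"]([^'\"]+)['\"]\s*\)")

def extract_metric_calls(raw_sql):
    return METRIC_CALL.findall(raw_sql)
```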
This looks like a good solid start :)
We use pre-commit to do black, flake8 and mypy, and there are a number of mypy complaints about this. Sometimes in order to make mypy happy you have to move things around a bit, so you might want to pip install pre-commit, and do 'pre-commit run --all-files'.
For partial parsing, will we need to re-parse anything with a changed metric reference? I think we'll have to handle metric references in a handful of places.
core/dbt/contracts/graph/manifest.py
Outdated
```python
self.metrics[metric.unique_id] = metric
source_file.metrics.append(metric.unique_id)
if not metric.config.enabled:
    self.add_disabled_nofile(metric)
```
Not sure if this piece is actually working for a metric with:

```yaml
metrics:
  - name: some_metric
    config:
      enabled: False
```
I'll open a separate issue for supporting `config.enabled` on metrics (and exposures!). In the meantime, I think we should probably just remove the `enabled`-related code from this PR.
core/dbt/contracts/graph/metrics.py (outdated)

```python
def __str__(self):
    # TODO : Qualify the metric name as a CTE or field?
    return f"{self.metric_name}.metric_value"
```
Discussed with @callum-mcdata live: let's just make this return `f"{self.metric_name}"` instead, since otherwise we're just replacing this with `""` in Jinja here.
```python
# TODO: Expressions _cannot_ have `model` properties
if data.get("model") is None and data.get("type") != "expression":
    raise ValidationError("Non-expression metrics require a 'model' property")
```
@callum-mcdata To add validation for the opposite case (this TODO)
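The full two-way check could be sketched as below. This is an illustration of the rule being discussed (expression metrics and the `model` property are mutually exclusive), not the actual dbt-core validation code; `validate_metric` is an invented name:

```python
# Sketch: validate both directions of the expression/model exclusivity rule.
def validate_metric(data):
    is_expression = data.get("type") == "expression"
    has_model = data.get("model") is not None
    if not is_expression and not has_model:
        raise ValueError("Non-expression metrics require a 'model' property")
    if is_expression and has_model:
        raise ValueError("Expression metrics cannot have a 'model' property")
```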
core/dbt/contracts/graph/parsed.py (outdated)

```python
resource_type: NodeType = NodeType.Metric
meta: Dict[str, Any] = field(default_factory=dict)
tags: List[str] = field(default_factory=list)
sources: List[List[str]] = field(default_factory=list)
depends_on: DependsOn = field(default_factory=DependsOn)
refs: List[List[str]] = field(default_factory=list)
metrics: List[List[str]] = field(default_factory=list)
config: SourceConfig = field(default_factory=SourceConfig)
```
I think we should remove this (and related code) for now, and split off support for disabling metrics into a separate unit of work
This looks really good!
I think there might be a minor issue related to partial parsing. We now have metric references in models. If a model and metric have been parsed and the metric is deleted or renamed, I'm pretty sure we aren't scheduling the model for re-parsing, so the error would show up at execution time rather than parse time. I don't consider this a blocker -- we can fix it later.
resolves #4884
Description
This PR adds support for ratio metrics in dbt. A ratio metric is defined in terms of an expression over other metrics.

This PR also adds a `metric` function that 1) resolves a specified metric and 2) creates an edge in the DAG between a metric node and another node.

Deliverables:
- A `metric` context method which returns a MetricReference object (similar to the RelationProxy)
- A new `expression` metric type. These are expression metrics which are defined as an expression over multiple other metrics
- The `expression` metric type is mutually exclusive with the `model` config for metrics. A metric must either be derived from a model, or from one or more other metrics
- A `model` node dependency for a concrete (ie. non-expression) metric
- The `metric()` function creates DAG edges between the referencing and referenced nodes, adding the referenced metric to the `depends_on.nodes` list and to a `metrics` list (akin to the `refs` list) for the child node
- The `metric()` function can be called from Snapshots, Models, Tests, Macros, and other Metrics

Example of metrics depending on other metrics in the DAG viz (note: this requires a PR to dbt-docs).
Checklist