Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Metrics Should Support All-Time Grain Metrics #97

Closed
3 tasks done
callum-mcdata opened this issue Aug 31, 2022 · 11 comments
Closed
3 tasks done

[Feature] Metrics Should Support All-Time Grain Metrics #97

callum-mcdata opened this issue Aug 31, 2022 · 11 comments
Assignees
Labels
enhancement New feature or request

Comments

@callum-mcdata
Copy link
Contributor

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

From the beginning of time the metrics issue , people have been asking about non-time based metrics. Some people believed that non-time based metrics were a requirement while others thought that we'd run into practical constraints for usage. George Fraser of Fivetran had this comment:

This isn’t just a technical issue: there are lots of business metrics which are time-series in principle but in practice we choose to calculate at just the present time, because it’s simpler.

He was right! We've heard from a lot of our customers that they're interested in getting single numbers returned for their metrics. The metrics themselves might be time-series in principle but the end consumer is only interested in the present value.

So lets add that functionality.

Implementation

There are two different options we have for adding this type of functionality.

  1. Alter the metric node to make timestamp & time_grains optional - if those attributes are not present then we return the single aggregate.
  2. Add all_time as an option for the grain input that aggregates across the entire date range.

I am almost certain that option 1 is NOT the route we want to go. Making these parameters optional has the potential to introduce too many issues downstream. For example:

  • Our partners rely on their being a grain input. They'd need to change their integrations to support that parameter becoming conditional
  • There are a host of validation issues that could arise if we no longer enforce timestamp and time_grains. Metrics that actually do require it wouldn't know that the metric is misconfigured until they are actually run.

So with that being said, I believe that we should add support for all_time (or some similarly named input) in time grains.

Open Questions

  • Should this be defined in the metric like all other time grains or should it be supported in all metrics?
    • I feel strongly that it should have to be defined.
  • Should it be called all_time? Is there some other name that better captures it?
  • What error message should we raise to show that secondary calculations wouldn't be supported with this functionality?
    • Should we try and figure out some way of supporting them?

Describe alternatives you've considered

Option 1 raised in above message. I think that is a capital B bad idea but I am willing to be convinced otherwise!

Who will this benefit?

Anyone who wants to return metric results across all time.

Are you interested in contributing this feature?

Yep!

Anything else?

No response

@callum-mcdata callum-mcdata added the enhancement New feature or request label Aug 31, 2022
@dave-connors-3
Copy link
Contributor

this is an interesting one! Do you have other examples from the community on the kinds of non-time aggregated metrics? the quote above says

there are lots of business metrics which are time-series in principle but in practice we choose to calculate at just the present time, because it’s simpler.

Wouldn’t this imply that the metric is indeed time aggregated, but the way it’s reported depends on a definition of current value or present time? To me, that sounds more like a calendar dimension filter applied to a metric definition than a removal of the time-boundedness of the metric definition itself.

I support the inclusion of the all_time option in any case, but it feels like the examples of single value in my head are like “revenue this quarter” which is more of a final filter view than anything.

@gwenwindflower
Copy link

agree with dave, i like 'all_time' as something you can explicitly specify vs the more ambiguous idea of making time grains optional -- mainly because a) most metrics are going to be time bucketed and b) as dave said, these non-time-bucketed metrics are actually still measured over time, this is just a convenience to say 'up to present'.

@callum-mcdata
Copy link
Contributor Author

Examples from partners & community members:

  • one integration partner wants to return the single metric value in a chrome extension for each metric that would be copied into your clipboard. More of a design example than specific metric type
  • Unique customers: one beta customer said they frequently look at the raw count of unique customers and don't particularly care to chart it over time.

There are a few others like this but tbh we've not dug as deep into this use case because it wasn't supported.

Moving Forward

I think we're all in agreement that the all_time option is better because it keeps to the current framework and doesn't throw everything else out. Additionally then customers can look at that time grain and switch to monthly if they want more context.

As for your point @dave-connors-3 around the definition of present_time, that is pretty fair. I think the implementation of that would be all_time as the grain and then some form of parameter passed into end_date - maybe current_date or something like that. all_time is just generic enough that it gives customers the flexibility to specify their time ranges.

@dave-connors-3
Copy link
Contributor

@callum-mcdata -- one additional idea to meet somewhere in the middle is to update the default calendar model to include is_current_{{ time_part }} columns so that for a given metric, you could apply that as a filter in the metric as you like

@callum-mcdata
Copy link
Contributor Author

@dave-connors-3 I'm not sure I follow - are you saying that is_current_{{ time_part }} would be used in the start/end_date inputs? Right now we don't have any way of aggregating agnostic of time grouping so that would still return date_{{grain, metric_name as the columns in the final dataset.

Or are you saying that adding this field to be used in the start/end_date inputs would be an additional way to enforce up_to_current with some form of all_time aggregation?

@dave-connors-3
Copy link
Contributor

I think I was thinking this could be passed in the where, but didn't consider that the date_{{ grain }} column would still be returned (and the query would be more resource intensive). Your suggestion is definitely more direct, as in the current implementation, start_date and end_date get passed directly to the aggregate macro, and we could skip the spining altogether.

What I'm thinking about on the user end is having to come back in to wherever this macro gets called and updating start and end dates for the next quarter, so some notion of an always correct current filter seems useful to me!

@drewbanin
Copy link
Contributor

+1 on adding an all_time grain that aggregates over... all time! I think the timestamp component of the metric should still be required. My big question is: what do we return as the date component for the period? Do we omit that field, or include the start/end timestamp value? Or null? or 2099-12-31 :p

@cafzal
Copy link

cafzal commented Aug 31, 2022

I believe all_time grain is the way to go!

Let's think of a scenario. A software company wants to track user activities. They often want to look at the metric on a weekly basis, but sometimes it's helpful to see the cumulative total. If we provide an all_time grain they can satisfy both use cases. Otherwise, they have to create a duplicative metric for temporal vs. cumulative calculation, which dilutes much of the value of dbt metrics as an efficient source of truth.

Returning the date range makes the most sense if possible so the user understands what all_time means for that metric.

@callum-mcdata
Copy link
Contributor Author

Sounds like we're all in alignment that all_time is the best path forward.

So the outstanding questions are:

  • Should the final dataset returned have a data column attached? Should it have start and end date?
    • I think I'm all for having it return a start/end timestamp for additional context. This is a different shape from what we usually return but I think makes sense

Here are the implementation details I think we've settled in this thread:

  • all_time needs to be defined on the metric like any other grain.
  • secondary_calculations won't be supported with all_time as they require date-spining that we won't perform.

As for the comment around current_time that you mentioned @dave-connors-3 , I wonder if this overlaps with #27 . That way users could provide date_trunc('day',current_timestamp()) as the end_date input.

@callum-mcdata callum-mcdata self-assigned this Sep 8, 2022
@callum-mcdata
Copy link
Contributor Author

Completed with #99 although an open issue for #102 will need to be addressed next week

@peleyal
Copy link

peleyal commented Nov 14, 2022

Hello all,
I tried to follow the conversation here, and in dbt-labs/dbt-core#4071 (comment) and #58, but I still have one tiny question...

Is it possible to get metrics for a table that doesn't have timestamp\date column (AFAIK timestamp is a required field)?
I just want to get the aggregations for everything in the table...

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants