-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/historical schedules #171
base: main
Are you sure you want to change the base?
Conversation
Explore/audit log spike
…chedules-jamie-redshift
…/github.com/fivetran/dbt_zendesk into feature/historical-schedules-jamie-redshift
…chedules-jamie-redshift
…mie-redshift Feature/historical schedules jamie redshift
# - package: fivetran/zendesk_source | ||
# version: [">=0.12.0", "<0.13.0"] | ||
- git: https://github.com/fivetran/dbt_zendesk_source.git | ||
revision: feature/historical-schedules | ||
warn-unpinned: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# - package: fivetran/zendesk_source | |
# version: [">=0.12.0", "<0.13.0"] | |
- git: https://github.com/fivetran/dbt_zendesk_source.git | |
revision: feature/historical-schedules | |
warn-unpinned: false | |
- package: fivetran/zendesk_source | |
version: [">=0.13.0", "<0.14.0"] |
# dbt run-operation fivetran_utils.drop_schemas_automation --target "$db" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# dbt run-operation fivetran_utils.drop_schemas_automation --target "$db" | |
dbt run-operation fivetran_utils.drop_schemas_automation --target "$db" |
@@ -5,6 +5,7 @@ dbt_versions: ">=1.3.0 <2.0.0" | |||
|
|||
table_variables: | |||
using_schedules: | |||
- audit_log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I need to include this since you can use schedules without the audit log? Same for schedule_holiday now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm good question. We may need to adjust these quickstart variables to match the tables more accurately. We can discuss after my initial review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based off our discussion let's move forward with removing audit_log and schedule_history from the schedule variable and have them be their own. We can then rely on the combo of variables to control the enablement/disablement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fivetran-catfritz great work on this so far! I have a few comments and then I believe the schedule spine needs to be investigated a bit more since it seems for our test Hong Kong schedule it isn't working out to be as expected. Let me know if you want to sync further on this.
-- Clean up the change_description, sometimes has random html stuff in it | ||
replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(change_description, | ||
'workweek changed from', ''), | ||
'"', '"'), | ||
'amp;', ''), | ||
'=>', ':'), ':mon:', '"mon":'), ':tue:', '"tue":'), ':wed:', '"wed":'), ':thu:', '"thu":'), ':fri:', '"fri":'), ':sat:', '"sat":'), ':sun:', '"sun":') | ||
as change_description_cleaned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woah, this is a lot of replace statements haha
select | ||
schedule_id, | ||
schedule_id_index, | ||
cast(valid_from as date) as valid_from, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just cast this as date one line 6 of this model so we don't need to individually cast each of these times in this cte?
row_number() over ( | ||
partition by schedule_id, cast(valid_from as date) | ||
-- ordering to get the latest change when there are multiple on one day | ||
order by valid_from desc, coalesce(valid_until, {{ dbt.current_timestamp_backcompat() }}) desc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a recent issue we found there are some issues around the dbt.current_timestamp_backcompat()
macro due to it not always converting the timestamp to UTC timezone (for some warehouses it will cast the the timezone of the system).
Can we explore changing this to dbt.current_timestamp()
since this seems to be a more up to date way to grab the current timestamp.
-- now that the schedule changes are cleaned, we can split into the individual schedules periods | ||
), split_days as ( | ||
{% set days_of_week = {'sun': 0, 'mon': 1, 'tue': 2, 'wed': 3, 'thu': 4, 'fri': 5, 'sat': 6} %} | ||
{% for day, day_number in days_of_week.items() %} | ||
select | ||
schedule_id, | ||
schedule_id_index, | ||
valid_from, | ||
valid_until, | ||
schedule_change, | ||
'{{ day }}' as day_of_week, | ||
cast('{{ day_number }}' as {{ dbt.type_int() }}) as day_of_week_number, | ||
{{ zendesk.regex_extract('schedule_change', day) }} as day_of_week_schedule | ||
from consolidate_same_day_changes | ||
|
||
{% if not loop.last %}union all{% endif %} | ||
{% endfor %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really interesting approach. Would you be able to elaborate in the comment regarding what exactly is happening here?
My understanding here is you are creating the relevant records for each day of the week and then unioning them together to generate the entire schedule.
json_parse('[' || replace(replace(day_of_week_schedule, ', ', ','), ',', '},{') || ']') as json_schedule | ||
|
||
from split_days | ||
where day_of_week_schedule != '{}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this filter do? Essentially what does '{}'
mean in the underlying data? Can we add a small inline comment here to call this out so we remember this in the future.
select | ||
schedule_id, | ||
group_id, | ||
schedule_change, | ||
max(schedule_id_index) as schedule_id_index, | ||
min(valid_from) as valid_from, | ||
max(valid_until) as valid_until | ||
from find_actual_changes | ||
{{ dbt_utils.group_by(3) }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Request to add a note in the DECISIONLOG regarding this opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also let's include in the DECISIONLOG the approach we are taking for the backfill of schedule history.
False as is_historical | ||
from schedule | ||
|
||
{# {% if var('using_schedule_histories', True) %} #} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intentionally commented out?
lower(time_zone) as time_zone, | ||
schedule_name, | ||
cast(null as date) as valid_from, -- created_at is when the schedule was first ever created, so we'll fill the real value later | ||
cast({{ dbt.dateadd('year', 1, dbt.current_timestamp_backcompat()) }} as date) as valid_until, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for having this valid until be forward in 1 year?
PR Overview
This PR will address the following Issue/Feature:
This PR will result in the following new package version:
Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:
PR Checklist
Basic Validation
Please acknowledge that you have successfully performed the following commands locally:
Before marking this PR as "ready for review" the following have been applied:
Detailed Validation
Please share any and all of your validation steps:
If you had to summarize this PR in an emoji, which would it be?
💃