Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/historical schedules #171

Open
wants to merge 68 commits into
base: main
Choose a base branch
from

Conversation

fivetran-catfritz
Copy link
Contributor

@fivetran-catfritz fivetran-catfritz commented Sep 11, 2024

PR Overview

This PR will address the following Issue/Feature:

This PR will result in the following new package version:

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

  • v0.18.0 - Addition of schedule history logic, which is enabled by default.

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt run –full-refresh && dbt test
  • dbt run (if incremental models are present) && dbt test

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked, tagged, and properly assigned
  • All necessary documentation and version upgrades have been applied
  • docs were regenerated (unless this PR does not include any code or yml updates)
  • BuildKite integration tests are passing
  • Detailed validation steps have been provided below

Detailed Validation

Please share any and all of your validation steps:

  • See internal ticket

If you had to summarize this PR in an emoji, which would it be?

💃

@fivetran-catfritz fivetran-catfritz self-assigned this Oct 5, 2024
Comment on lines +2 to +6
# - package: fivetran/zendesk_source
# version: [">=0.12.0", "<0.13.0"]
- git: https://github.com/fivetran/dbt_zendesk_source.git
revision: feature/historical-schedules
warn-unpinned: false
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# - package: fivetran/zendesk_source
# version: [">=0.12.0", "<0.13.0"]
- git: https://github.com/fivetran/dbt_zendesk_source.git
revision: feature/historical-schedules
warn-unpinned: false
- package: fivetran/zendesk_source
version: [">=0.13.0", "<0.14.0"]

@fivetran-catfritz fivetran-catfritz marked this pull request as ready for review October 7, 2024 23:34
# dbt run-operation fivetran_utils.drop_schemas_automation --target "$db"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# dbt run-operation fivetran_utils.drop_schemas_automation --target "$db"
dbt run-operation fivetran_utils.drop_schemas_automation --target "$db"

@@ -5,6 +5,7 @@ dbt_versions: ">=1.3.0 <2.0.0"

table_variables:
using_schedules:
- audit_log
Copy link
Contributor Author

@fivetran-catfritz fivetran-catfritz Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I need to include this since you can use schedules without the audit log? Same for schedule_holiday now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm good question. We may need to adjust these quickstart variables to match the tables more accurately. We can discuss after my initial review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based off our discussion let's move forward with removing audit_log and schedule_history from the schedule variable and have them be their own. We can then rely on the combo of variables to control the enablement/disablement

Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-catfritz great work on this so far! I have a few comments and then I believe the schedule spine needs to be investigated a bit more since it seems for our test Hong Kong schedule it isn't working out to be as expected. Let me know if you want to sync further on this.

Comment on lines +16 to +22
-- Clean up the change_description, sometimes has random html stuff in it
replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(change_description,
'workweek changed from', ''),
'&quot;', '"'),
'amp;', ''),
'=&gt;', ':'), ':mon:', '"mon":'), ':tue:', '"tue":'), ':wed:', '"wed":'), ':thu:', '"thu":'), ':fri:', '"fri":'), ':sat:', '"sat":'), ':sun:', '"sun":')
as change_description_cleaned
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah, this is a lot of replace statements haha

select
schedule_id,
schedule_id_index,
cast(valid_from as date) as valid_from,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just cast this as date one line 6 of this model so we don't need to individually cast each of these times in this cte?

row_number() over (
partition by schedule_id, cast(valid_from as date)
-- ordering to get the latest change when there are multiple on one day
order by valid_from desc, coalesce(valid_until, {{ dbt.current_timestamp_backcompat() }}) desc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a recent issue we found there are some issues around the dbt.current_timestamp_backcompat() macro due to it not always converting the timestamp to UTC timezone (for some warehouses it will cast the the timezone of the system).

Can we explore changing this to dbt.current_timestamp() since this seems to be a more up to date way to grab the current timestamp.

Comment on lines +93 to +109
-- now that the schedule changes are cleaned, we can split into the individual schedules periods
), split_days as (
{% set days_of_week = {'sun': 0, 'mon': 1, 'tue': 2, 'wed': 3, 'thu': 4, 'fri': 5, 'sat': 6} %}
{% for day, day_number in days_of_week.items() %}
select
schedule_id,
schedule_id_index,
valid_from,
valid_until,
schedule_change,
'{{ day }}' as day_of_week,
cast('{{ day_number }}' as {{ dbt.type_int() }}) as day_of_week_number,
{{ zendesk.regex_extract('schedule_change', day) }} as day_of_week_schedule
from consolidate_same_day_changes

{% if not loop.last %}union all{% endif %}
{% endfor %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really interesting approach. Would you be able to elaborate in the comment regarding what exactly is happening here?

My understanding here is you are creating the relevant records for each day of the week and then unioning them together to generate the entire schedule.

json_parse('[' || replace(replace(day_of_week_schedule, ', ', ','), ',', '},{') || ']') as json_schedule

from split_days
where day_of_week_schedule != '{}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this filter do? Essentially what does '{}' mean in the underlying data? Can we add a small inline comment here to call this out so we remember this in the future.

Comment on lines +83 to +91
select
schedule_id,
group_id,
schedule_change,
max(schedule_id_index) as schedule_id_index,
min(valid_from) as valid_from,
max(valid_until) as valid_until
from find_actual_changes
{{ dbt_utils.group_by(3) }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request to add a note in the DECISIONLOG regarding this opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also let's include in the DECISIONLOG the approach we are taking for the backfill of schedule history.

False as is_historical
from schedule

{# {% if var('using_schedule_histories', True) %} #}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intentionally commented out?

lower(time_zone) as time_zone,
schedule_name,
cast(null as date) as valid_from, -- created_at is when the schedule was first ever created, so we'll fill the real value later
cast({{ dbt.dateadd('year', 1, dbt.current_timestamp_backcompat()) }} as date) as valid_until,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for having this valid until be forward in 1 year?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants