(DE-830) feat(dbt): add diversity metric in dbt #3655
@@ -0,0 +1,9 @@
---
title: Diversity Booking
description: Description of the `diversity booking` columns.
---

{% docs column_diversity__booking_entity_rank %} The rank of this booking among the user's bookings of the same entity, ordered by booking creation time (1 = the first time the user booked this entity). {% enddocs %}
{% docs column__diversity_booked_entity_type %} The type of entity booked: one of OFFER_CATEGORY, VENUE_TYPE, OFFER_SUBCATEGORY, VENUE, or EXTRA_CATEGORY. {% enddocs %}
{% docs column__diversity_booked_entity %} The specific entity booked: an offer category ID, a venue type label, an offer subcategory ID, a venue ID, or an extra category (the offer type label, falling back to the venue ID when the label is null). {% enddocs %}
{% docs column__diversity_score %} The diversity points the booking earns for this entity type: the entity type's score multiplier if this is the first time the user booked this entity (entity rank = 1), otherwise 0. {% enddocs %}
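Written as a formula, the score of booking b for entity type e depends on its entity rank r (`diversity_booking_entity_rank`) and on the type's multiplier m (`score_multiplier`). The symbols are introduced here for illustration only. The `int_metric__diversity_daily_booking` model sets the multipliers to 25 (OFFER_CATEGORY), 20 (VENUE_TYPE), 10 (OFFER_SUBCATEGORY), 5 (VENUE) and 5 (EXTRA_CATEGORY). Summing over the five types, as the per-booking aggregation model does, therefore bounds a booking's total:

```latex
s_{b,e} =
\begin{cases}
  m_e & \text{if } r_{b,e} = 1,\\
  0   & \text{otherwise,}
\end{cases}
\qquad
S_b = \sum_{e} s_{b,e} \le 25 + 20 + 10 + 5 + 5 = 65.
```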
@@ -0,0 +1,22 @@
---
title: Diversification of Cultural Practices
description: Description of the `int_metric__diversity_daily_booking` table.
---

{% docs description__int_metric__diversity_daily_booking %}

The `int_metric__diversity_daily_booking` table measures the diversification of cultural practices within the pass Culture application. A booking counts as diversifying when its offer has characteristics the user has not booked before, which indicates a cultural discovery. Each non-cancelled booking yields one row per characteristic, with a non-zero `diversity_score` only when the user books that characteristic for the first time.

For each booking, the following characteristics are analyzed:

- **Diversity in category**: from books to cinema, from live performance to music.
- **Diversity in subcategory**: from comic books to detective novels, from drama to comedy.
- **Diversity in genre**: from science fiction to fantasy, from thriller to romance.
- **Diversity in venue**: from an independent bookstore to a large chain, from a cinema to a performance hall.
- **Diversity in venue type**: from a museum to a library, from a theater to a concert hall.

{% enddocs %}

## Table description

{% docs table__int_metric__diversity_daily_booking %}{% enddocs %}
@@ -0,0 +1,24 @@
version: 2

models:
  - name: int_metric__diversity_daily_booking
    description: "{{ doc('description__int_metric__diversity_daily_booking') }}"
    columns:
      - name: booking_id
        description: "{{ doc('column__booking_id') }}"
      - name: booking_created_at
        description: "{{ doc('column__booking_created_at') }}"
      - name: booking_creation_date
        description: "{{ doc('column__booking_creation_date') }}"
      - name: user_id
        description: "{{ doc('column__user_id') }}"
      - name: booking_rank
        description: "{{ doc('column__booking_rank') }}"
      - name: diversity_booking_entity_rank
        description: "{{ doc('column_diversity__booking_entity_rank') }}"
      - name: diversity_booked_entity_type
        description: "{{ doc('column__diversity_booked_entity_type') }}"
      - name: diversity_booked_entity
        description: "{{ doc('column__diversity_booked_entity') }}"
      - name: diversity_score
        description: "{{ doc('column__diversity_score') }}"
@@ -0,0 +1,87 @@
{{
    config(
        **custom_incremental_config(
            incremental_strategy="insert_overwrite",
            partition_by={"field": "booking_creation_date", "data_type": "date"},
        )
    )
}}

{% set entities = [
    {
        "entity": "offer_category_id",
        "type": "OFFER_CATEGORY",
        "score_multiplier": 25,
    },
    {
        "entity": "venue_type_label",
        "type": "VENUE_TYPE",
        "score_multiplier": 20,
    },
    {
        "entity": "offer_subcategory_id",
        "type": "OFFER_SUBCATEGORY",
        "score_multiplier": 10,
    },
    {"entity": "venue_id", "type": "VENUE", "score_multiplier": 5},
    {
        "entity": "extra_category",
        "type": "EXTRA_CATEGORY",
        "score_multiplier": 5,
    },
] %}

with
    raw_data as (
        select
            booking_id,
            booking_created_at,
            booking_creation_date,
            user_id,
            offer_subcategory_id,
            venue_type_label,
            offer_category_id,
            venue_id,
            coalesce(offer_type_label, venue_id) as extra_category, -- TODO: venue_id is used as extra_category when offer_type_label is null
            row_number() over (
                partition by user_id order by booking_created_at
            ) as booking_rank
        from {{ ref("int_global__booking") }}
        where booking_status != 'CANCELLED'
    ),

    entity_calculations as (
        {% for entity in entities %}
            select distinct
                booking_id,
                booking_created_at,
                booking_creation_date,
                booking_rank,
                user_id,
                {{ entity.entity }} as diversity_booked_entity,
                '{{ entity.type }}' as diversity_booked_entity_type,
                {{ entity.score_multiplier }} as score_multiplier,
                row_number() over (
                    partition by user_id, {{ entity.entity }}
                    order by booking_created_at
                ) as diversity_booking_entity_rank
            from raw_data
            {% if not loop.last %}
                union all
            {% endif %}
        {% endfor %}
    )

select
    booking_rank,
    booking_id,
    booking_created_at,
    diversity_booking_entity_rank,
    diversity_booked_entity_type,
    diversity_booked_entity,
    user_id,
    booking_creation_date,
    case
        when diversity_booking_entity_rank = 1 then score_multiplier else 0
    end as diversity_score
from entity_calculations

Review comment on the `row_number()` window partitioned by `user_id, {{ entity.entity }}`: Clever.
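The following minimal in-memory Python sketch makes the ranking and scoring logic of this model easier to check. It is not part of the PR. It assumes each input booking is a dict carrying the `int_global__booking` columns the model reads, plus `booking_status` and `offer_type_label`. The `booking` helper, the `USED` status, and the example values are hypothetical. `booking_rank` and the date columns are omitted. Ties on `booking_created_at` keep input order here, whereas `row_number()` breaks them arbitrarily.

```python
from collections import Counter
from datetime import datetime

# (column, diversity_booked_entity_type, score_multiplier), copied from `entities` in the model.
ENTITIES = [
    ("offer_category_id", "OFFER_CATEGORY", 25),
    ("venue_type_label", "VENUE_TYPE", 20),
    ("offer_subcategory_id", "OFFER_SUBCATEGORY", 10),
    ("venue_id", "VENUE", 5),
    ("extra_category", "EXTRA_CATEGORY", 5),
]


def diversity_rows(bookings):
    """Return one row per (booking, entity type), like the model's final select."""
    # raw_data: drop cancelled bookings (a NULL status is dropped too, as with SQL's !=)
    # and derive extra_category = coalesce(offer_type_label, venue_id).
    kept = [
        dict(
            b,
            extra_category=(
                b["offer_type_label"]
                if b["offer_type_label"] is not None
                else b["venue_id"]
            ),
        )
        for b in bookings
        if b["booking_status"] is not None and b["booking_status"] != "CANCELLED"
    ]
    # Every row_number() window in the model orders by booking_created_at.
    kept.sort(key=lambda b: b["booking_created_at"])

    rows = []
    for column, entity_type, multiplier in ENTITIES:
        # Bookings seen so far per (user_id, entity value); emulates
        # row_number() over (partition by user_id, <entity> order by booking_created_at).
        seen = Counter()
        for b in kept:
            key = (b["user_id"], b[column])
            seen[key] += 1
            rank = seen[key]
            rows.append(
                {
                    "booking_id": b["booking_id"],
                    "user_id": b["user_id"],
                    "diversity_booked_entity_type": entity_type,
                    "diversity_booked_entity": b[column],
                    "diversity_booking_entity_rank": rank,
                    # Only the user's first booking of this entity earns the multiplier.
                    "diversity_score": multiplier if rank == 1 else 0,
                }
            )
    return rows


def booking(booking_id, day, category, subcategory, venue, status="USED"):
    """Hypothetical test-data helper for a single user and venue type."""
    return {
        "booking_id": booking_id,
        "user_id": "u1",
        "booking_status": status,
        "booking_created_at": datetime(2024, 1, day),
        "offer_category_id": category,
        "offer_subcategory_id": subcategory,
        "venue_type_label": "Bookstore",
        "venue_id": venue,
        "offer_type_label": None,
    }


rows = diversity_rows(
    [
        booking("b1", 1, "BOOK", "PAPER_BOOK", "v1"),
        booking("b2", 2, "BOOK", "COMIC", "v1"),
        booking("b3", 3, "CINEMA", "FILM", "v2", status="CANCELLED"),
    ]
)
# Per-booking totals, as in the aggregation model's sum(diversity_score).
totals = Counter()
for row in rows:
    totals[row["booking_id"]] += row["diversity_score"]
print(dict(totals))  # → {'b1': 65, 'b2': 10}
```

In the example, `b1` earns all five multipliers (65), `b2` only earns the subcategory points (10), and the cancelled `b3` produces no rows. Because `offer_type_label` is `None`, `extra_category` falls back to `venue_id`, which is the fallback the TODO in `raw_data` describes. As with SQL window partitions, `None` values are grouped together, so a user's first booking with a NULL entity value also scores.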
@@ -0,0 +1,8 @@
select
    booking_id,
    booking_creation_date,
    booking_created_at,
    user_id,
    sum(diversity_score) as diversity_score
from {{ ref("int_metric__diversity_daily_booking") }}
group by booking_id, booking_creation_date, booking_created_at, user_id

Review thread on this model:
Review comment: Configure the table as incremental, partitioned on booking_id?
Reply: Since bookings can be cancelled, making the table incremental seems a bit more complicated to me. However, we can certainly partition it according to how it is used.
Review comment: Default linter format.
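The review thread on the aggregation model notes that bookings can be cancelled, which makes an incremental build harder. The toy Python sketch below shows why. It is restricted to the offer-category entity and uses hypothetical field names (`category`, `created_at`, `status`). Cancelling an earlier booking changes the score of a later booking that was already computed.

```python
def booking_scores(bookings, multiplier=25):
    """Per-booking diversity_score for a single entity type (offer category).

    Simplified stand-in for both models: the first booking of each
    (user_id, category) earns `multiplier`; with one entity type, the
    per-booking sum(diversity_score) reduces to that single score.
    """
    totals = {}
    seen = set()
    for b in sorted(bookings, key=lambda b: b["created_at"]):
        if b["status"] == "CANCELLED":
            continue  # cancelled bookings are filtered out, as in raw_data
        key = (b["user_id"], b["category"])
        totals[b["booking_id"]] = 0 if key in seen else multiplier
        seen.add(key)
    return totals


history = [
    {"booking_id": "b1", "user_id": "u1", "category": "BOOK", "created_at": 1, "status": "USED"},
    {"booking_id": "b2", "user_id": "u1", "category": "BOOK", "created_at": 2, "status": "USED"},
]
before = booking_scores(history)

# b1 is cancelled after b2 has already been scored.
history[0]["status"] = "CANCELLED"
after = booking_scores(history)

print(before)  # → {'b1': 25, 'b2': 0}
print(after)   # → {'b2': 25}
```

Only `b1` changed status, yet `b2`'s score moved from 0 to 25. A run that only reprocessed newly created bookings would leave `b2` with its stale score. Either the full history is recomputed, or every booking created after a cancelled one must be rescored.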