(DE-830) feat(dbt): add diversity metric in dbt #3655

Merged 3 commits on Dec 31, 2024
@@ -0,0 +1,9 @@
# Diversity Booking
description: Description of the `diversity booking` columns.
title: Diversity Booking
---

{% docs column_diversity__booking_entity_rank %} The rank of a booking entity for a user, determined by the order of booking creation. {% enddocs %}
{% docs column__diversity_booked_entity_type %} The type of entity booked, which can be one of several categories such as OFFER_CATEGORY, VENUE_TYPE, OFFER_SUBCATEGORY, VENUE, or EXTRA_CATEGORY. {% enddocs %}
{% docs column__diversity_booked_entity %} The specific entity booked, which can be an offer category ID, venue type label, offer subcategory ID, venue ID, or extra category. {% enddocs %}
{% docs column__diversity_score %} The score assigned to a booking for a given entity type: the entity type's multiplier when the booking is the user's first for that entity (entity rank 1), otherwise 0. {% enddocs %}
@@ -0,0 +1,22 @@
---
title: Diversification of Cultural Practices
description: Description of the `int_metric__diversity_daily_booking` table.
---

{% docs description__int_metric__diversity_daily_booking %}

The `int_metric__diversity_daily_booking` table captures the diversification of cultural practices within the pass Culture application. A booking counts as diversification when the booked offer has a characteristic the user has not booked before, which indicates a cultural discovery by the user.

For each booking, the analyzed characteristics include:

- **Diversity in category**: from book to cinema, from live performance to music.
- **Diversity in subcategory**: from comic book to detective novel, from drama to comedy.
- **Diversity in genre**: from science fiction to fantasy, from thriller to romance.
- **Diversity in place**: from an independent bookstore to a large network, from a cinema to a performance hall.
- **Diversity in type of place**: from a museum to a library, from a theater to a concert hall.

{% enddocs %}

## Table description

{% docs table__int_metric__diversity_daily_booking %}{% enddocs %}
@@ -0,0 +1,24 @@
version: 2

models:
  - name: int_metric__diversity_daily_booking
    description: "{{ doc('description__int_metric__diversity_daily_booking') }}"
    columns:
      - name: booking_id
        description: "{{ doc('column__booking_id') }}"
      - name: booking_created_at
        description: "{{ doc('column__booking_created_at') }}"
      - name: booking_creation_date
        description: "{{ doc('column__booking_creation_date') }}"
      - name: user_id
        description: "{{ doc('column__user_id') }}"
      - name: booking_rank
        description: "{{ doc('column__booking_rank') }}"
      - name: diversity_booking_entity_rank
        description: "{{ doc('column_diversity__booking_entity_rank') }}"
      - name: diversity_booked_entity_type
        description: "{{ doc('column__diversity_booked_entity_type') }}"
      - name: diversity_booked_entity
        description: "{{ doc('column__diversity_booked_entity') }}"
      - name: diversity_score
        description: "{{ doc('column__diversity_score') }}"
@@ -0,0 +1,87 @@
{{
    config(
        **custom_incremental_config(
            incremental_strategy="insert_overwrite",
            partition_by={"field": "booking_creation_date", "data_type": "date"},
        )
    )
}}

{% set entities = [
    {
        "entity": "offer_category_id",
        "type": "OFFER_CATEGORY",
        "score_multiplier": 25,
    },
    {
        "entity": "venue_type_label",
        "type": "VENUE_TYPE",
        "score_multiplier": 20,
    },
    {
        "entity": "offer_subcategory_id",
        "type": "OFFER_SUBCATEGORY",
        "score_multiplier": 10,
    },
    {"entity": "venue_id", "type": "VENUE", "score_multiplier": 5},
Contributor commented:

Suggested change
- {"entity": "venue_id", "type": "VENUE", "score_multiplier": 5},
+ {
+     "entity": "venue_id",
+     "type": "VENUE",
+     "score_multiplier": 5,
+ },

Collaborator Author replied:

This is the linter's default format.

    {
        "entity": "extra_category",
        "type": "EXTRA_CATEGORY",
        "score_multiplier": 5,
    },
] %}

with
    raw_data as (
        select
            booking_id,
            booking_created_at,
            booking_creation_date,
            user_id,
            offer_subcategory_id,
            venue_type_label,
            offer_category_id,
            venue_id,
            coalesce(offer_type_label, venue_id) as extra_category,  -- TODO: venue_id is used as extra_category when offer_type_label is null
            row_number() over (
                partition by user_id order by booking_created_at
            ) as booking_rank
        from {{ ref("int_global__booking") }}
        where booking_status != 'CANCELLED'
    ),

    entity_calculations as (
        {% for entity in entities %}
            select distinct
                booking_id,
                booking_created_at,
                booking_creation_date,
                booking_rank,
                user_id,
                {{ entity.entity }} as diversity_booked_entity,
                '{{ entity.type }}' as diversity_booked_entity_type,
                {{ entity.score_multiplier }} as score_multiplier,
                row_number() over (
                    partition by user_id, {{ entity.entity }}
                    order by booking_created_at
                ) as diversity_booking_entity_rank
Contributor commented:

Clever.

            from raw_data
            {% if not loop.last %}
                union all
            {% endif %}
        {% endfor %}
    )

select
    booking_rank,
    booking_id,
    booking_created_at,
    diversity_booking_entity_rank,
    diversity_booked_entity_type,
    diversity_booked_entity,
    user_id,
    booking_creation_date,
    case
        when diversity_booking_entity_rank = 1 then score_multiplier else 0
    end as diversity_score
from entity_calculations
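
The scoring logic of this model can be sketched outside dbt. The Python below is an illustration only, not part of the PR: it assumes bookings arrive as plain dictionaries with distinct creation timestamps, and every sample value (`b1`, `LIVRE`, `Bookstore`, and so on) is hypothetical. The multipliers are copied from the `entities` list above.

```python
from collections import defaultdict

# (column, entity type, score multiplier), copied from the model's `entities` list.
ENTITIES = [
    ("offer_category_id", "OFFER_CATEGORY", 25),
    ("venue_type_label", "VENUE_TYPE", 20),
    ("offer_subcategory_id", "OFFER_SUBCATEGORY", 10),
    ("venue_id", "VENUE", 5),
    ("extra_category", "EXTRA_CATEGORY", 5),
]


def diversity_rows(bookings):
    """Return one scored row per (booking, entity type)."""
    # raw_data: drop cancelled bookings and order the history by creation time.
    kept = sorted(
        (b for b in bookings if b["booking_status"] != "CANCELLED"),
        key=lambda b: b["booking_created_at"],
    )
    seen = defaultdict(int)  # (user_id, entity type, entity value) -> bookings so far
    rows = []
    for b in kept:
        values = dict(b)
        # coalesce(offer_type_label, venue_id) as extra_category
        values["extra_category"] = (
            b["offer_type_label"] if b["offer_type_label"] is not None else b["venue_id"]
        )
        for column, entity_type, multiplier in ENTITIES:
            key = (b["user_id"], entity_type, values[column])
            seen[key] += 1  # plays the role of diversity_booking_entity_rank
            rows.append({
                "booking_id": b["booking_id"],
                "diversity_booked_entity_type": entity_type,
                "diversity_booked_entity": values[column],
                "diversity_booking_entity_rank": seen[key],
                "diversity_score": multiplier if seen[key] == 1 else 0,
            })
    return rows


def booking(booking_id, created_at, status, category, subcategory, venue_type, venue, offer_type):
    """Hypothetical helper that builds one sample booking for user u1."""
    return {
        "booking_id": booking_id, "user_id": "u1", "booking_created_at": created_at,
        "booking_status": status, "offer_category_id": category,
        "offer_subcategory_id": subcategory, "venue_type_label": venue_type,
        "venue_id": venue, "offer_type_label": offer_type,
    }


SAMPLE = [
    booking("b1", "2024-01-05T10:00:00", "USED", "LIVRE", "LIVRE_PAPIER", "Bookstore", "v1", "Comics"),
    booking("b2", "2024-02-01T09:00:00", "CONFIRMED", "LIVRE", "LIVRE_PAPIER", "Bookstore", "v1", "Crime fiction"),
    booking("b3", "2024-02-10T18:00:00", "CANCELLED", "CINEMA", "SEANCE_CINE", "Cinema", "v2", None),
    booking("b4", "2024-03-01T20:00:00", "USED", "CINEMA", "SEANCE_CINE", "Cinema", "v2", None),
]

totals = defaultdict(int)
for row in diversity_rows(SAMPLE):
    totals[row["booking_id"]] += row["diversity_score"]
print(dict(totals))  # {'b1': 65, 'b2': 5, 'b4': 65}
```

In this sample, `b2` repeats everything except the offer type, so it only scores the 5 points of `EXTRA_CATEGORY`. The cancelled `b3` is ignored, which is why `b4` still counts as the user's first cinema booking.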
@@ -0,0 +1,8 @@
select
Contributor commented:

Should this table be configured as incremental, partitioned on booking_id?

Collaborator Author replied:

Since bookings can be cancelled, making this table incremental seems somewhat more complicated to me. That said, we can certainly partition it, depending on how it is used.

    booking_id,
    booking_creation_date,
    booking_created_at,
    user_id,
    sum(diversity_score) as diversity_score
from {{ ref("int_metric__diversity_daily_booking") }}
group by booking_id, booking_creation_date, booking_created_at, user_id
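
As a rough illustration of this aggregation, the Python sketch below sums the per-entity scores into one total per booking. It is not part of the PR; the `make_rows` helper and the sample rows are hypothetical, and the input is assumed to have the shape of `int_metric__diversity_daily_booking` (one row per booking and entity type).

```python
from collections import defaultdict


def booking_scores(rows):
    """Sum diversity_score over the same keys as the model's group by."""
    totals = defaultdict(int)
    for row in rows:
        key = (
            row["booking_id"],
            row["booking_creation_date"],
            row["booking_created_at"],
            row["user_id"],
        )
        totals[key] += row["diversity_score"]
    return dict(totals)


def make_rows(booking_id, created_at, scores):
    """Hypothetical helper: one input row per entity-type score for user u1."""
    return [
        {
            "booking_id": booking_id,
            "booking_creation_date": created_at[:10],
            "booking_created_at": created_at,
            "user_id": "u1",
            "diversity_score": score,
        }
        for score in scores
    ]


# First booking: every entity is new, so each multiplier is granted.
# Second booking: only the extra category is new.
rows = make_rows("b1", "2024-01-05T10:00:00", [25, 20, 10, 5, 5])
rows += make_rows("b2", "2024-02-01T09:00:00", [0, 0, 0, 0, 5])

result = booking_scores(rows)
print(result[("b1", "2024-01-05", "2024-01-05T10:00:00", "u1")])  # 65
print(result[("b2", "2024-02-01", "2024-02-01T09:00:00", "u1")])  # 5
```

With the multipliers defined in the upstream model (25, 20, 10, 5, 5), a user's first non-cancelled booking totals 65, since every entity rank is 1 for it.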