You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Jan 21, 2023. It is now read-only.
I believe there is a problem with how you derive attribution. Currently I see in your session_tags code you extract distinct medium, source, campaign, & page_referrer from last non-direct values, but there is no testing for either a session level medium of 'cpc' or 'gclid' in the page_location indicating google/cpc which should override the event_parameter source/medium. It should be a medium of 'cpc', not 'organic' as it will show at the event_parameter level.
I'm running a few tests now to see if logic like the below helps, but perhaps you have a better solution (or perhaps I'm seeing weird GA4 data):
session_tags as (
select distinct sl.sl_key
, IF ((REGEXP_CONTAINS((select value.string_value from unnest(event_params) where key = 'page_location'), 'gclid|gclsrc') OR (sl.traffic_source.medium = 'cpc')),
'cpc', first_value((select value.string_value from unnest(event_params) where key = 'medium')) over (partition by sl.sl_key order by sl.event_timestamp desc)) AS medium
, IF (REGEXP_CONTAINS((select value.string_value from unnest(event_params) where key = 'page_location'), 'gclid|gclsrc'),
'google', first_value((select value.string_value from unnest(event_params) where key = 'source')) over (partition by sl.sl_key order by sl.event_timestamp desc)) AS source
, IF (REGEXP_CONTAINS((select value.string_value from unnest(event_params) where key = 'page_location'), 'gclid|gclsrc'),
traffic_source.name, first_value((select value.string_value from unnest(event_params) where key = 'campaign')) over (partition by sl.sl_key order by sl.event_timestamp desc)) AS campaign
, first_value((select value.string_value from unnest(event_params) where key = 'page_referrer')) over (partition by user_pseudo_id order by sl.event_timestamp desc) AS page_referrer
from session_list_with_event_history sl
where sl.event_name in ('page_view')
and (select value.string_value from unnest(sl.event_params) where key = 'medium') is not null -- NULL medium is direct, filtering out nulls to ensure last non-direct.
),
Please let me know if I'm off base with my observations.
Paul
The text was updated successfully, but these errors were encountered:
This is an interesting one. To the best of my understanding, the values populated in the traffic source fields are not session-specific, but user-specific. From the export schema docs (https://support.google.com/firebase/answer/7029846?hl=en):
"Name of the traffic source that first acquired the user."
In practice this seems to be the case, as traffic source values persist unchanged from one session to another across a user's journey. Due to this, I never utilized much from the traffic source elements, and instead preferred to isolate attribution data from the events. For your override scenario, we could grab the attribution from the first event in a session and if it is not gclid/cpc continue to look for last-non-direct-click. Would that resolve the unexpected output?
As a side note, do know offhand where the documentation to support this override is located? I want to point to it if we end up making this change.
I agree traffic_source data are user level dimensions, however for our company site, when I run an extract for a 'first_visit' and traffic_source.medium='cpc', I see the event_param level data never reflecting 'cpc'. It will only show event_param medium = 'organic' & event_param source = 'google' on page_views. This doesn't seem correct -- perhaps it's a ga4 bug?
Sign up for freeto subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Hi David,
I believe there is a problem with how you derive attribution. Currently I see in your session_tags code you extract distinct medium, source, campaign, & page_referrer from last non-direct values, but there is no testing for either a session level medium of 'cpc' or 'gclid' in the page_location indicating google/cpc which should override the event_parameter source/medium. It should be a medium of 'cpc', not 'organic' as it will show at the event_parameter level.
I'm running a few tests now to see if logic like the below helps, but perhaps you have a better solution (or perhaps I'm seeing weird GA4 data):
Please let me know if I'm off base with my observations.
Paul
The text was updated successfully, but these errors were encountered: