Skip to content
This repository has been archived by the owner on Jan 21, 2023. It is now read-only.

Traffic Attribution Question #9

Open
plearatblast opened this issue Sep 27, 2022 · 2 comments
Open

Traffic Attribution Question #9

plearatblast opened this issue Sep 27, 2022 · 2 comments

Comments

@plearatblast
Copy link

plearatblast commented Sep 27, 2022

Hi David,

I believe there is a problem with how you derive attribution. Currently I see in your session_tags code you extract distinct medium, source, campaign, & page_referrer from last non-direct values, but there is no testing for either a session level medium of 'cpc' or 'gclid' in the page_location indicating google/cpc which should override the event_parameter source/medium. It should be a medium of 'cpc', not 'organic' as it will show at the event_parameter level.

I'm running a few tests now to see if logic like the below helps, but perhaps you have a better solution (or perhaps I'm seeing weird GA4 data):

session_tags as (
	select distinct sl.sl_key
        ,   IF ((REGEXP_CONTAINS((select value.string_value from unnest(event_params) where key = 'page_location'), 'gclid|gclsrc') OR (sl.traffic_source.medium = 'cpc')),
             'cpc', first_value((select value.string_value from unnest(event_params) where key = 'medium')) over (partition by sl.sl_key order by sl.event_timestamp desc)) AS medium
        ,   IF (REGEXP_CONTAINS((select value.string_value from unnest(event_params) where key = 'page_location'), 'gclid|gclsrc'), 
            'google', first_value((select value.string_value from unnest(event_params) where key = 'source')) over (partition by sl.sl_key order by sl.event_timestamp desc)) AS source
        ,   IF (REGEXP_CONTAINS((select value.string_value from unnest(event_params) where key = 'page_location'), 'gclid|gclsrc'), 
            traffic_source.name, first_value((select value.string_value from unnest(event_params) where key = 'campaign')) over (partition by sl.sl_key order by sl.event_timestamp desc)) AS campaign
        ,   first_value((select value.string_value from unnest(event_params) where key = 'page_referrer')) over (partition by user_pseudo_id order by sl.event_timestamp desc) AS page_referrer
  from session_list_with_event_history sl
  where sl.event_name in ('page_view')
    and (select value.string_value from unnest(sl.event_params) where key = 'medium') is not null -- NULL medium is direct, filtering out nulls to ensure last non-direct.
  ),

Please let me know if I'm off base with my observations.
Paul

@davidbrinegar
Copy link
Contributor

Hi Paul!

This is an interesting one. To the best of my understanding, the values populated in the traffic source fields are not session-specific, but user-specific. From the export schema docs (https://support.google.com/firebase/answer/7029846?hl=en):
"Name of the traffic source that first acquired the user."

In practice this seems to be the case, as traffic source values persist unchanged from one session to another across a user's journey. Due to this, I never utilized much from the traffic source elements, and instead preferred to isolate attribution data from the events. For your override scenario, we could grab the attribution from the first event in a session and if it is not gclid/cpc continue to look for last-non-direct-click. Would that resolve the unexpected output?

As a side note, do know offhand where the documentation to support this override is located? I want to point to it if we end up making this change.

Thanks again!

@plearatblast
Copy link
Author

I agree traffic_source data are user level dimensions, however for our company site, when I run an extract for a 'first_visit' and traffic_source.medium='cpc', I see the event_param level data never reflecting 'cpc'. It will only show event_param medium = 'organic' & event_param source = 'google' on page_views. This doesn't seem correct -- perhaps it's a ga4 bug?

ba-cpc

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants