Firebase data in BigQuery has incredible analytical potentential, but it's not easy to query. It's not easy because:
- Every day has a new table (tables named events_YYYYMMDD). This helps keep BigQuery costs down but adds complications in queries.
- Every event_name has nested key-value pairs like event_params.key and event_params.value meaning that every single query will require some complex UNNEST operation
- Same with user_properties key-value pairs, all highly nested
- Creates a native way to select time periods and magically selected the requred data partitioned tables below
- Adds sessionization (which is missing from Firebase data)
- Adds retention cohort analysis capabilities
- Creates a looker dimension for every combination of event_name / event_params.key and user_properties
- Creates a great starting point to start modeling the data further to get value out of it
Because every Firebase schema has some similarities but different event types and user properties, the block will generate a unique schema for your event structure. It uses a Python notebook to connect to Looker's API and query your database and output some lovely LookML.
- Start by creating a new project by cloning this public repo: https://docs.looker.com/data-modeling/getting-started/create-projects#cloning_a_public_git_repository
- Add your own Git repo, change the connection name and the name of the event table in events.view.lkml and the datagroup for sessions.view.lkml and push to production.
- Create yourself a new API3 key https://docs.looker.com/admin-options/settings/users#api3_keys
- Download the Firebase Python Notebook and open it with your favorite Notebook runner. I use Google Colab https://github.com/llooker/firebase_block_v3/blob/master/Firebase_Block_v3.ipynb
- Enter your base_url client_id and client_secret (this is your API3 key)
- Run the Python notebook
- Copy the results of the event_properties section into events_generated.view.lkml (replace the entire file)
- Copy the results of the user_properties section into user_properties_generated.view.lkml (replace the entire file)
- World domination.
We are querying the raw events log which might be expensive to query, we've added a SQL_ALWAYS_WHERE clause to the model and only query this year's data for the sessions PDT. Adjust accordingly
- If you have duplicate dimensions with the same name (firebase allows same event name , keys but different types)
Change the part of events “Event Properties” python code like this:
print("dimension: "+event_name+"."+key+ "_" + type[0]+" {")
(thanks @nishimix for the fix!)