-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add new user dim table #1376
base: main
Are you sure you want to change the base?
add new user dim table #1376
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code runs fine, but I left a few comments.
Also, the residential users are not added to the new table, but it was mentioned that including xpro, bootcamps, edX.org, MITx Online, and Residential MITx.
in the field description . Can you take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the user_
column prefix is redundant. The context of the table makes it clear that all of the data pertains to a user record.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed this from most of the columns now except of course something like "user_id" since that may be in the fact table and it's useful to have the prefix there
, user_is_active | ||
from mitxonline_user_view | ||
|
||
union all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a union all
will generate a substantial number of duplicate records. The goal of this model is to create a single, canonical record of a user, not a cartesian product of users * platforms.
Given that requirement, we want to decide which platforms we trust to have the most accurate information and how to determine which information "wins". If possible, we also want this table to include information about when a user was "acquired" (or similar semantic construct).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@blarghmatey Do you think we could create a separate "master customer" dimension for this? Having a regular customer dimension might be useful I know when the teams see data that's different from what's displayed in the application(such as when we combine data) they sometimes view it as an error and this might prevent that. But if we have a master dim that would allow us options. The fact table might have platform id included and then it wouldn't add any additional records. If we do "master" customer info how would we go about combining it? We would also need to modify this table design to remove username from the table. I'd also be curious how often we have customers across different platforms.
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
What are the relevant tickets?
#1325
Description (What does it do?)
creates a user dimensional table (dim_user)
How can this be tested?
dbt build --select dim_user