Refactor: Benefits Amplitude events #3468
Thank you for the code walkthrough and explanations! These changes look good to me. 👍
Can you output the logs of dbt run to ensure this works properly? See #3502 for an example of how this is done.
@evansiroky @vevetron I'm following these instructions: https://github.com/cal-itp/data-infra/blob/main/warehouse/README.md, and I have to say, this is just a brutal developer experience...
Does everyone run this on a Mac? I've tried to update the
But I still get an error when running
Any idea how to get this working?
Alternatively, if you all are already set up to run these DBT commands for verification, that would be really helpful.
I think everyone who works with DBT right now either uses a local Mac or JupyterHub to run and test changes. Linux should work as well, but I don't think anyone is using devcontainers.
Thanks @vevetron. I got hold of a MacBook and got as far as running:
(.venv) kegans-MBP:warehouse kegan$ poetry run dbt debug
20:17:52 Running with dbt=1.5.1
20:17:52 dbt version: 1.5.1
20:17:52 python version: 3.9.6
20:17:52 python path: /Users/kegan/git/data-infra/warehouse/.venv/bin/python
20:17:52 os info: macOS-14.2-arm64-arm-64bit
20:17:52 Using profiles.yml file at /Users/kegan/.dbt/profiles.yml
20:17:52 Using dbt_project.yml file at /Users/kegan/git/data-infra/warehouse/dbt_project.yml
20:17:52 Configuration:
20:17:52 Error importing adapter: No module named 'dbt.adapters.bigquery'
20:17:52 profiles.yml file [ERROR invalid]
20:17:52 dbt_project.yml file [OK found and valid]
20:17:52 Required dependencies:
20:17:52 - git [OK found]
20:17:52 1 check failed:
20:17:52 Profile loading failed for the following reason:
Runtime Error
Credentials in profile "calitp_warehouse", target "dev" invalid: Runtime Error
Could not find adapter type bigquery!
My profiles.yml:
calitp_warehouse:
  outputs:
    dev:
      dataproc_batch:
        runtime_config:
          container_image: gcr.io/cal-itp-data-infra/dbt-spark:2023.3.28
          properties:
            spark.dynamicAllocation.maxExecutors: '16'
            spark.executor.cores: '4'
            spark.executor.instances: '4'
            spark.executor.memory: 4g
      dataproc_region: us-west2
      fixed_retries: 1
      gcs_bucket: test-calitp-dbt-python-models
      location: us-west2
      maximum_bytes_billed: 2000000000000
      method: oauth
      priority: interactive
      project: cal-itp-data-infra-staging
      schema: kegan
      submission_method: serverless
      threads: 8
      timeout_seconds: 3000
      type: bigquery
  target: dev
And the datasets:
datasetId
----------------------------------------
airtable
amplitude
audit
calitp_py
charlie
charlie_dbt_test__audit
charlie_gtfs_schedule
charlie_gtfs_views_staging
charlie_intermediate
charlie_mart_ad_hoc
charlie_mart_agency_service
charlie_mart_feed_aggregator_checks
charlie_mart_gtfs
charlie_mart_gtfs_guidelines
charlie_mart_gtfs_quality
charlie_mart_ntd
charlie_mart_payments
charlie_mart_transit_database
charlie_payments
charlie_staging
charlie_views
christian
christian_mart_ad_hoc
christian_mart_audit
christian_mart_benefits
christian_mart_gtfs
christian_mart_gtfs_quality
christian_mart_gtfs_schedule_latest
christian_mart_ntd
christian_mart_payments
christian_mart_transit_database
christian_mart_transit_database_latest
christian_staging
ci_staging
eric
eric_mart_ad_hoc
eric_mart_audit
eric_mart_benefits
eric_mart_gtfs
eric_mart_gtfs_quality
eric_mart_gtfs_schedule_latest
eric_mart_ntd
eric_mart_payments
eric_mart_transit_database
eric_mart_transit_database_latest
eric_payments
eric_staging
eric_views
erika
erika_dbt_test__audit
Will come back to this a little later and look into it more.
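The key line in the dbt debug output above is `Error importing adapter: No module named 'dbt.adapters.bigquery'`: the profile declares `type: bigquery`, but the Python interpreter dbt is running under cannot import the BigQuery adapter, so the profile is reported as invalid. A minimal diagnostic sketch, assuming it is run with the same interpreter as dbt (for example via a hypothetical `poetry run python check_adapter.py`), checks whether that module is importable and prints which interpreter was used. The helper name `adapter_importable` and the script filename are illustrative, not part of the repo.

```python
import importlib.util
import sys


def adapter_importable(module_name: str = "dbt.adapters.bigquery") -> bool:
    """Return True if `module_name` can be located by the current interpreter."""
    try:
        # For dotted names, find_spec imports the parent packages first
        # (here `dbt` and `dbt.adapters`); if a parent is missing it raises
        # ModuleNotFoundError instead of returning None.
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    # Print the interpreter path so it can be compared with the
    # "python path" line reported by `dbt debug`.
    print(f"python: {sys.executable}")
    name = "dbt.adapters.bigquery"
    print(f"{name} importable: {adapter_importable(name)}")
```

If this prints `False` while the interpreter path matches the one `dbt debug` reports, the adapter package is missing from that virtualenv rather than the profile being misconfigured.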
Your profiles.yml looks exactly the same as mine. My debug output is almost the same as well. Maybe retry
Finally got it running! I am seeing the same error output that you showed:
$ poetry run dbt run -s +fct_benefits_events
19:32:16 Running with dbt=1.5.1
19:32:16 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- models.calitp_warehouse.mart.ad_hoc
19:32:17 Found 420 models, 950 tests, 0 snapshots, 0 analyses, 852 macros, 0 operations, 12 seed files, 175 sources, 4 exposures, 0 metrics, 0 groups
19:32:17
19:32:20 Concurrency: 8 threads (target='dev')
19:32:20
19:32:20 1 of 2 START sql view model kegan_staging.stg_amplitude__benefits_events ....... [RUN]
19:32:21 1 of 2 OK created sql view model kegan_staging.stg_amplitude__benefits_events .. [CREATE VIEW (0 processed) in 1.26s]
19:32:21 2 of 2 START sql table model kegan_mart_benefits.fct_benefits_events ........... [RUN]
19:32:23 BigQuery adapter: https://console.cloud.google.com/bigquery?project=cal-itp-data-infra-staging&j=bq:us-west2:ee6d3a66-62ef-49c5-818c-709b8d75e98a&page=queryresults
19:32:23 2 of 2 ERROR creating sql table model kegan_mart_benefits.fct_benefits_events .. [ERROR in 2.17s]
19:32:23
19:32:23 Finished running 1 view model, 1 table model in 0 hours 0 minutes and 6.64 seconds (6.64s).
19:32:23
19:32:23 Completed with 1 error and 0 warnings:
19:32:23
19:32:23 Database Error in model fct_benefits_events (models/mart/benefits/fct_benefits_events.sql)
19:32:23 Unrecognized name: event_properties_claims_provider at [158:9]
19:32:23 compiled Code at target/run/calitp_warehouse/models/mart/benefits/fct_benefits_events.sql
19:32:23
19:32:23 Done. PASS=1 WARN=0 ERROR=1 SKIP=0 TOTAL=2
Will work on getting these corrected.
@vevetron I updated the PR description with the results of running locally, which are now passing.
deprecate old cols in mart definition
default to 'digital' for historical events
caused by a bug in the Docker build process
causing a DBT build error
Description
We recently completed a big refactor of the models in Benefits, see cal-itp/benefits#1666 for more background.
The last piece of this refactor is updating our new and historic analytics events. The following PRs update the logic for generating new events:
- EnrollmentFlow: benefits#2379
- claims_provider: benefits#2401
And this PR is for the warehouse side, to handle the new fields and adjust historical data already captured in GCS.
We don't want to merge this PR until all of the above PRs are merged and released to our prod environment.
Closes cal-itp/benefits#2247
Closes cal-itp/benefits#2248
Closes cal-itp/benefits#2249
Closes cal-itp/benefits#2390
Type of change
How has this been tested?
Post-merge follow-ups
Document any actions that must be taken post-merge to deploy or otherwise implement the changes in this PR (for example, running a full refresh of some incremental model in dbt). If these actions will take more than a few hours after the merge or if they will be completed by someone other than the PR author, please create a dedicated follow-up issue and link it here to track resolution.
eligibility_verifier, and update to the new values