Is this a new bug in dbt-bigquery?
Current Behavior
Let's consider a model A selecting a single column, and a model B doing a SELECT * from model A. Then let's update model A to add a second column:
{{
  config(
    materialized='table'
  )
}}
SELECT "foo" as foo, "bar" as bar
Running both models again soon afterwards (< 1 h on the standard BQ edition, < 24 h on the Enterprise edition) fails on the model B execution, saying that the insertion failed with:
Inserted row has wrong column count; Has 3, expected 2 at ...
Expected Behavior
You would expect the model B run to succeed.
My gut feeling is that the column-scanning helper query, which queries model B with a WHERE false LIMIT 0, works great... but is apparently cached by BigQuery.
BigQuery usually manages to figure out that it should bust the cache, but in practice it apparently doesn't here.
BigQuery has an open issue for the problem: https://issuetracker.google.com/issues/343009002
So the fix would be to enforce NOT using the cache on that request, which is pretty much free anyway.
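A minimal sketch of what that could look like with the google-cloud-bigquery Python client (the helper name is hypothetical and this is not how dbt-bigquery is actually wired up; the relevant knob is use_query_cache=False on the job config):

from google.cloud import bigquery

def get_fresh_columns(client: bigquery.Client, relation: str) -> list[str]:
    """Probe a relation's current columns, bypassing BigQuery's query result cache."""
    job_config = bigquery.QueryJobConfig(use_query_cache=False)
    rows = client.query(
        f"select * from `{relation}` WHERE false LIMIT 0",
        job_config=job_config,
    ).result()  # scans no data, so skipping the result cache costs nothing
    return [field.name for field in rows.schema]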
I guess it should be reported to the Google BigQuery team too, since it's a cache-busting issue, but I'm not sure how easy or hard it is to fix.
For companies on the BQ Enterprise edition, it's actually even more frustrating, as the cache lasts 24 h.
So far, the workaround is either:
Not using a SELECT *, so that downstream models are updated and the request wouldn't be cached
Rerunning the WHERE false LIMIT 0 query manually with the cache disabled
It at least affects time_ingestion_partitioning, but it could also affect other modes where we perform the schema comparison needed for on_schema_change.
Steps To Reproduce
See "Current behavior"
It's also doable in pure SQL:
CREATE OR REPLACE TABLE `my_project.my_dataset.model_a`
AS select "foo" as foo
✅
CREATE OR REPLACE TABLE `my_project.my_dataset.model_b`
AS select * from `my_project.my_dataset.model_a`
✅
select * from `my_project.my_dataset.model_a` WHERE FALSE LIMIT 0
✅
CREATE OR REPLACE TABLE `my_project.my_dataset.model_a`
AS select "foo" as foo, "bar" as bar
✅
select * from `my_project.my_dataset.model_a` WHERE FALSE LIMIT 0
🔴 the result of the query is cached with the previous schema
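The same repro can also be scripted with the google-cloud-bigquery Python client; this is just a sketch (it assumes the my_project.my_dataset dataset above exists and that you can create tables in it), comparing what the possibly cached probe query returns with the table's actual schema from the metadata API:

from google.cloud import bigquery

client = bigquery.Client()

table = "my_project.my_dataset.model_a"
probe = f"select * from `{table}` WHERE FALSE LIMIT 0"

# Create the table with a single column and warm the query cache with the probe.
client.query(f'CREATE OR REPLACE TABLE `{table}` AS select "foo" as foo').result()
client.query(probe).result()

# Recreate the table with an extra column.
client.query(
    f'CREATE OR REPLACE TABLE `{table}` AS select "foo" as foo, "bar" as bar'
).result()

# The probe can still be answered from the stale cached result...
print("probe sees:", [f.name for f in client.query(probe).result().schema])

# ...while the metadata API (or the probe run with use_query_cache=False) reports the real schema.
print("table is:  ", [f.name for f in client.get_table(table).schema])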
@github-christophe-oudar have you reported this to the BQ team/opened up a support ticket with GCP? I'm not opposed to trying to resolve this on the adapter but it seems like the source is BQ and ideally it is resolved there.
@github-christophe-oudar yes - I also recommend opening up an actual ticket directly with your account team to push on this. Tickets are how we can push with our GCP contacts. Since this is directly related to a GCP issue, I'm going to close this ticket for now (appreciate you opening up this issue to flag it to us! We are also going to share it with our GCP contacts too)