Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Nested record field is not quoted and causes build error when contract enforced enabled. #1278

Open
2 tasks done
philBatardiere opened this issue Jul 17, 2024 · 1 comment
Labels
bug Something isn't working nested Related to BQ's nested and repeated columns (STRUCT / RECORD)

Comments

@philBatardiere
Copy link

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Hello,

I have the following model that use two reserved keywords groups and full as example:

{{
    config(
        materialized='table'
    )
}}

WITH data AS (
  SELECT 1 AS id, [STRUCT(11 AS id, 'Peter' AS name, ['a', 'b', 'c'] AS `full`)] AS `groups`
  UNION ALL
  SELECT 2 AS id, [STRUCT(22 AS id, 'Bob' AS name, ['a', 'b', 'c'] AS `full`)] AS `groups`
)

SELECT 
  id,
  `groups`
FROM data

and the following model schema:

version: 2

models:
  - name: model_bug
    config:
      contract:
        enforced: true
    columns:
      - name: id
        data_type: INTEGER
        constraints:
          - type: not_null
        description: "ID Description."
      - name: groups
        quote: true
        data_type: ARRAY
        description: "Groups description."
      - name: groups.id
        quote: true
        data_type: INTEGER
        constraints:
          - type: not_null
        description: "Group ID description."
      - name: groups.name
        quote: true
        data_type: STRING
        constraints:
          - type: not_null
        description: "Group name description."
      - name: groups.full
        quote: true
        data_type: ARRAY<STRING>
        description: "Group full description."

The model cannot be build because the nested column full is not quoted during contract validation step. However, the groups column is correctly quoted. Here is the generated query that causes the issue:

select * from (
        select
    
      
    cast(null as INT64)
 as id, 
      
    cast(null as array<struct<id INT64, name STRING, full ARRAY<STRING>>>)
 as `groups`
    ) as __dbt_sbq
    where false and current_timestamp() = current_timestamp()
    limit 0

Expected Behavior

The dbt-bigquery adapter should quote any reference of the full column during table schema validation and next steps.

Steps To Reproduce

  1. Open your DBT project

  2. Create the following model_bug.sql:

{{
    config(
        materialized='table'
    )
}}

WITH data AS (
  SELECT 1 AS id, [STRUCT(11 AS id, 'Peter' AS name, ['a', 'b', 'c'] AS `full`)] AS `groups`
  UNION ALL
  SELECT 2 AS id, [STRUCT(22 AS id, 'Bob' AS name, ['a', 'b', 'c'] AS `full`)] AS `groups`
)

SELECT 
  id,
  `groups`
FROM data
  1. Create the following schema.yml:
version: 2

models:
  - name: model_bug
    config:
      contract:
        enforced: true
    columns:
      - name: id
        data_type: INTEGER
        constraints:
          - type: not_null
        description: "ID Description."
      - name: groups
        quote: true
        data_type: ARRAY
        description: "Groups description."
      - name: groups.id
        quote: true
        data_type: INTEGER
        constraints:
          - type: not_null
        description: "Group ID description."
      - name: groups.name
        quote: true
        data_type: STRING
        constraints:
          - type: not_null
        description: "Group name description."
      - name: groups.full
        quote: true
        data_type: ARRAY<STRING>
        description: "Group full description."
  1. Run dbt build

Relevant log output

No response

Environment

- dbt-cloud: 2024.7.209

Additional Context

No response

@philBatardiere philBatardiere added bug Something isn't working triage labels Jul 17, 2024
@philBatardiere
Copy link
Author

After reviewing the code, it seems the issue is related to the methods get_nested_column_data_types and _update_nested_column_data_types, where the method is called from the adapter bigquery__get_empty_schema_sql method.

Problem: The current _update_nested_column_data_types method may not handle nested records where quoting is necessary (as indicated by the YAML configuration) to correctly represent data types.

Solution: Introducing a quote boolean parameter allows explicit control over whether values within the nested record should be quoted. This aligns the method's behavior with the YAML configuration.

@amychen1776 amychen1776 added nested Related to BQ's nested and repeated columns (STRUCT / RECORD) and removed triage labels Aug 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working nested Related to BQ's nested and repeated columns (STRUCT / RECORD)
Projects
None yet
Development

No branches or pull requests

2 participants