
[BUG] dbt-dry-run Ignoring Partitions from dbt-external-tables #53

Open
betosaulo-cw opened this issue Dec 29, 2023 · 2 comments
Labels: bug (Something isn't working)

Comments

@betosaulo-cw
I've encountered an issue where dbt-dry-run fails to recognize the partitions defined in a YAML configuration for a BigQuery table. Despite the correct specification of partitions under the dbt-external-tables setup, the dry run process results in a BadRequest error, indicating that the partition could not be found.

Steps to Reproduce

  1. Define a source table in YAML with the following structure:

```yaml
version: 2

sources:
  - name: dataset_name
    tables:
      - name: table_name
        external:
          location: "gs://bucket/path/to/table/table_name/*.json"
          options:
            format: json
            hive_partition_uri_prefix: "gs://bucket/path/to/table/table_name"
            ignore_unknown_values: true
          partitions:
            - name: ts
              data_type: STRING
        columns:
          - name: uuid
            data_type: STRING
          - name: name
            data_type: STRING
          ...
```

  2. Run dbt-dry-run and observe the error:

```
400 POST https://bigquery.googleapis.com/bigquery/v2/projects/project_id/jobs?prettyPrint=false: Name ts not found inside a at [15:9]
```

@ccharlesgb (Collaborator)

Hey 👋

Thanks for raising an issue and a PR! Am I understanding this correctly that the final schema of the table is the union of the columns defined in partitions and those defined in columns? Is it possible for partitions to be defined and columns to be empty? I know that might not be very useful, but the dry runner should handle that case as well if so.

@betosaulo-cw (Author)

betosaulo-cw commented Jan 3, 2024

Hey, I hope you're doing well!

> Am I understanding this correctly that the final schema of the table is the union of the columns defined in partitions and defined in columns?

Yes, you're right. The table's columns are the combination of the columns and the partitions. This happens because Hive partitioning is active, so in storage we'll see a pattern like this: gs://bucket/path/to/table/col_partition_1=value/col_partition_2=value/file_without_col_partitions.json. In this case, if you apply a WHERE condition like col_partition_1=value, BigQuery will only scan the files in that folder.

You can see examples for all the databases that dbt-external-tables support here, which seems to be the default behavior.
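The hive-style layout described above can be illustrated with a small sketch. This helper is hypothetical (it is not part of dbt-dry-run or dbt-external-tables); it just shows how partition columns are encoded as `key=value` path segments under the `hive_partition_uri_prefix`:

```python
# Sketch (hypothetical helper, not dbt-dry-run code): extract hive-style
# partition columns (key=value path segments) from an object path that
# lives under a given prefix.
def hive_partitions(path: str, prefix: str) -> dict[str, str]:
    relative = path[len(prefix):].strip("/")
    parts: dict[str, str] = {}
    for segment in relative.split("/"):
        # Only key=value segments encode partition columns; the final
        # file name (e.g. data.json) carries no partition information.
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts

uri = "gs://bucket/path/to/table/ts=2023-12-29/data.json"
print(hive_partitions(uri, "gs://bucket/path/to/table"))
# {'ts': '2023-12-29'}
```

A `WHERE ts = '2023-12-29'` filter then lets BigQuery prune every object whose path does not contain that segment.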

> Is it possible for partitions to be defined and columns to be empty? I know that might not be very useful but the dry runner should handle that case as well if so.

I've never tried this specific case, but I think the code covers situations where the columns or partitions are empty. Pydantic uses a default factory for these fields, so when they are empty, the model will have an empty list of partitions and an empty dictionary of columns. Then, the unpacking should work without any issues.
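The default-factory behaviour described above can be mirrored with a minimal sketch using stdlib dataclasses (dbt-dry-run itself uses pydantic; the class and field names here are illustrative, not its real model):

```python
# Sketch mirroring the behaviour described above (illustrative names,
# stdlib dataclasses standing in for the real pydantic model).
from dataclasses import dataclass, field

@dataclass
class ExternalConfig:
    # Default factories mean omitted fields become empty containers,
    # so building the final schema works even when one is missing.
    columns: dict[str, str] = field(default_factory=dict)
    partitions: list[dict[str, str]] = field(default_factory=list)

    def final_schema(self) -> dict[str, str]:
        # Final table schema = declared columns + hive partition columns.
        schema = dict(self.columns)
        schema.update({p["name"]: p["data_type"] for p in self.partitions})
        return schema

cfg = ExternalConfig(
    columns={"uuid": "STRING", "name": "STRING"},
    partitions=[{"name": "ts", "data_type": "STRING"}],
)
print(cfg.final_schema())
# {'uuid': 'STRING', 'name': 'STRING', 'ts': 'STRING'}

# Partitions defined but columns empty: still fine, thanks to the
# default factory producing an empty dict.
print(ExternalConfig(partitions=[{"name": "ts", "data_type": "STRING"}]).final_schema())
# {'ts': 'STRING'}
```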

@ccharlesgb ccharlesgb added the bug Something isn't working label Jan 5, 2024