
[BUG] dbt-dry-run Ignoring Partitions from dbt-external-tables #53

Open
betosaulo-cw opened this issue Dec 29, 2023 · 2 comments
Labels: bug (Something isn't working)

Comments

@betosaulo-cw
I've encountered an issue where dbt-dry-run fails to recognize the partitions defined in a YAML configuration for a BigQuery table. Despite the correct specification of partitions under the dbt-external-tables setup, the dry run process results in a BadRequest error, indicating that the partition could not be found.

Steps to Reproduce

  1. Define a source table in YAML with the following structure:

```yaml
version: 2

sources:
  - name: dataset_name
    tables:
      - name: table_name
        external:
          location: "gs://bucket/path/to/table/table_name/*.json"
          options:
            format: json
            hive_partition_uri_prefix: "gs://bucket/path/to/table/table_name"
            ignore_unknown_values: true
          partitions:
            - name: ts
              data_type: STRING
        columns:
          - name: uuid
            data_type: STRING
          - name: name
            data_type: STRING
          ...
```

  2. Run dbt-dry-run and observe the error:

```
400 POST https://bigquery.googleapis.com/bigquery/v2/projects/project_id/jobs?prettyPrint=false: Name ts not found inside a at [15:9]
```

@ccharlesgb (Collaborator)

Hey 👋

Thanks for raising an issue and a PR! Am I understanding this correctly that the final schema of the table is the union of the columns defined in partitions and those defined in columns? Is it possible for partitions to be defined and columns to be empty? I know that might not be very useful, but the dry runner should handle that case as well if so.

@betosaulo-cw (Author)

betosaulo-cw commented Jan 3, 2024

Hey, I hope you're doing well!

> Am I understanding this correctly that the final schema of the table is the union of the columns defined in partitions and defined in columns?

Yes, you're right. The table's columns are the combination of the columns and the partitions. This happens because Hive partitioning is active, so in storage we'll see a pattern like this: gs://bucket/path/to/table/col_partition_1=value/col_partition_2=value/file_without_col_partitions.json. In this case, if you apply a WHERE condition like col_partition_1=value, BigQuery will only scan the files in that folder.

You can see examples for all the databases that dbt-external-tables support here, which seems to be the default behavior.
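The hive-style layout described above can be illustrated with a small sketch. This helper is hypothetical (it is not part of dbt-dry-run or dbt-external-tables); it just shows how partition columns are encoded as `key=value` path segments under the `hive_partition_uri_prefix`:

```python
# Sketch (hypothetical helper, not dbt-dry-run code): extract hive-style
# partition columns (key=value path segments) from an object path that
# lives under a given prefix.
def hive_partitions(path: str, prefix: str) -> dict[str, str]:
    relative = path[len(prefix):].strip("/")
    parts: dict[str, str] = {}
    for segment in relative.split("/"):
        # Only key=value segments encode partition columns; the final
        # file name (e.g. data.json) carries no partition information.
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts

uri = "gs://bucket/path/to/table/ts=2023-12-29/data.json"
print(hive_partitions(uri, "gs://bucket/path/to/table"))
# {'ts': '2023-12-29'}
```

A `WHERE ts = '2023-12-29'` filter then lets BigQuery prune every object whose path does not contain that segment.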

> Is it possible for partitions to be defined and columns to be empty? I know that might not be very useful but the dry runner should handle that case as well if so.

I've never tried this specific case, but I think the code covers situations where the columns or partitions are empty. Pydantic uses a default factory for these fields, so when they are empty, the model will have an empty list of partitions and an empty dictionary of columns. Then, the unpacking should work without any issues.
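The default-factory behaviour described above can be mirrored with a minimal sketch using stdlib dataclasses (dbt-dry-run itself uses pydantic; the class and field names here are illustrative, not its real model):

```python
# Sketch mirroring the behaviour described above (illustrative names,
# stdlib dataclasses standing in for the real pydantic model).
from dataclasses import dataclass, field

@dataclass
class ExternalConfig:
    # Default factories mean omitted fields become empty containers,
    # so building the final schema works even when one is missing.
    columns: dict[str, str] = field(default_factory=dict)
    partitions: list[dict[str, str]] = field(default_factory=list)

    def final_schema(self) -> dict[str, str]:
        # Final table schema = declared columns + hive partition columns.
        schema = dict(self.columns)
        schema.update({p["name"]: p["data_type"] for p in self.partitions})
        return schema

cfg = ExternalConfig(
    columns={"uuid": "STRING", "name": "STRING"},
    partitions=[{"name": "ts", "data_type": "STRING"}],
)
print(cfg.final_schema())
# {'uuid': 'STRING', 'name': 'STRING', 'ts': 'STRING'}

# Partitions defined but columns empty: still fine, thanks to the
# default factory producing an empty dict.
print(ExternalConfig(partitions=[{"name": "ts", "data_type": "STRING"}]).final_schema())
# {'ts': 'STRING'}
```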

@ccharlesgb ccharlesgb added the bug Something isn't working label Jan 5, 2024