Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema evolution error when denormalized and snake_case is enabled #66

Open
qgab-flowdesk opened this issue Jul 24, 2023 · 2 comments
Open

Comments

@qgab-flowdesk
Copy link

qgab-flowdesk commented Jul 24, 2023

I'm running into issues to load data with the denormalised parameter enabled.
Looking into the code, it seems that even when the table does not yet exists in BigQuery, we run into the following error:

google.api_core.exceptions.BadRequest: 400 PATCH https://bigquery.googleapis.com/bigquery/v2/projects/project-id/datasets/salesforce/tables/campaign?prettyPrint=false: Field id already exists in schema

The table schema is properly defined in BigQuery, but when it comes to load the actual data it looks like the table schema if not resolved correctly. The issue persists with both version 1 and 2 of the schema_resolver. Maybe there is an issue with the schema_translator?

If someone has Salesforce credentials, here is the corresponding Meltano configuration for reproducing the issue:

    plugins:
    
      extractors:
      - name: tap-salesforce
        config:
          api_type: REST
          start_date: '2023-06-01T00:00:00Z'
        select:
        - Campaign.*

      loaders:
      - name: target-bigquery
        variant: z3z1ma
        config:
          project: project-id
          dataset: salesforce
          method: batch_job
          location: us-west1
          batch_size: 500
          fail_fast: True
          append: False
          overwrite: True
          flattening_enabled: True
          denormalized: True
          schema_resolver_version: 2
          column_name_transforms:
            lower: True
            quote: False
            add_underscore_when_invalid: True
            snake_case: True

And the meltano commands:

meltano --log-level=debug invoke tap-salesforce > output.json
cat output.json | meltano --log-level=debug invoke target-bigquery
@qgab-flowdesk qgab-flowdesk changed the title Schema evolution error when denormalized is enabled Schema evolution error when denormalized and snake_case is enabled Jul 26, 2023
@qgab-flowdesk
Copy link
Author

qgab-flowdesk commented Jul 26, 2023

Alright so it looks like in my specific case the issue is caused because the mut_schema being a concatenation of non-transformed columns + metadata columns + transformed columns.
To generalise, that issue will happen when both denormalized = True and snake_case = True in situations where tap output contains any camel case single-world column name.
What happens is that we get the non transformed table schema, then check in the transformed table schema if any field is new: if input is camel case, all transformed fields are new.

For double word column names, an underscore is introduced so we'll just end up having a duplicated column (e.,g. StartDate -> start_date).
For any single world column though, BigQuery will consider the fields appears twice in the schema provided to update the table (e.g. Id -> id).

That's when we end up with the following error, regardless of the sink since the Denormalized.update_schema method overwrites the update_schema implementation of all of the 4 sinks:
google.api_core.exceptions.BadRequest: 400 PATCH https://bigquery.googleapis.com/bigquery/v2/projects/project/datasets/salesforce/tables/campaign?prettyPrint=false: Field id already exists in schema

@alexcartwright
Copy link

alexcartwright commented Oct 11, 2023

I am experiencing the same issue.

The Denormalized class calls

self.table.as_table()

Would a solution be to replace this line with:

table = self.table._table or
table = self.table.as_table(self.apply_transforms)

The former is the actual scheme as requested from bigquery table, the second is the internal representation of the table applying appropriate transforms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants