
Snapshotting silently fails on values greater than int max value when source dtype changes to bigint #759

Open
jelstongreen opened this issue Aug 7, 2024 · 3 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

@jelstongreen

Describe the bug

We noticed, for one table that we snapshot, that when the source data changes a column from int to bigint but the downstream snapshot table is not amended, the snapshot will just continuously insert the old record whenever the snapshot is run, without erroring that the new value cannot be inserted.

Steps To Reproduce

1. Snapshot a source model with a regular int column.
2. Change the source column to bigint and change the value in the row to a bigint value larger than max int.
3. Run dbt snapshot on the source table. The snapshot should use the check strategy and include the amended column in its check list (see the sketch below).
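For illustration, a minimal check-strategy snapshot might look like the sketch below (the source, table, and column names are hypothetical, not taken from the original report):

```sql
{% snapshot account_balances_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='id',
        strategy='check',
        check_cols=['balance']
    )
}}

-- 'balance' starts as INT in the source; the bug appears after the source
-- column is altered to BIGINT and a row takes a value above 2,147,483,647
-- while the existing snapshot table still has an INT column.
select id, balance
from {{ source('raw', 'account_balances') }}

{% endsnapshot %}
```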

Expected behavior

If the target column dtype is int and the incoming value is a bigint, there should be an error rather than a silent failure.


@jelstongreen added the bug label on Aug 7, 2024
@benc-db added the help wanted label on Aug 8, 2024
@henlue (Contributor) commented Oct 29, 2024

I would have time to work on this.

It looks like the merge statement that is currently executed will always try to cast the datatypes. Even if the cast fails (for example, string to int), it will insert NULL and there will be no error from the Databricks side.
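As a rough illustration of that silent-cast behavior, assuming ANSI mode is disabled (which is what the lack of an error suggests):

```sql
-- With spark.sql.ansi.enabled = false, a cast that cannot be performed
-- returns NULL instead of raising an error:
SELECT CAST('not a number' AS INT);  -- NULL, no error

-- So a merge that casts an incoming column into a narrower target column
-- can silently write NULL (or otherwise wrong values) into the snapshot.
```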

I would propose checking for schema changes in dbt by using the check_for_schema_changes macro from dbt-core. It is already used to check for schema changes in incremental models.

@benc-db should this check always happen, or would it make sense to make it configurable? Similar to incremental models, there could be an on_schema_change config option. It could default to "ignore" to keep the current behavior, and check_for_schema_changes would only run when it is set to "fail".
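A sketch of what the proposed configuration could look like, mirroring the on_schema_change option that already exists for incremental models (this option does not exist for snapshots today; the names are illustrative):

```sql
{% snapshot account_balances_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='id',
        strategy='check',
        check_cols=['balance'],
        on_schema_change='fail'  -- proposed; 'ignore' (the default) would keep today's behavior
    )
}}

select id, balance
from {{ source('raw', 'account_balances') }}

{% endsnapshot %}
```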

@benc-db (Collaborator) commented Oct 29, 2024

@mikealfare Would love to hear your thoughts on @henlue's comment.

@mikealfare (Contributor) commented

Without thinking about it too much, adding on_schema_change to snapshots sounds like a good idea. We're in the process of updating snapshots to add more features in general; I would post some comments on that issue, as it collects all of the snapshot work being done. I don't know if this particular feature is included, but there are certainly schema-related items on there.
