Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Python module causing manifest generating different results without changes #10968

Open
2 tasks done
james-ho-cko opened this issue Nov 4, 2024 · 7 comments
Open
2 tasks done
Labels
awaiting_response bug Something isn't working state: modified state Stateful selection (state:modified, defer)

Comments

@james-ho-cko
Copy link

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I have some models with incremental predicate that looks back n days only. Something like

incremental_predicates = [
            "DBT_INTERNAL_DEST.ingestion_timestamp::date >= dateadd(day, -10, " ~ var('date_parameter', modules.datetime.datetime.now()) ~ "::date)"
        ],

What we're noticing is the python module would render an actual value in the manifest file, which means the timestamp could change without the model being modified. This is causing issue cause the state:modified would pick up all models with this predicate.

Expected Behavior

We would expect the manifest to be unchanged and the models with this setup wouldn't be flagged as changed when running state:modified

Steps To Reproduce

  1. Adding an incremental predicate to a model with module.datetime.datetime.now() in it.
incremental_predicates = ["DBT_INTERNAL_DEST.ingestion_timestamp::date >= dateadd(day, -10, " ~ modules.datetime.datetime.now() ~ "::date)"],
  1. Run compile to generate the manifest file
  2. The incremental predicate of the model in the manifest file would show the current timestamp
  3. Copy the manifest file to current directory mv ..../target/manifest.json .
  4. Switch to another branch and generate the manifest file again without changing anything
  5. run dbt ls -m state:modified.body --state . to compare the 2 branches. It shouldn't show anything as we didn't change any model, but the model with this predicate would show up. Looking into the 2 manifest files can see the timestamps in the incremental predicate are different

Relevant log output

No response

Environment

- OS:macOS 14.7, build 23H124
- Python:3.9.6
- dbt:1.6.1

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

@james-ho-cko james-ho-cko added bug Something isn't working triage labels Nov 4, 2024
@dbeatty10 dbeatty10 added state Stateful selection (state:modified, defer) state: modified labels Nov 4, 2024
@dbeatty10
Copy link
Contributor

Thanks for reaching out @james-ho-cko !

We've been addressing some case of false positives within #9562.

Did you happen to try installing dbt-core 1.9.0b3 and see if it still has the issue or not?

For date_parameter, is it coming from vars: in dbt_project.yml or --vars passed on the command line?.

@james-ho-cko
Copy link
Author

james-ho-cko commented Nov 5, 2024

Thanks for getting back to me Doug, the epic you shared is really helpful.
We haven't upgraded to the latest version so I can't really tell. We currently get around with this by creating a macro as the predicate.
The date_parameter is what we use to be able to ad-hocly change the look back day if we needed to, so it'd be passed on the command line like --vars
I tried to add

flags:
  state_modified_compare_more_unrendered_values: true

to dbt_project.yml as the documentation suggests but seems not working. Is it only available in v1.9?

@dbeatty10
Copy link
Contributor

state_modified_compare_more_unrendered_values is available starting in v1.9 (but not for earlier versions than that).

@dbeatty10
Copy link
Contributor

@james-ho-cko

I tried to replicate the scenario that you reported using dbt v1.6, but I wasn't able. Could you try tweaking the file and commands below to replicate your bug report?

Create this file:

models/my_model.sql

{{
  config(
    materialized = 'incremental',
    unique_key = 'id',
    incremental_strategy = 'merge',
    incremental_predicates = ["DBT_INTERNAL_DEST.ingestion_timestamp::date <= '" ~ var('date_parameter', modules.datetime.datetime.now()) ~ "'::date"],
  )
}}

select 1 as id, {{ dbt.current_timestamp() }} as ingestion_timestamp

Run these commands:

dbt compile
mv target/manifest.json .
dbt list -s state:modified --state .

I got this output:

04:04:58  Running with dbt=1.6.5
04:04:59  Registered adapter: snowflake=1.6.6
04:04:59  Found 1 model, 0 sources, 0 exposures, 0 metrics, 375 macros, 0 groups, 0 semantic models
04:04:59  The selection criterion 'state:modified' does not match any nodes
04:04:59  No nodes selected!

@james-ho-cko
Copy link
Author

I think you need to change branch.

What we have is a model already in production with this predicate setup, and every time when we run CI on feature branch against the main branch's manifest, the model will show up even it's untouched.

It's a bit inconsistent when I try to replicate it locally by

dbt compile
mv target/manifest.json .

git checkout -b testing
dbt list -s state:modified --state .

This might now work as it's the same file you created locally. I think it needs to be a model that's already in 2 different branches.
Let me know if that makes sense!

@james-ho-cko
Copy link
Author

@dbeatty10 try this:

Create this file:

models/my_model.sql

{{
  config(
    materialized = 'incremental',
    unique_key = 'id',
    incremental_strategy = 'merge',
    incremental_predicates = ["DBT_INTERNAL_DEST.ingestion_timestamp::date <= '" ~ var('date_parameter', modules.datetime.datetime.now()) ~ "'::date"],
  )
}}

select 1 as id, {{ dbt.current_timestamp() }} as ingestion_timestamp

Run these commands:

dbt compile
mv target/manifest.json .

Delete the model file(keep the manifest at current directory)

Switch to another branch

git checkout -b testing

Create a file with the same name and the exactly same contents:

{{
  config(
    materialized = 'incremental',
    unique_key = 'id',
    incremental_strategy = 'merge',
    incremental_predicates = ["DBT_INTERNAL_DEST.ingestion_timestamp::date <= '" ~ var('date_parameter', modules.datetime.datetime.now()) ~ "'::date"],
  )
}}

select 1 as id, {{ dbt.current_timestamp() }} as ingestion_timestamp

Run

dbt ls -m state:modified --state .

This should give you my_model. And if you compare the 2 manifest, you should be able to see the only difference is the timestamp in the predicate.
Screenshot 2024-11-06 at 16 35 34

@dbeatty10
Copy link
Contributor

@james-ho-cko I tried the steps you outlined, but I still wasn't able to replicate what you are reporting with state:modified.

I can see that the "incremental_predicates": portions of manifest.json and target/manifiest.json are different, but I get the message:

04:27:41  The selection criterion 'state:modified' does not match any nodes
04:27:41  No nodes selected!

As an aside, I wouldn't expect changing branches to make a difference either way as it relates to state:modified.

Are you able to replicate the issue without changing branches?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting_response bug Something isn't working state: modified state Stateful selection (state:modified, defer)
Projects
None yet
Development

No branches or pull requests

2 participants