Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-3521] [Feature] Remove REFRESH from snowflake_refresh_snowpipe #250

Open
3 tasks done
clementmg-getaround opened this issue Jan 4, 2024 · 6 comments · May be fixed by #300
Open
3 tasks done

[CT-3521] [Feature] Remove REFRESH from snowflake_refresh_snowpipe #250

clementmg-getaround opened this issue Jan 4, 2024 · 6 comments · May be fixed by #300
Labels

Comments

@clementmg-getaround
Copy link

clementmg-getaround commented Jan 4, 2024

Remove REFRESH from here since the Snowflake docs say that:

The REFRESH functionality is intended for short term use to resolve specific issues when Snowpipe fails to load a subset of files and is not intended for regular use

The original feature request transferred from dbt-core follows below:

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

The idea is to create the famous sleep/wait command, that is present across multiple frameworks, and that simply allows the worker to wait a defined amount of time: dbt sleep 3

Describe alternatives you've considered

Alternatives were hacky really: launching some impactless commands to mimic a sleep behavior.
We also thought of splitting the dbt job in two, schedule their execution through an external tool, which supports sleep command (Airflow). We believe dbt should possess that power.

Who will this benefit?

This known command can be very useful in multiple situations.
In our case, a dbt job scheduled two distinct tasks.
And because of external shenanigans (related to snowflake pipes), the first job is marked as finished by dbt when in fact it is still running. Therefore, the second job starts too early.
A simple dbt sleep with a custom time would prevent that unwanted behavior.

I am positive this feature could help solve similar issues across various platforms.

Are you interested in contributing this feature?

Sure!

Anything else?

No response

@clementmg-getaround clementmg-getaround added enhancement New feature or request triage labels Jan 4, 2024
@github-actions github-actions bot changed the title [Feature] Implementing a sleep/wait command [CT-3521] [Feature] Implementing a sleep/wait command Jan 4, 2024
@graciegoheen
Copy link

Hey! It sounds like your dbt command is completing successfully, when actually the underlying execution statements have not yet completed.

Could you give us some more insight into your specific situation? Ideally, we'd want behavior such that when the dbt command has completed, it has actually completed.

@clementmg-getaround
Copy link
Author

Hello!

Hey! It sounds like your dbt command is completing successfully, when actually the underlying execution statements have not yet completed.

Yes precisely.
However, this seems to be more on an issue from Snowflake. It seems to send the wrong signal to dbt, saying the task completed (The command system$pipe_status command shows lastIngestedTimestamp the pipe finishes early (as planned conceptually). However, it is not completely: checking copy_history details shows that the pipe runs longer than the success linked above => it runs until a time which corresponds to the copy-command.

We opened a ticket in Snowflake customer support to try and fix that.
Adding a sleep command won't tackle the heart of the issue, but it would certainly be a useful temporary fix.

@olperri
Copy link

olperri commented Jan 29, 2024

Hello!
I worked with @clementmg-getaround into investigating this issue in our org.

In case it can help, so far the only lead we have for the shenanigan described above is :

  • When refreshing snowpipes, dbt-external-tables uses the REFRESH parameter when no auto-ingest is set up [link]
  • Current snowflake documentation on alter pipe mention the following warning about REFRESH => the emphasis on the warning make us wonder if there are some technical subtleties behind the scenes

    Important
    The REFRESH functionality is intended for short term use to resolve specific issues when Snowpipe fails to load a subset of files and is not intended for regular use.

We are uncertain wether this is the source of the current behaviour, but its our current most likely explanation with our current knowledge

@dbeatty10
Copy link
Contributor

Nice troubleshooting @clementmg-getaround and @olperri !

Given your findings, I'm going to transfer this issue to dbt-external-tables.

@dbeatty10 dbeatty10 transferred this issue from dbt-labs/dbt-core Jan 30, 2024
@dbeatty10 dbeatty10 changed the title [CT-3521] [Feature] Implementing a sleep/wait command [CT-3521] [Feature] Remove REFRESH from snowflake_refresh_snowpipe Jan 30, 2024
@dataders dataders linked a pull request May 1, 2024 that will close this issue
3 tasks
@dataders
Copy link
Collaborator

dataders commented May 1, 2024

hey @clementmg-getaround @olperri is my understanding here correct?

REFRESH should not be used in conjunction with Snowpipe, even when AUTO_REFRESH=FALSE

Rather than remove the SQL statement about REFRESH I propose a further solution. Remove snowflake_refresh_snowpipe() macro entirely. Instead rely on snowflake__refresh_external_table() which, unless manual_refresh is a no-op (i.e. returns zero SQL, so the package will execute just an empty string.

{% macro default__refresh_external_table(source_node) %}
{% do return([]) %}
{% endmacro %}

I've opened #300, would y'all please let me know if this solution works for you?

@olperri
Copy link

olperri commented May 7, 2024

Thanks @dataders for looking into this! 🙏

To be fully transparent, I cannot really confirm anything :

  • @clementmg-getaround and I believe that this might be a gotcha and we still aren't fully confident it's the source of the error we faced. We mostly wanted to share it with the dbt-community
  • We are not very-comfortable with snowpipe and dbt-internals to confirm if this is an appropriate solution or no

Sorry for this "non-answer" but I'll ping the analytics-engineers in our org though as they might have an opinion or could be able to confirm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants