snowpipe: exactly once semantics #3060
Conversation
Nice one @rockwotj! 🏆 I spotted a few small typos and such, but looks OK otherwise. I didn't try to play with it locally, but please let me know if you'd like me to test it.
🐑 🚀
This is what is required for exactly-once delivery. We don't yet use it.
NOTE: since we can't currently ensure that certain messages go to a specific channel, this only really works with `max_in_flight: 1`, which is probably fine for Postgres. A later commit will support `channel_name` properly, so one can explicitly specify the mapping from data to channel.
This will help to re-use all this logic when we create the new output that specifies channel names explicitly.
Move this to a separate function so it can be reused between different outputs.
This keeps the logic in one place instead of spreading it out, and it also means the schema migration function can now be a free function.
One that is responsible for coordinating schema evolution and other small pieces (like custom mappings). The purpose is to allow another kind of inner output that lets a user set the channel name explicitly (instead of using a pool).
I'm not sure if this is 100% correct, but it will work for most cases.
See the examples for what this enables with a Redpanda/Kafka input (but not kafka_franz!).
This seems a bit clearer and has nice duality with the indexed pool
Hold a lock when doing this during WriteBatch, and don't have the framework call Connect outside of pipeline creation; just handle it internally.
I think this is what was missing...
We should avoid running a SQL query on every startup, for cost reasons. Instead of running a query (which is likely flaky because of identifier normalization anyway), open the channel lazily, catch the specific error for the table not existing, then create the table and retry.
Support 2 new properties in `snowflake_streaming`:
- `offset_token`: A new property to support exactly-once delivery: https://docs.snowflake.com/en/user-guide/data-load-snowpipe-streaming-overview#offset-tokens
- `channel_name`: The ability to explicitly assign a batch to a channel. The current `channel_prefix` option doesn't support explicitly picking a channel; this allows exactly-once from Kafka.
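A minimal config sketch of how these two properties might be used together for exactly-once from Kafka. The `channel_name` and `offset_token` fields are from this PR; the interpolation expressions and metadata keys (`@kafka_partition`, `@kafka_offset`) are assumptions for illustration, and the account/table settings are elided.

```yaml
output:
  snowflake_streaming:
    # ... account, database, schema, and table settings elided ...

    # One channel per Kafka partition, so offsets within a channel are
    # monotonically increasing.
    channel_name: "partition-${! @kafka_partition }"

    # The Kafka offset becomes the Snowpipe Streaming offset token, letting
    # Snowflake deduplicate replayed batches after a restart.
    offset_token: "${! @kafka_offset }"
```

The idea is that a fixed partition-to-channel mapping plus the offset token gives Snowflake enough information to reject duplicates, which `channel_prefix` (a pooled mapping) could not guarantee.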