You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
For streaming use cases, an always-on tap should be able to "tail" the data source and continually run the extract. This is possible today without changes to the spec, but there is not yet any guidance or "best practice" in terms of how to accomplish this.
A reference implementation would likely entail one or more of the following:
A setting like tail (boolean) which, when true would put the tap into an infinite loop of extract, sleep, extract, and so on.
To honor resumability on interrupt, something like a state_message_min_cadence setting could be established, which forces a state message to flush at least every 1-5 minutes, for instance, assuming 1 or more records have been sent since the last STATE message.
For incremental streams, something like a new_record_polling_interval setting could indicated how much time to sleep between calls to check for new records.
To keep non-incremental streams whole while running a tap in tail mode, we'll want to have a max_full_table_age or similar, which would drive new extracts of streams pulled with FULL_TABLE after the set amount of time has elapsed.
Receiving a SIGTERM message (before SIGKILL) should immediately trigger a flush the latest STATE message. In an ideal scenario, sending the termination command (like ctrl+c) should give the tap a few seconds to send on the last viable state bookmark and give an optimal chance of not having to repeat records on a subsequent execution.
Note: While the priority here is probably the tail setting, it should be fair to assume that a method that needs to run indefinitely should also be ready to be interrupted at any point; since the only way to "stop" the stream is to kill the process, we will need to be ready for interruption and plan for how STATE tracking will be affected by the worst-case scenario interruption timing.
The text was updated successfully, but these errors were encountered:
aaronsteers
changed the title
Reference implementation for tail=True setting, enabling infinitely-running, interruptible processes
Reference implementation for tail=true setting, enabling infinitely-running, interruptible processes
Sep 7, 2021
aaronsteers
changed the title
Reference implementation for tail=true setting, enabling infinitely-running, interruptible processes
Reference implementation for tail=true setting, enabling indefinitely-running, interruptible processes
Sep 7, 2021
Hello, I really need this and plan to implement my own version of it in my tap. Where is the current status of this issue?
We have desire to run a tap in "batch" mode to dump records as well as "streaming" (or "tail") mode which listens for new events and pushes them to the target accordingly.
Question, is there any reason the singer wrappers don't expose a simple optional REST interface (think management backplane) for start, stop, shutdown etc. instead of using signals which can get dicey depending on language/context?
So when a tap starts it also starts a simple REST interface that allows you to control the tap by sending simple JSON commands (messages, just like the spec does for RECORDs).
For streaming use cases, an always-on tap should be able to "tail" the data source and continually run the extract. This is possible today without changes to the spec, but there is not yet any guidance or "best practice" in terms of how to accomplish this.
A reference implementation would likely entail one or more of the following:
tail
(boolean) which, whentrue
would put the tap into an infinite loop of extract, sleep, extract, and so on.state_message_min_cadence
setting could be established, which forces a state message to flush at least every 1-5 minutes, for instance, assuming 1 or more records have been sent since the last STATE message.new_record_polling_interval
setting could indicated how much time to sleep between calls to check for new records.tail
mode, we'll want to have amax_full_table_age
or similar, which would drive new extracts of streams pulled withFULL_TABLE
after the set amount of time has elapsed.SIGTERM
message (beforeSIGKILL
) should immediately trigger a flush the latest STATE message. In an ideal scenario, sending the termination command (likectrl+c
) should give the tap a few seconds to send on the last viable state bookmark and give an optimal chance of not having to repeat records on a subsequent execution.Note: While the priority here is probably the
tail
setting, it should be fair to assume that a method that needs to run indefinitely should also be ready to be interrupted at any point; since the only way to "stop" the stream is to kill the process, we will need to be ready for interruption and plan for how STATE tracking will be affected by the worst-case scenario interruption timing.The text was updated successfully, but these errors were encountered: