Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference implementation for tail=true setting, enabling indefinitely-running, interruptible processes #7

Open
aaronsteers opened this issue Sep 7, 2021 · 2 comments

Comments

@aaronsteers
Copy link
Contributor

aaronsteers commented Sep 7, 2021

For streaming use cases, an always-on tap should be able to "tail" the data source and continually run the extract. This is possible today without changes to the spec, but there is not yet any guidance or "best practice" in terms of how to accomplish this.

A reference implementation would likely entail one or more of the following:

  1. A setting like tail (boolean) which, when true would put the tap into an infinite loop of extract, sleep, extract, and so on.
  2. To honor resumability on interrupt, something like a state_message_min_cadence setting could be established, which forces a state message to flush at least every 1-5 minutes, for instance, assuming 1 or more records have been sent since the last STATE message.
  3. For incremental streams, something like a new_record_polling_interval setting could indicated how much time to sleep between calls to check for new records.
  4. To keep non-incremental streams whole while running a tap in tail mode, we'll want to have a max_full_table_age or similar, which would drive new extracts of streams pulled with FULL_TABLE after the set amount of time has elapsed.
  5. Receiving a SIGTERM message (before SIGKILL) should immediately trigger a flush the latest STATE message. In an ideal scenario, sending the termination command (like ctrl+c) should give the tap a few seconds to send on the last viable state bookmark and give an optimal chance of not having to repeat records on a subsequent execution.

Note: While the priority here is probably the tail setting, it should be fair to assume that a method that needs to run indefinitely should also be ready to be interrupted at any point; since the only way to "stop" the stream is to kill the process, we will need to be ready for interruption and plan for how STATE tracking will be affected by the worst-case scenario interruption timing.

@aaronsteers aaronsteers changed the title Reference implementation for tail=True setting, enabling infinitely-running, interruptible processes Reference implementation for tail=true setting, enabling infinitely-running, interruptible processes Sep 7, 2021
@aaronsteers aaronsteers changed the title Reference implementation for tail=true setting, enabling infinitely-running, interruptible processes Reference implementation for tail=true setting, enabling indefinitely-running, interruptible processes Sep 7, 2021
@aaronsteers
Copy link
Contributor Author

@pandemicsyn has graciously offered to draft a proposal doc for this item. 🙌

pandemicsyn pushed a commit to pandemicsyn/Singer-Working-Group that referenced this issue Oct 25, 2021
@aaronsteers aaronsteers linked a pull request Jun 20, 2022 that will close this issue
@alexandersack-iex
Copy link

Hello, I really need this and plan to implement my own version of it in my tap. Where is the current status of this issue?

We have desire to run a tap in "batch" mode to dump records as well as "streaming" (or "tail") mode which listens for new events and pushes them to the target accordingly.

Question, is there any reason the singer wrappers don't expose a simple optional REST interface (think management backplane) for start, stop, shutdown etc. instead of using signals which can get dicey depending on language/context?

So when a tap starts it also starts a simple REST interface that allows you to control the tap by sending simple JSON commands (messages, just like the spec does for RECORDs).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants