RFC: VStream client reference implementation #17221

Open
derekperkins opened this issue Nov 13, 2024 · 0 comments · May be fixed by #17222

VStream: the hidden superpower

We all use VStream primitives through targeted implementations such as VReplication and friends, but few people consume VStream directly. At the other end of the spectrum, there are very specific implementations using VStream:

I think there are many great use cases for VStream, but it is intimidating, with a lot of edge cases that need to be handled correctly. The best example we have today is the local example, which shows little more than how to start a stream and print the events. All the primitives needed to translate events into usable Go data structures exist, but they aren't well documented. If it were easier to get started, I believe VStream usage would increase significantly.

Use Cases

There are lots of things that can be done by consuming the vstream, but the most common use case is syncing changes to a data warehouse. Today, that is complicated by:

  1. MySQL replication is widely supported as a data sync mechanism, but Vitess makes this generally infeasible (see below)
  2. Even if we did manage to convert a VStream into a standard MySQL replication stream for compatibility with the tools mentioned in 1, those often still require additional infra, like Kafka, Flink, etc.
  3. Debezium is great, but it is a very heavy dependency that requires Kafka or a similar message queue, which significantly increases cost and complexity. It's also written in Java, which may be less accessible to Vitess users who are more familiar with Go

Solution

Teams already have Vitess expertise, so lean into that. I propose adding a reference implementation of a client framework for consuming VStream.

Goals

  1. Provide a simple, production-ready, semi-opinionated way to consume vstream without needing deep expertise
  2. Translate vstream events back into Go structs
  3. Be a starting point for anyone interested in implementing vstream, like a custom data warehouse connector
  4. Have pervasive comments / documentation about edge cases, gotchas, and good patterns
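
To make goal 2 concrete, here is a minimal sketch of mapping one row onto a Go struct by struct tag. It is not the code from the linked implementation. `Row`, `Customer`, `Decode`, and the `vstream` tag name are all hypothetical, and `Row` is only a stand-in for whatever a real client assembles from the stream's field and row events.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// Row is a hypothetical stand-in for one decoded row:
// column name -> raw value as text. It is not a Vitess type.
type Row map[string]string

// Customer mirrors a hypothetical `customer` table.
type Customer struct {
	ID    int64  `vstream:"customer_id"`
	Email string `vstream:"email"`
}

// Decode copies values from row into the struct pointed to by dst,
// matching columns to fields by the (hypothetical) `vstream` struct tag.
func Decode(row Row, dst any) error {
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Pointer || v.IsNil() || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("dst must be a non-nil pointer to a struct, got %T", dst)
	}
	v = v.Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		col, ok := t.Field(i).Tag.Lookup("vstream")
		if !ok {
			continue // untagged fields are ignored
		}
		raw, ok := row[col]
		if !ok {
			continue // column absent from the row: keep the zero value
		}
		f := v.Field(i)
		if !f.CanSet() {
			return fmt.Errorf("column %q: field %s is not settable", col, t.Field(i).Name)
		}
		switch f.Kind() {
		case reflect.String:
			f.SetString(raw)
		case reflect.Int, reflect.Int64:
			n, err := strconv.ParseInt(raw, 10, 64)
			if err != nil {
				return fmt.Errorf("column %q: %w", col, err)
			}
			f.SetInt(n)
		default:
			return fmt.Errorf("column %q: unsupported field kind %s", col, f.Kind())
		}
	}
	return nil
}

func main() {
	var c Customer
	if err := Decode(Row{"customer_id": "42", "email": "a@example.com"}, &c); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", c)
}
```

A real decoder would also need to handle NULLs, the remaining MySQL column types, and schema changes mid-stream, which is exactly the kind of edge case goal 4 is about documenting.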

Non-Goals

  1. Require any special server or access to Vitess internals
  2. Be necessary to use vstream
  3. Target any specific data warehouse or use case

One way to think about this is as a pluggable Materialize stream. In fact, that was one route I had considered for connecting to a data warehouse: materialize rows into an unsharded MySQL instance, then expose that replication stream to one of the aforementioned CDC consumers. That still didn't solve the issue of added cost and overhead, and it also required extra MySQL infrastructure to write rows. Instead of Materialize writing rows to the database, the idea here is to use a VStream framework to convert binlog events into the same Go structs that represent the db rows, which can then be programmatically exported anywhere.
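
The "exported anywhere" part could be as small as a user-supplied callback. The sketch below assumes rows have already been decoded into Go values. `Batcher`, `NewBatcher`, and the flush callback signature are hypothetical names for illustration, not the API of the linked implementation.

```go
package main

import "fmt"

// Batcher buffers decoded rows and hands them to a user-supplied flush
// function once maxRows have accumulated. All names here are hypothetical.
type Batcher[T any] struct {
	maxRows int
	buf     []T
	flush   func(batch []T) error
}

// NewBatcher returns a Batcher that calls flush for every maxRows rows added.
func NewBatcher[T any](maxRows int, flush func(batch []T) error) *Batcher[T] {
	return &Batcher[T]{maxRows: maxRows, flush: flush}
}

// Add buffers one row and flushes when the buffer is full.
func (b *Batcher[T]) Add(row T) error {
	b.buf = append(b.buf, row)
	if len(b.buf) < b.maxRows {
		return nil
	}
	return b.Flush()
}

// Flush exports any buffered rows. On error the buffer is kept so the
// caller can retry; on success a fresh buffer is allocated, so the
// callback may safely retain the slice it was given.
func (b *Batcher[T]) Flush() error {
	if len(b.buf) == 0 {
		return nil
	}
	if err := b.flush(b.buf); err != nil {
		return err
	}
	b.buf = make([]T, 0, len(b.buf))
	return nil
}

func main() {
	// The callback is where a warehouse client, file writer, or queue
	// producer would go; here it only prints.
	b := NewBatcher[string](2, func(batch []string) error {
		fmt.Printf("exporting %d rows: %v\n", len(batch), batch)
		return nil
	})
	for _, row := range []string{"a", "b", "c"} {
		if err := b.Add(row); err != nil {
			panic(err)
		}
	}
	if err := b.Flush(); err != nil { // flush the final partial batch
		panic(err)
	}
}
```

Keeping the sink behind a plain function is one way to stay out of the "target any specific data warehouse" non-goal.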

Open Questions

  1. Where should this live?
  2. How should state be managed?
    I think state should be stored in Vitess itself, using a user-supplied keyspace / table. Since this isn't supposed to be privileged, it would live with their tables, not in _vt

Implementation

I have already built a working implementation. I started by fleshing out the local vstream example, but it quickly ballooned in complexity past what should be expected of an example. Usage is as shown in the tests. Feedback is very much appreciated.

Related Resources

cc @mattlord
