VStream: the hidden superpower
We all use VStream primitives via their targeted implementations with VReplication and friends, but there is limited usage of VStream natively. At the other end of the spectrum, there are very specific implementations using VStream:
I think there are many great use cases for VStream, but it is intimidating, with a lot of edge cases that need to be handled correctly. The best example we have today is the local example, which doesn't show much more than how to start a stream and print the events. All the primitives exist to translate the events into usable Go data structures, but they aren't well documented. If it were easier to get started, I believe VStream usage would increase significantly.
Use Cases
There are lots of things that can be done by consuming the vstream, but the most common use case is syncing changes to a data warehouse. Today, that is complicated by:
1. MySQL replication is widely supported as a data sync mechanism, but Vitess makes this generally unfeasible (see below).
2. Even if we did manage to convert a VStream into a standard MySQL replication stream for compatibility with the tools mentioned in 1, those often still require additional infra, like Kafka, Flink, etc.
3. Debezium is great, but it is a very heavy dependency requiring Kafka or a similar message queue, which significantly increases cost and complexity. It's also written in Java, which may not be as accessible to Vitess users more familiar with Go.
Solution
Teams already have Vitess expertise, so lean into that. I propose adding a reference implementation of a client framework for consuming VStream.
Goals
Provide a simple, production-ready, semi-opinionated way to consume vstream without needing deep expertise
Translate vstream events back into Go structs
Be a starting point for anyone interested in implementing vstream, like a custom data warehouse connector
Have pervasive comments / documentation about edge cases, gotchas, and good patterns
Non-Goals
Require any special server or access to Vitess internals
Be necessary to use vstream
Target any specific data warehouse or use case
One way to think about this is as a pluggable Materialize stream. In fact, that was one route I had considered for connecting to a data warehouse - to materialize rows into an unsharded MySQL instance, then expose that replication stream to one of the aforementioned CDC consumers. That still didn't solve the issue of added cost and overhead, and it also required extra MySQL infrastructure to write rows. Instead of Materialize writing rows to the database, the idea here is to use a VStream framework to convert binlogs into the same Go structs that represent the db rows, which can then be programmatically exported anywhere.
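To make the "rows into Go structs" idea concrete, here is a minimal sketch of the mapping step only. It is not the proposed framework's API: the `vstream` struct tag, the `Customer` type, and `decodeRow` are all hypothetical, and it assumes the row has already been flattened into column name to string value pairs rather than modeling the actual VStream event types.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// Customer is a hypothetical struct mirroring a table's columns.
// The `vstream` tag is an assumption made for this sketch.
type Customer struct {
	ID    int64  `vstream:"customer_id"`
	Email string `vstream:"email"`
}

// decodeRow copies column values into the struct that dst points to,
// matching columns to exported fields by their `vstream` tag.
// Only string and int64 fields are handled here.
func decodeRow(cols map[string]string, dst any) error {
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Pointer || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("dst must be a pointer to a struct, got %T", dst)
	}
	v = v.Elem()
	t := v.Type()
	for i := 0; i < t.NumField(); i++ {
		name, ok := t.Field(i).Tag.Lookup("vstream")
		if !ok {
			continue // untagged fields are left alone
		}
		raw, present := cols[name]
		if !present {
			continue // column missing from this row
		}
		f := v.Field(i)
		switch f.Kind() {
		case reflect.String:
			f.SetString(raw)
		case reflect.Int64:
			n, err := strconv.ParseInt(raw, 10, 64)
			if err != nil {
				return fmt.Errorf("column %q: %w", name, err)
			}
			f.SetInt(n)
		default:
			return fmt.Errorf("column %q: unsupported field kind %s", name, f.Kind())
		}
	}
	return nil
}

func main() {
	var c Customer
	row := map[string]string{"customer_id": "42", "email": "a@example.com"}
	if err := decodeRow(row, &c); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", c)
}
```

A real implementation would need to cover the full set of MySQL column types, NULLs, and schema changes mid-stream, which is exactly the kind of edge case the framework is meant to absorb.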
Open Questions
How should state be managed?
I think state should be stored in Vitess itself, using a user-supplied keyspace / table. Since this isn't supposed to be privileged, it would live alongside the user's own tables, not in _vt.
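As a rough illustration of what a user-supplied state table could look like from the client side, the sketch below only builds the SQL a client might run through a normal connection. Everything here is an assumption for discussion: the `StateTable` type, the `stream_name` and `last_position` columns, and treating the stream position as an opaque string. Whether a plain MySQL upsert is appropriate also depends on how the chosen keyspace is sharded.

```go
package main

import (
	"fmt"
	"regexp"
)

// identRe restricts keyspace and table names to a conservative character
// set, since identifiers cannot be passed as bind variables.
var identRe = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// StateTable names the user-supplied location for stream state.
// Assumed table shape (hypothetical): one row per stream, keyed by
// stream_name, with the last processed position in last_position.
type StateTable struct {
	Keyspace string
	Table    string
}

// qualified returns the backtick-quoted keyspace.table name.
func (s StateTable) qualified() (string, error) {
	if !identRe.MatchString(s.Keyspace) || !identRe.MatchString(s.Table) {
		return "", fmt.Errorf("invalid keyspace/table name: %q.%q", s.Keyspace, s.Table)
	}
	return "`" + s.Keyspace + "`.`" + s.Table + "`", nil
}

// SaveSQL returns an upsert taking (stream_name, last_position) as bind variables.
func (s StateTable) SaveSQL() (string, error) {
	name, err := s.qualified()
	if err != nil {
		return "", err
	}
	return "INSERT INTO " + name + " (stream_name, last_position) VALUES (?, ?) " +
		"ON DUPLICATE KEY UPDATE last_position = VALUES(last_position)", nil
}

// LoadSQL returns a lookup taking stream_name as its bind variable.
func (s StateTable) LoadSQL() (string, error) {
	name, err := s.qualified()
	if err != nil {
		return "", err
	}
	return "SELECT last_position FROM " + name + " WHERE stream_name = ?", nil
}

func main() {
	st := StateTable{Keyspace: "commerce", Table: "vstream_state"}
	save, err := st.SaveSQL()
	if err != nil {
		panic(err)
	}
	load, err := st.LoadSQL()
	if err != nil {
		panic(err)
	}
	fmt.Println(save)
	fmt.Println(load)
}
```

Because the statements go through an ordinary client connection, nothing here needs privileged access, which matches the non-goal of requiring Vitess internals.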
Implementation
I have already built a working implementation. I started just fleshing out the local vstream example, but it quickly ballooned in complexity past what should be expected in an example. Usage is as shown in the tests. Feedback very appreciated.
Related Resources
cc @mattlord