RFC: VStream client reference implementation #17221

derekperkins · 2024-11-13T00:32:59Z

VStream: the hidden superpower

We all use VStream primitives via their targeted implementations with VReplication and friends, but there is limited usage of VStream natively. At the other end of the spectrum, there are very specific implementations using VStream:

I think there are many great use cases for VStream, but it is intimidating, with a lot of edge cases that need to be handled correctly. The best example we have today is the local example, which doesn't show much more than how to start a stream and print the events. All the primitives needed to translate events into usable Go data structures exist, but they aren't well documented. If it were easier to get started, I believe VStream usage would increase significantly.

Use Cases

There are lots of things that can be done by consuming the vstream, but the most common use case is syncing changes to a data warehouse. Today, that is complicated by:

1. MySQL replication is widely supported as a data sync mechanism, but Vitess makes this generally unfeasible (see below)
2. Even if we did manage to convert a VStream into a standard MySQL replication stream for compatibility with the tools mentioned in 1, those still often require additional infra, like Kafka, Flink, etc.
3. Debezium is great, but is a very heavy dependency requiring Kafka or a similar message queue, which significantly increases cost and complexity. It's also written in Java, which may not be as accessible to Vitess users more familiar with Go

Solution

Teams already have Vitess expertise, so lean into that. I propose adding a reference implementation of a client framework for consuming VStream.

Goals

Provide a simple, production ready, semi-opinionated way to consume vstream without needing deep expertise
Translate vstream events back into Go structs
Be a starting point for anyone interested in implementing vstream, like a custom data warehouse connector
Have pervasive comments / documentation about edge cases, gotchas, and good patterns
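To make the "translate vstream events back into Go structs" goal concrete, here is a minimal sketch of one way such a mapping could work. It is not the API from the linked PR or from Vitess itself: the `vstream` struct tag, the `Customer` type, and the `decodeRow` helper are all hypothetical, and the sketch assumes column names and string values have already been pulled out of the stream's events.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// Customer mirrors a hypothetical `customer` table. The `vstream` tag name is
// an assumption made for this sketch, not an existing Vitess convention.
type Customer struct {
	ID    int64  `vstream:"customer_id"`
	Email string `vstream:"email"`
}

// decodeRow copies column values into the struct pointed to by dst, matching
// columns to fields by the `vstream` tag. Only int64 and string fields are
// handled, to keep the sketch short.
func decodeRow(cols, vals []string, dst any) error {
	if len(cols) != len(vals) {
		return fmt.Errorf("got %d columns but %d values", len(cols), len(vals))
	}
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Ptr || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("dst must be a pointer to a struct, got %T", dst)
	}
	v = v.Elem()
	t := v.Type()

	// Index the row by column name so field order does not matter.
	row := make(map[string]string, len(cols))
	for i, c := range cols {
		row[c] = vals[i]
	}

	for i := 0; i < t.NumField(); i++ {
		col := t.Field(i).Tag.Get("vstream")
		raw, ok := row[col]
		f := v.Field(i)
		if !ok || !f.CanSet() {
			continue // column absent or field unexported: keep the zero value
		}
		switch f.Kind() {
		case reflect.Int64:
			n, err := strconv.ParseInt(raw, 10, 64)
			if err != nil {
				return fmt.Errorf("column %q: %w", col, err)
			}
			f.SetInt(n)
		case reflect.String:
			f.SetString(raw)
		default:
			return fmt.Errorf("column %q: unsupported field kind %s", col, f.Kind())
		}
	}
	return nil
}

func main() {
	var c Customer
	err := decodeRow(
		[]string{"customer_id", "email"},
		[]string{"42", "a@example.com"},
		&c,
	)
	fmt.Println(c, err)
}
```

A real framework would also need to handle NULLs, the full range of MySQL types, and schema changes mid-stream, which is the kind of edge case the documentation goal above is meant to cover.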

Non-Goals

Require any special server or access to Vitess internals
Be necessary to use vstream
Target any specific data warehouse or use case

One way to think about this is as a pluggable Materialize stream. In fact, that was one route I had considered for connecting to a data warehouse: materialize rows into an unsharded MySQL instance, then expose that replication stream to one of the aforementioned CDC consumers. That still didn't solve the issue of added cost and overhead, and it required extra MySQL infrastructure to write rows. Instead of Materialize writing rows to the database, the idea here is to use a VStream framework to convert binlog events into the same Go structs that represent the db rows, which can then be programmatically exported anywhere.

Open Questions

Where should this live?
- The current examples directory feels too limiting
- https://github.com/vitessio/contrib feels abandoned, and is bad for discovery
- https://github.com/vitessio/vitess/tree/main/go/vt/vstreamclient seems the best to me, but also isn't quite the same as the other packages
How should state be managed?
I think state should be stored in Vitess itself, using a user-supplied keyspace / table. Since this isn't supposed to be privileged, it would live with their tables, not in _vt

Implementation

I have already built a working implementation. I started by just fleshing out the local vstream example, but it quickly ballooned in complexity past what should be expected in an example. Usage is as shown in the tests. Feedback is very much appreciated.

vstreamclient: framework for robust + simple usage #17222

Related Resources

cc @mattlord

derekperkins added Component: VReplication Type: RFC Request For Comment labels Nov 13, 2024

This was referenced Nov 13, 2024

vstreamclient: framework for robust + simple usage #17222

Draft

flesh out vstream client example #17185

Closed

shlomi-noach assigned mattlord and rohit-nayak-ps Nov 13, 2024