Sink does not support update operation #16

Open
haaawk opened this issue Jun 12, 2020 · 8 comments
Labels
enhancement New feature or request

Comments

@haaawk

haaawk commented Jun 12, 2020

It would be highly useful to support not only insert and delete but also update.
That would allow live migrations using Change Data Capture as a source of data.

@haaawk
Author

haaawk commented Jun 12, 2020

It seems that headers in the Kafka message could be used to distinguish inserts from updates. That would also allow implementing other operations, such as partition deletion or range deletion.

It seems that we could also implement TTL per column using headers.
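A minimal sketch of the header idea, for illustration only. The header key `cql.operation` and the operation names are hypothetical (nothing in the connector defines them), and headers are modeled as a list of (key, bytes) pairs, which is how Kafka clients commonly expose them. Records without the header fall back to insert, so existing topics would keep today's behavior.

```python
from enum import Enum


class Op(Enum):
    """Operations a sink could distinguish (names are illustrative)."""
    INSERT = "insert"
    UPDATE = "update"
    ROW_DELETE = "row_delete"
    PARTITION_DELETE = "partition_delete"
    RANGE_DELETE = "range_delete"


# Hypothetical header key; not an existing connector setting.
OP_HEADER = "cql.operation"


def operation_from_headers(headers, default=Op.INSERT):
    """Pick the operation from Kafka record headers.

    headers: iterable of (key, value_bytes) pairs.
    Raises ValueError if the header carries an unknown operation name.
    """
    for key, value in headers:
        if key == OP_HEADER:
            return Op(value.decode("utf-8").lower())
    return default
```

A per-column TTL could be carried the same way, for example one header per column, but that encoding is equally an assumption.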

@mailmahee
Contributor

@haaawk, is this related to counters? What is the difference between an insert and an update from Scylla's point of view?

@haaawk
Author

haaawk commented Jul 6, 2020

It's related to all types - not only counters.

The difference between inserts and updates in Scylla is described here: https://stackoverflow.com/questions/17348558/does-an-update-become-an-implied-insert/60075479#60075479

Another problem is that the sink connector does not support unset values. It sets unset columns to null.

For example, when there is a row

pk |  ck |     v1 | v2
 1 |   2 |  test1 | test2

and you perform insert into ks.tb(pk, ck, v1) values (1, 2, 'test3') using the sink connector, the result will be

pk |  ck |     v1  | v2
 1 |   2 |   test3 | <null>

and it should be:

pk |  ck |     v1  | v2
 1 |   2 |   test3 | test2

This means that, even if we wanted to, it is impossible to represent update ks.tb set v1 = 'test3' where pk = 1 and ck = 2 as an insert in the sink connector.
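The difference between null and unset can be sketched with a small in-memory model. This is not connector or driver code: the `UNSET` sentinel and `apply_write` helper are made up for illustration, with `None` standing in for CQL null.

```python
# Sentinel meaning "this column is not part of the write at all".
UNSET = object()


def apply_write(row, values):
    """Return a copy of `row` after a write.

    Columns bound to UNSET keep their stored value; columns bound to
    None are overwritten with null.
    """
    result = dict(row)
    for column, value in values.items():
        if value is not UNSET:
            result[column] = value
    return result


existing = {"pk": 1, "ck": 2, "v1": "test1", "v2": "test2"}

# Behaviour described in this issue: the missing column is sent as null.
current = apply_write(existing, {"v1": "test3", "v2": None})

# Desired behaviour: the missing column is left unset, so v2 survives.
wanted = apply_write(existing, {"v1": "test3", "v2": UNSET})
```

Here `current["v2"]` ends up as `None`, while `wanted["v2"]` is still `"test2"`, matching the two tables above.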

@mailmahee
Contributor

mailmahee commented Jul 6, 2020

So it needs to do a read before write, updating the values that are supplied and retaining the ones that were not? I assume this is what would be useful for CDC.

@haaawk
Author

haaawk commented Jul 7, 2020

No. Reading before writing would be wasteful and non-performant. It just needs to support updates and unset values. In a CQL operation, each column can be in one of three states: have a value, be null, or not be set at all. The sink connector supports only the first two.

@haaawk
Author

haaawk commented Jul 7, 2020

Another thing is that even reading before writing won't cut it, because it would screw up the timestamps of cells. A single CQL operation can have only one timestamp for all affected columns, so you can't have unmodified columns with an old timestamp and updated columns with a new timestamp.
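A toy model of per-cell last-write-wins shows why this matters. The timestamps, the late-arriving write, and the `apply` helper are all invented for the example, and timestamp ties are ignored for simplicity.

```python
def apply(cells, values, ts):
    """Apply one write: every column in `values` gets the same timestamp.

    A stored cell is replaced only by a strictly newer timestamp.
    """
    for col, val in values.items():
        if col not in cells or ts > cells[col][1]:
            cells[col] = (val, ts)


def initial():
    # Each cell is (value, write timestamp).
    return {"v1": ("test1", 10), "v2": ("test2", 10)}


# Partial writes: only the touched column gets the new timestamp.
partial = initial()
apply(partial, {"v1": "test3"}, ts=30)
apply(partial, {"v2": "late"}, ts=20)   # older write arriving late

# Read-before-write: the untouched v2 is rewritten at ts=30 as well.
rbw = initial()
merged = {col: val for col, (val, _) in rbw.items()}
merged["v1"] = "test3"
apply(rbw, merged, ts=30)
apply(rbw, {"v2": "late"}, ts=20)       # now loses to the rewritten v2
```

With partial writes the late update to v2 is kept, because v2 still carries timestamp 10. With read-before-write, v2 was stamped 30, so the same update is silently dropped.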

@mailmahee
Contributor

OK. @haaawk, could you give me an example of how to do a partial update to a row? What would the CQL/prepared statement look like?

@haaawk
Author

haaawk commented Jul 7, 2020

You have examples above. Both
insert into ks.tb(pk, ck, v1) values (1, 2, 'test3') and
update ks.tb set v1 = 'test3' where pk = 1 and ck = 2
set only part of the row, assuming the schema is:

create table ks.tb(
pk int,
ck int,
v1 text,
v2 text,
primary key(pk, ck)
)
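One way a sink might produce such statements is to build the prepared-statement text from only the columns present in a record. The sketch below is an assumption about how that could look, not the connector's implementation: `build_update` and `build_insert` are hypothetical helpers, and identifiers are not quoted or validated.

```python
def build_update(table, key_columns, supplied_columns):
    """Prepared UPDATE text touching only the supplied non-key columns."""
    assignments = [c for c in supplied_columns if c not in key_columns]
    if not assignments:
        raise ValueError("an UPDATE needs at least one non-key column")
    set_clause = ", ".join(f"{c} = ?" for c in assignments)
    where_clause = " AND ".join(f"{c} = ?" for c in key_columns)
    return f"UPDATE {table} SET {set_clause} WHERE {where_clause}"


def build_insert(table, key_columns, supplied_columns):
    """Prepared INSERT text listing the key plus the supplied columns only."""
    columns = list(key_columns) + [
        c for c in supplied_columns if c not in key_columns
    ]
    names = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({names}) VALUES ({marks})"


update_cql = build_update("ks.tb", ("pk", "ck"), ["v1"])
insert_cql = build_insert("ks.tb", ("pk", "ck"), ["v1"])
```

For the schema above, `update_cql` is `UPDATE ks.tb SET v1 = ? WHERE pk = ? AND ck = ?` and `insert_cql` is `INSERT INTO ks.tb (pk, ck, v1) VALUES (?, ?, ?)`. An alternative that avoids one prepared statement per column combination is to prepare a single statement over all columns and bind the missing ones as unset, which native protocol v4 allows.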

Bouncheck added the enhancement label Aug 1, 2024

3 participants