Sink does not support update operation #16

haaawk · 2020-06-12T05:38:15Z

It would be highly useful to support not only insert and delete but also update.
That would allow live migrations using Change Data Capture as a source of data.

haaawk · 2020-06-12T06:12:28Z

It seems that headers in kafka message could be use to distinguish inserts and updates. That would also allow implementation of other operations like partition deletion or range deletion.

It seems that we could also implement TTL per column using headers.

mailmahee · 2020-07-06T19:42:49Z

Is this related to Counters @haaawk - what is the difference between an insert and update from a scylla POV?

haaawk · 2020-07-06T19:53:12Z

It's related to all types - not only counters.

Difference between inserts and updates in Scylla is described here -> https://stackoverflow.com/questions/17348558/does-an-update-become-an-implied-insert/60075479#60075479

The problem also is that sink connector does not support unset values. It sets unset columns to null.

For example, when there is a row

pk |  ck |     v1 | v2
 1 |   2 |  test1 | test2

and you perform insert into ks.tb(pk, ck, v1) values (1, 2, test3) using sink connector then the result will be

pk |  ck |     v1  | v2
 1 |   2 |   test3 | <null>

and it should be:

pk |  ck |     v1  | v2
 1 |   2 |   test3 | test2

This means it is impossible to represent update ks.tb set v1 = 'test3' where pk = 1 and ck = 2 as an insert in the sink connector. Even if we wanted to.

mailmahee · 2020-07-06T22:23:03Z

So it needs to do a read before write and update all the values that are supplied and retain the values that were not supplied? I am assuming this is what will be useful for CDC?

haaawk · 2020-07-07T05:21:03Z

No. Reading before writing would be wasteful and non-performant. It just needs to support updates and unset values. In CQL operation each column can be in one of three states: have value, be null, or not be set at all. Sink connector supports only the first two.

haaawk · 2020-07-07T05:23:53Z

Another thing is that even reading before writing won't cut it because it would screw up timestamps of cells. Single CQL operation can have only one timestamp for all affected columns so you can't have unmodified columns with an old timestamp and updated columns with a new timestamp.

mailmahee · 2020-07-07T10:34:04Z

ok - @haaawk is it possible to give me an example on how to do a partial update to a row ? what would the CQL/prepared statement look like?

haaawk · 2020-07-07T12:02:58Z

You have examples above. Both
insert into ks.tb(pk, ck, v1) values (1, 2, test3) and
update ks.tb set v1 = 'test3' where pk = 1 and ck = 2
are setting only part of the row. Assuming the schema is:

create table ks.tb(
pk int,
ck int,
v1 text,
v2 text,
primary key(pk, ck)
)

Bouncheck added the enhancement New feature or request label Aug 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sink does not support update operation #16

Sink does not support update operation #16

haaawk commented Jun 12, 2020

haaawk commented Jun 12, 2020

mailmahee commented Jul 6, 2020

haaawk commented Jul 6, 2020

mailmahee commented Jul 6, 2020 •

edited

Loading

haaawk commented Jul 7, 2020

haaawk commented Jul 7, 2020

mailmahee commented Jul 7, 2020

haaawk commented Jul 7, 2020 •

edited

Loading

Sink does not support update operation #16

Sink does not support update operation #16

Comments

haaawk commented Jun 12, 2020

haaawk commented Jun 12, 2020

mailmahee commented Jul 6, 2020

haaawk commented Jul 6, 2020

mailmahee commented Jul 6, 2020 • edited Loading

haaawk commented Jul 7, 2020

haaawk commented Jul 7, 2020

mailmahee commented Jul 7, 2020

haaawk commented Jul 7, 2020 • edited Loading

mailmahee commented Jul 6, 2020 •

edited

Loading

haaawk commented Jul 7, 2020 •

edited

Loading