Avoid relying on barrier read in executor #13687

hzxa21 · 2023-11-28T07:05:58Z

As mentioned in #12393, we realize that barrier and checkpoint decoupling is orthogonal to enable read on non-ckpt barrier. Storage can enable try_flush instead of force_flush on high-frequency non-checkpoint barrier to reduce the chance of OOM while avoid producing too many small immutable memtables if read on non-ckpt barrier is not allowed.

The text was updated successfully, but these errors were encountered:

hzxa21 · 2023-11-28T07:10:54Z

IIUC, arrangement backfill and log store is currently relying on non-ckpt barrier read to work correctly if checkpoint frequency > 1. @wenym1 @kwannoel @chenzl25 Can you help to confirm? Is it possible to avoid reading on non-ckpt barrier read in these cases?

kwannoel · 2023-11-28T07:20:59Z

IIUC, arrangement backfill and log store is currently relying on non-ckpt barrier read to work correctly if checkpoint frequency > 1. @wenym1 @kwannoel @chenzl25 Can you help to confirm? Is it possible to avoid reading on non-ckpt barrier read in these cases?

For arrangement backfill, on every barrier (checkpoint or not), it will indeed still do a snapshot read.

We could delay read to every checkpoint barrier, but that could lead to checkpoint barrier taking a long time, if there's lots of data between 2 checkpoint barriers, and we also need to buffer updates between those two barriers.

Could you elaborate more on why we should "avoid reading on non-ckpt barrier"?

kwannoel · 2023-11-28T07:22:54Z

Storage can enable try_flush instead of force_flush on high-frequency non-checkpoint barrier to reduce the chance of OOM while avoid producing too many small immutable memtables if read on non-ckpt barrier is not allowed.

This seems to suggest if we have any non-ckpt read, then we cannot have the stated optimization. Is my understanding correct?

kwannoel · 2023-11-28T07:24:12Z

no_shuffle_backfill and arrangement_backfill both do snapshot read whenever there's a new barrier (regardless of checkpoint).

hzxa21 · 2023-11-28T07:41:10Z

Storage can enable try_flush instead of force_flush on high-frequency non-checkpoint barrier to reduce the chance of OOM while avoid producing too many small immutable memtables if read on non-ckpt barrier is not allowed.

This seems to suggest if we have any non-ckpt read, then we cannot have the stated optimization. Is my understanding correct?

Correct

Could you elaborate more on why we should "avoid reading on non-ckpt barrier"?

If we still need to support reading on non-ckpt barrier, executor needs to force flush its memtable to storage, which will result in many small immutable memtables. If the number of memtables are too big, it will hurt read performance and that is why we introduce extra logics to merge these memtables in addition to memtable spill and SST compaction. That being said, if we want to enjoy the benefit of spilling out states on non-ckpt barrier as well as maintaining the non-ckpt barrier read capability, we need complicated logic to make the read performance of non-ckpt barrier read performance stable.

There are 3 options here:

Disallow non-ckpt barrier at all. Meaning that ckpt frequency can only be 1.
Do force flush on each barrier and enable non-ckpt barrier read optimizations (i.e. merge small memtables). This is the current main's implementation
Do force flush on each barrier and disable non-ckpt barrier read optimizations. This maintains the non-ckpt barrier read capability but provides no guarantees on its performance.
Do try flush on each barrier and do not support non-ckpt barrier read at all.

Given that the usage of barrier read for batch query is narrow and only our executors rely on non-ckpt barrier read, if it is not hard to modify our executors, my preference is 4 > 3 > 2 > 1

chenzl25 · 2023-11-28T08:33:30Z

Besides backfill, non-ckpt read is used by DeltaJoin and TemporalJoin right now. DeltaJoin needs to read current epoch and the previous epoch data which means that it must contain a non-ckpt barrier read. If we don't support non-ckpt barrier read anymore, I think we should deprecate DeltaJoin as well.

As for backfill, removing non-ckpt barrier read seems easy, we just need to buffer upstream data between 2 checkpoint barriers and don't need to handle barrier alignment required by joins.

wenym1 · 2023-11-29T04:07:51Z

For log store, I can try let log reader to read the same local state store as the log writer. The drawback is that we will need to have an async mutex on the shared local state store. I will investigate on the potential of doing it.

kwannoel · 2023-11-29T04:29:35Z

Besides backfill, non-ckpt read is used by DeltaJoin and TemporalJoin right now. DeltaJoin needs to read current epoch and the previous epoch data which means that it must contain a non-ckpt barrier read. If we don't support non-ckpt barrier read anymore, I think we should deprecate DeltaJoin as well.

As for backfill, removing non-ckpt barrier read seems easy, we just need to buffer upstream data between 2 checkpoint barriers and don't need to handle barrier alignment required by joins.

For ckpt read, the data must have been committed. So it seems more buffer until previous checkpoint barrier complete, which could extend past 2 checkpoint barriers.

This means we need to either:

Let state store notify us when ckpt complete, and discard the buffer
OR on ckpt, wait for epoch to be committed before reading from the snapshot.

kwannoel · 2023-11-29T06:07:09Z

This means we need to either:
Let state store notify us when ckpt complete, and discard the buffer
OR on ckpt, wait for epoch to be committed before reading from the snapshot.

@chenzl25 @hzxa21 will there be any issues for 2.? That's the approach I would prefer. Could checkpoint take a long time to complete?

hzxa21 · 2023-11-29T07:07:23Z

Besides backfill, non-ckpt read is used by DeltaJoin and TemporalJoin right now. DeltaJoin needs to read current epoch and the previous epoch data which means that it must contain a non-ckpt barrier read. If we don't support non-ckpt barrier read anymore, I think we should deprecate DeltaJoin as well.

As for backfill, removing non-ckpt barrier read seems easy, we just need to buffer upstream data between 2 checkpoint barriers and don't need to handle barrier alignment required by joins.

That is true. It seems that there are many places to change and deprecate. How about only deprecating barrier read for batch query and still provide barrier read for internal usage? We can simply choose

Do force flush on each barrier and disable non-ckpt barrier read optimizations. This maintains the non-ckpt barrier read capability but provides no guarantees on its performance.

because in most cases decoupling will be disabled with checkpoint_frequency=1

wcy-fdu · 2023-11-29T07:29:57Z

3. disable non-ckpt barrier read optimizations

Is there any other kind of optimization besides merge imm?

hzxa21 · 2023-12-04T07:20:45Z

disable non-ckpt barrier read optimizations

Is there any other kind of optimization besides merge imm?

That is the only one I can think of.

github-actions · 2024-06-12T08:59:19Z

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

hzxa21 mentioned this issue Nov 28, 2023

Enable high frequency non-checkpoint barrier without barrier read #12393

Closed

3 tasks

github-actions bot added this to the release-1.5 milestone Nov 28, 2023

hzxa21 changed the title ~~Code cleanup for barrier read~~ Avoid relying on barrier read in executor Nov 28, 2023

wcy-fdu self-assigned this Nov 29, 2023

fuyufjh modified the milestones: release-1.5, release-1.6 Dec 4, 2023

wcy-fdu modified the milestones: release-1.6, release-1.7 Jan 10, 2024

wcy-fdu removed this from the release-1.7 milestone Mar 6, 2024

github-actions bot added the no-issue-activity label Jun 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid relying on barrier read in executor #13687

Avoid relying on barrier read in executor #13687

hzxa21 commented Nov 28, 2023 •

edited

Loading

hzxa21 commented Nov 28, 2023

kwannoel commented Nov 28, 2023 •

edited

Loading

kwannoel commented Nov 28, 2023 •

edited

Loading

kwannoel commented Nov 28, 2023

hzxa21 commented Nov 28, 2023

chenzl25 commented Nov 28, 2023

wenym1 commented Nov 29, 2023

kwannoel commented Nov 29, 2023 •

edited

Loading

kwannoel commented Nov 29, 2023

hzxa21 commented Nov 29, 2023

wcy-fdu commented Nov 29, 2023

hzxa21 commented Dec 4, 2023

github-actions bot commented Jun 12, 2024

Avoid relying on barrier read in executor #13687

Avoid relying on barrier read in executor #13687

Comments

hzxa21 commented Nov 28, 2023 • edited Loading

hzxa21 commented Nov 28, 2023

kwannoel commented Nov 28, 2023 • edited Loading

kwannoel commented Nov 28, 2023 • edited Loading

kwannoel commented Nov 28, 2023

hzxa21 commented Nov 28, 2023

chenzl25 commented Nov 28, 2023

wenym1 commented Nov 29, 2023

kwannoel commented Nov 29, 2023 • edited Loading

kwannoel commented Nov 29, 2023

hzxa21 commented Nov 29, 2023

wcy-fdu commented Nov 29, 2023

hzxa21 commented Dec 4, 2023

github-actions bot commented Jun 12, 2024

hzxa21 commented Nov 28, 2023 •

edited

Loading

kwannoel commented Nov 28, 2023 •

edited

Loading

kwannoel commented Nov 28, 2023 •

edited

Loading

kwannoel commented Nov 29, 2023 •

edited

Loading