Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid relying on barrier read in executor #13687

Open
Tracked by #12393
hzxa21 opened this issue Nov 28, 2023 · 13 comments
Open
Tracked by #12393

Avoid relying on barrier read in executor #13687

hzxa21 opened this issue Nov 28, 2023 · 13 comments
Assignees

Comments

@hzxa21
Copy link
Collaborator

hzxa21 commented Nov 28, 2023

As mentioned in #12393, we realize that barrier and checkpoint decoupling is orthogonal to enable read on non-ckpt barrier. Storage can enable try_flush instead of force_flush on high-frequency non-checkpoint barrier to reduce the chance of OOM while avoid producing too many small immutable memtables if read on non-ckpt barrier is not allowed.

@github-actions github-actions bot added this to the release-1.5 milestone Nov 28, 2023
@hzxa21 hzxa21 changed the title Code cleanup for barrier read Avoid relying on barrier read in executor Nov 28, 2023
@hzxa21
Copy link
Collaborator Author

hzxa21 commented Nov 28, 2023

IIUC, arrangement backfill and log store is currently relying on non-ckpt barrier read to work correctly if checkpoint frequency > 1. @wenym1 @kwannoel @chenzl25 Can you help to confirm? Is it possible to avoid reading on non-ckpt barrier read in these cases?

@kwannoel
Copy link
Contributor

kwannoel commented Nov 28, 2023

IIUC, arrangement backfill and log store is currently relying on non-ckpt barrier read to work correctly if checkpoint frequency > 1. @wenym1 @kwannoel @chenzl25 Can you help to confirm? Is it possible to avoid reading on non-ckpt barrier read in these cases?

For arrangement backfill, on every barrier (checkpoint or not), it will indeed still do a snapshot read.

We could delay read to every checkpoint barrier, but that could lead to checkpoint barrier taking a long time, if there's lots of data between 2 checkpoint barriers, and we also need to buffer updates between those two barriers.

Could you elaborate more on why we should "avoid reading on non-ckpt barrier"?

@kwannoel
Copy link
Contributor

kwannoel commented Nov 28, 2023

Storage can enable try_flush instead of force_flush on high-frequency non-checkpoint barrier to reduce the chance of OOM while avoid producing too many small immutable memtables if read on non-ckpt barrier is not allowed.

This seems to suggest if we have any non-ckpt read, then we cannot have the stated optimization. Is my understanding correct?

@kwannoel
Copy link
Contributor

no_shuffle_backfill and arrangement_backfill both do snapshot read whenever there's a new barrier (regardless of checkpoint).

@hzxa21
Copy link
Collaborator Author

hzxa21 commented Nov 28, 2023

Storage can enable try_flush instead of force_flush on high-frequency non-checkpoint barrier to reduce the chance of OOM while avoid producing too many small immutable memtables if read on non-ckpt barrier is not allowed.

This seems to suggest if we have any non-ckpt read, then we cannot have the stated optimization. Is my understanding correct?

Correct

Could you elaborate more on why we should "avoid reading on non-ckpt barrier"?

If we still need to support reading on non-ckpt barrier, executor needs to force flush its memtable to storage, which will result in many small immutable memtables. If the number of memtables are too big, it will hurt read performance and that is why we introduce extra logics to merge these memtables in addition to memtable spill and SST compaction. That being said, if we want to enjoy the benefit of spilling out states on non-ckpt barrier as well as maintaining the non-ckpt barrier read capability, we need complicated logic to make the read performance of non-ckpt barrier read performance stable.

There are 3 options here:

  1. Disallow non-ckpt barrier at all. Meaning that ckpt frequency can only be 1.
  2. Do force flush on each barrier and enable non-ckpt barrier read optimizations (i.e. merge small memtables). This is the current main's implementation
  3. Do force flush on each barrier and disable non-ckpt barrier read optimizations. This maintains the non-ckpt barrier read capability but provides no guarantees on its performance.
  4. Do try flush on each barrier and do not support non-ckpt barrier read at all.

Given that the usage of barrier read for batch query is narrow and only our executors rely on non-ckpt barrier read, if it is not hard to modify our executors, my preference is 4 > 3 > 2 > 1

@chenzl25
Copy link
Contributor

Besides backfill, non-ckpt read is used by DeltaJoin and TemporalJoin right now. DeltaJoin needs to read current epoch and the previous epoch data which means that it must contain a non-ckpt barrier read. If we don't support non-ckpt barrier read anymore, I think we should deprecate DeltaJoin as well.

As for backfill, removing non-ckpt barrier read seems easy, we just need to buffer upstream data between 2 checkpoint barriers and don't need to handle barrier alignment required by joins.

@wenym1
Copy link
Contributor

wenym1 commented Nov 29, 2023

For log store, I can try let log reader to read the same local state store as the log writer. The drawback is that we will need to have an async mutex on the shared local state store. I will investigate on the potential of doing it.

@kwannoel
Copy link
Contributor

kwannoel commented Nov 29, 2023

Besides backfill, non-ckpt read is used by DeltaJoin and TemporalJoin right now. DeltaJoin needs to read current epoch and the previous epoch data which means that it must contain a non-ckpt barrier read. If we don't support non-ckpt barrier read anymore, I think we should deprecate DeltaJoin as well.

As for backfill, removing non-ckpt barrier read seems easy, we just need to buffer upstream data between 2 checkpoint barriers and don't need to handle barrier alignment required by joins.

For ckpt read, the data must have been committed. So it seems more buffer until previous checkpoint barrier complete, which could extend past 2 checkpoint barriers.

This means we need to either:

  1. Let state store notify us when ckpt complete, and discard the buffer
  2. OR on ckpt, wait for epoch to be committed before reading from the snapshot.

@kwannoel
Copy link
Contributor

This means we need to either:

  1. Let state store notify us when ckpt complete, and discard the buffer
    
  2. OR on ckpt, wait for epoch to be committed before reading from the snapshot.

@chenzl25 @hzxa21 will there be any issues for 2.? That's the approach I would prefer. Could checkpoint take a long time to complete?

@wcy-fdu wcy-fdu self-assigned this Nov 29, 2023
@hzxa21
Copy link
Collaborator Author

hzxa21 commented Nov 29, 2023

Besides backfill, non-ckpt read is used by DeltaJoin and TemporalJoin right now. DeltaJoin needs to read current epoch and the previous epoch data which means that it must contain a non-ckpt barrier read. If we don't support non-ckpt barrier read anymore, I think we should deprecate DeltaJoin as well.

As for backfill, removing non-ckpt barrier read seems easy, we just need to buffer upstream data between 2 checkpoint barriers and don't need to handle barrier alignment required by joins.

That is true. It seems that there are many places to change and deprecate. How about only deprecating barrier read for batch query and still provide barrier read for internal usage? We can simply choose

  1. Do force flush on each barrier and disable non-ckpt barrier read optimizations. This maintains the non-ckpt barrier read capability but provides no guarantees on its performance.

because in most cases decoupling will be disabled with checkpoint_frequency=1

@wcy-fdu
Copy link
Contributor

wcy-fdu commented Nov 29, 2023

3. disable non-ckpt barrier read optimizations

Is there any other kind of optimization besides merge imm?

@fuyufjh fuyufjh modified the milestones: release-1.5, release-1.6 Dec 4, 2023
@hzxa21
Copy link
Collaborator Author

hzxa21 commented Dec 4, 2023

  1. disable non-ckpt barrier read optimizations

Is there any other kind of optimization besides merge imm?

That is the only one I can think of.

@wcy-fdu wcy-fdu modified the milestones: release-1.6, release-1.7 Jan 10, 2024
@wcy-fdu wcy-fdu removed this from the release-1.7 milestone Mar 6, 2024
Copy link
Contributor

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants