-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid relying on barrier read in executor #13687
Comments
For arrangement backfill, on every barrier (checkpoint or not), it will indeed still do a snapshot read. We could delay read to every checkpoint barrier, but that could lead to checkpoint barrier taking a long time, if there's lots of data between 2 checkpoint barriers, and we also need to buffer updates between those two barriers. Could you elaborate more on why we should "avoid reading on non-ckpt barrier"? |
This seems to suggest if we have any non-ckpt read, then we cannot have the stated optimization. Is my understanding correct? |
|
Correct
If we still need to support reading on non-ckpt barrier, executor needs to force flush its memtable to storage, which will result in many small immutable memtables. If the number of memtables are too big, it will hurt read performance and that is why we introduce extra logics to merge these memtables in addition to memtable spill and SST compaction. That being said, if we want to enjoy the benefit of spilling out states on non-ckpt barrier as well as maintaining the non-ckpt barrier read capability, we need complicated logic to make the read performance of non-ckpt barrier read performance stable. There are 3 options here:
Given that the usage of barrier read for batch query is narrow and only our executors rely on non-ckpt barrier read, if it is not hard to modify our executors, my preference is 4 > 3 > 2 > 1 |
Besides backfill, non-ckpt read is used by As for backfill, removing non-ckpt barrier read seems easy, we just need to buffer upstream data between 2 checkpoint barriers and don't need to handle barrier alignment required by joins. |
For log store, I can try let log reader to read the same local state store as the log writer. The drawback is that we will need to have an async mutex on the shared local state store. I will investigate on the potential of doing it. |
For ckpt read, the data must have been committed. So it seems more buffer until previous checkpoint barrier complete, which could extend past 2 checkpoint barriers. This means we need to either:
|
@chenzl25 @hzxa21 will there be any issues for 2.? That's the approach I would prefer. Could checkpoint take a long time to complete? |
That is true. It seems that there are many places to change and deprecate. How about only deprecating barrier read for batch query and still provide barrier read for internal usage? We can simply choose
because in most cases decoupling will be disabled with |
Is there any other kind of optimization besides merge imm? |
That is the only one I can think of. |
This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned. |
As mentioned in #12393, we realize that barrier and checkpoint decoupling is orthogonal to enable read on non-ckpt barrier. Storage can enable
try_flush
instead offorce_flush
on high-frequency non-checkpoint barrier to reduce the chance of OOM while avoid producing too many small immutable memtables if read on non-ckpt barrier is not allowed.The text was updated successfully, but these errors were encountered: