Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for asynchronous flushing in DiskManager::WriteLog() && improve overall structure of DiskManager #580

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

xzhseh
Copy link
Contributor

@xzhseh xzhseh commented Jun 21, 2023

This PR does four things in general:

  1. Fix the issue mutex: why is there no lock for the log I/O ? #449 by adding a log_io_latch_ to ensure logging operations' security under concurrency access.(Though the log-related functions are not used except in test at present🤪)
  2. Move the definitions of several functions from disk_manager.cpp to disk_manager.h to improve readability.
  3. Add has_shut_down_ flag and change the destructor implementation (and this also includes support for asynchronous flushing), to ensure everything goes well if the ShutDown() is not manually called by the programmer :)
  4. Add support for asynchronous flushing in DiskManager::WriteLog(), also make the semantic of flush_log_ be consistent across all stage.

Current problem (To be reviewed):
Just as I mentioned in the TODO section, the program will crash if the log flushing thread is not done after 10s, which can be a rare case, but this can also be handle more softly, like log a warning...etc

@xzhseh
Copy link
Contributor Author

xzhseh commented Jun 21, 2023

Also, I think the asynchronous version of the four functions WritePage, ReadPage, WriteLog, ReadLog can be of help if bustub future will consider in providing WAL related functionalities..🤤
(Actually I found WAL supports in the early versions of 15-445, like in 19 or 18 version, there actually was a project for students to complete, but now this part seems broken with new features coming in..? I think I can be of help to refactor the code related to recovery, and this also will not be conflicted with the possible upcoming MVCC version.)
If it's okay, I can write the asynchronous version of these functions🥰

@infdahai
Copy link

I think it's good to throw a warning before an exit.

@xzhseh
Copy link
Contributor Author

xzhseh commented Jun 22, 2023

I revisited the original code design and found it confusing for logging operations, so I redesigned some of the functions & variables related to Log.

  1. In the original design, it looks like the designer wants to separate the write and flush operation when dealing with fstream in logging, and that's where the first few commits of this PR comes from, to support asynchronous flushing. That's to say, performs sequential write and maintains multiple asynchronous threads flushing in the background at the same time. This sounds good and reasonable before considering the actual use of fstream.
  2. fstream is not thread-safe at all, which means not only multiple accesses to fstream::write at the same time will lead to race condition, but accesses to fstream::write and fstream::flush at the same time by different threads will lead to race condition(or other synchronization problems) as well.
  3. Thus, if the only fstream object log_io_ is accessed both in WriteLog() (by the main thread) and flushing asynchronously (by another background thread), the result is undefined, and this will inevitably happen if sticks to original design.
  4. To solve the above problems, I separate the WriteLog() to two version —— The first one will block until all the asynchronous operations are done, and then start to perform sequential write. The second one is the asynchronous version of WriteLog, also with a future deque to ensure the completion of each background task. The ReadLog() is not changed, since the log contents to be read should always be the one after all flushing.

If the above changes look good, I'll write test cases in disk_manager_test.cpp to ensure the correctness. (Also add comments to the newly added functions)

@xzhseh
Copy link
Contributor Author

xzhseh commented Jun 22, 2023

This part may also be associated with LogManager. If anything related to LogManager or the currently disabled recovery test should also be changed, I'm glad to help with the refactoring :)

@xzhseh
Copy link
Contributor Author

xzhseh commented Jun 23, 2023

And according to Discussion #516, I agree with the use of fsync() to achieve full recovery functionality in the future, but for now, since the std::fstream::flush will synchronize the stream buffer with the underlying OS buffer, the subsequent ReadLog() or the recovery should be good (If not under extreme situation).

@skyzh
Copy link
Member

skyzh commented Aug 29, 2023

Thanks for the PR but we are not writing any WALs currently in the system. The overall design looks good, but I'm actually in favor of dropping WALs for now until we have a concrete plan for the recovery project 🤣 We can keep this PR and see how it goes in the future.

@xzhseh
Copy link
Contributor Author

xzhseh commented Aug 29, 2023

Sure, let's see if the recovery project will be brought back in the future🤣
I'll focus on assisting & participating through the implementations (or refactoring) of other parts in bustub🥰

@prashanthduvvada prashanthduvvada added the deferred This issue is documented here and someone might want to fix it later, but it takes no priority label Aug 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deferred This issue is documented here and someone might want to fix it later, but it takes no priority
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants