-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(cloud): Add File Cache Consistency Check (#41280) #43440
base: master
Are you sure you want to change the base?
Conversation
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
@@ -588,6 +588,114 @@ void FSFileCacheStorage::load_cache_info_into_memory(BlockFileCache* _mgr) const | |||
TEST_SYNC_POINT_CALLBACK("BlockFileCache::TmpFile2"); | |||
} | |||
|
|||
void FSFileCacheStorage::check_consistency( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: function 'check_consistency' has cognitive complexity of 59 (threshold 50) [readability-function-cognitive-complexity]
void FSFileCacheStorage::check_consistency(
^
Additional context
be/src/io/cache/fs_file_cache_storage.cpp:596: nesting level increased to 1
auto check = [_mgr, &cache_lock, &confirmed_blocks, &inconsistent_cache_context,
^
be/src/io/cache/fs_file_cache_storage.cpp:598: +2, including nesting penalty of 1, nesting level increased to 2
for (; key_it != std::filesystem::directory_iterator(); ++key_it) {
^
be/src/io/cache/fs_file_cache_storage.cpp:607: +3, including nesting penalty of 2, nesting level increased to 3
if (ec) [[unlikely]] {
^
be/src/io/cache/fs_file_cache_storage.cpp:613: +3, including nesting penalty of 2, nesting level increased to 3
for (; offset_it != std::filesystem::directory_iterator(); ++offset_it) {
^
be/src/io/cache/fs_file_cache_storage.cpp:618: +4, including nesting penalty of 3, nesting level increased to 4
if (!this->parse_filename_suffix_to_cache_type(
^
be/src/io/cache/fs_file_cache_storage.cpp:627: +4, including nesting penalty of 3, nesting level increased to 4
if (!cell || cell->is_deleted) {
^
be/src/io/cache/fs_file_cache_storage.cpp:627: +1
if (!cell || cell->is_deleted) {
^
be/src/io/cache/fs_file_cache_storage.cpp:636: +4, including nesting penalty of 3, nesting level increased to 4
? cell->dowloading_size()
^
be/src/io/cache/fs_file_cache_storage.cpp:635: +1
(is_tmp && cell->file_block->state() == FileBlock::State::DOWNLOADING)
^
be/src/io/cache/fs_file_cache_storage.cpp:639: +4, including nesting penalty of 3, nesting level increased to 4
if (size != expected_size) {
^
be/src/io/cache/fs_file_cache_storage.cpp:643: +4, including nesting penalty of 3, nesting level increased to 4
if (cache_type != cell->file_block->cache_type()) {
^
be/src/io/cache/fs_file_cache_storage.cpp:647: +4, including nesting penalty of 3, nesting level increased to 4
if (expiration_time != cell->file_block->expiration_time()) {
^
be/src/io/cache/fs_file_cache_storage.cpp:652: +4, including nesting penalty of 3, nesting level increased to 4
if (inconsistent_type != InconsistentCacheContext::InconsistentType::NONE) {
^
be/src/io/cache/fs_file_cache_storage.cpp:659: +1, including nesting penalty of 0, nesting level increased to 1
if constexpr (USE_CACHE_VERSION2) {
^
be/src/io/cache/fs_file_cache_storage.cpp:661: +2, including nesting penalty of 1, nesting level increased to 2
if (ec) {
^
be/src/io/cache/fs_file_cache_storage.cpp:665: +2, including nesting penalty of 1, nesting level increased to 2
for (; key_prefix_it != std::filesystem::directory_iterator(); ++key_prefix_it) {
^
be/src/io/cache/fs_file_cache_storage.cpp:666: +3, including nesting penalty of 2, nesting level increased to 3
if (!key_prefix_it->is_directory()) {
^
be/src/io/cache/fs_file_cache_storage.cpp:670: +3, including nesting penalty of 2, nesting level increased to 3
if (key_prefix_it->path().filename().native().size() != KEY_PREFIX_LENGTH) {
^
be/src/io/cache/fs_file_cache_storage.cpp:675: +4, including nesting penalty of 3, nesting level increased to 4
if (ec) {
^
be/src/io/cache/fs_file_cache_storage.cpp:682: +3, including nesting penalty of 2, nesting level increased to 3
if (ec) {
^
be/src/io/cache/fs_file_cache_storage.cpp:688: +1, nesting level increased to 1
} else {
^
be/src/io/cache/fs_file_cache_storage.cpp:690: +2, including nesting penalty of 1, nesting level increased to 2
if (ec) {
^
@@ -588,6 +588,114 @@ | |||
TEST_SYNC_POINT_CALLBACK("BlockFileCache::TmpFile2"); | |||
} | |||
|
|||
void FSFileCacheStorage::check_consistency( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: function 'check_consistency' exceeds recommended size/complexity thresholds [readability-function-size]
void FSFileCacheStorage::check_consistency(
^
Additional context
be/src/io/cache/fs_file_cache_storage.cpp:590: 102 lines including whitespace and comments (threshold 80)
void FSFileCacheStorage::check_consistency(
^
@@ -21,6 +21,7 @@ | |||
#include <gen_cpp/Types_types.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: 'gen_cpp/Types_types.h' file not found [clang-diagnostic-error]
#include <gen_cpp/Types_types.h>
^
@@ -5284,6 +5285,145 @@ | |||
} | |||
} | |||
|
|||
TEST_F(BlockFileCacheTest, check_fs_file_cache_consistency) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: function 'TEST_F' exceeds recommended size/complexity thresholds [readability-function-size]
^
Additional context
be/test/io/cache/block_file_cache_test.cpp:5287: 137 lines including whitespace and comments (threshold 80)
^
@@ -102,6 +102,12 @@ class FSFileCacheStorage : public FileCacheStorage { | |||
|
|||
void load_cache_info_into_memory(BlockFileCache* _mgr) const; | |||
|
|||
void check_consistency( | |||
BlockFileCache* _mgr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better not couple this interface with FileCacheManager. The method of checking here is that the storage can return a list with comprehensive "file" information (including path, size, creation time, etc.), preferably in an ordered manner. Let the FileCacheManager, which depends on this storage, perform its own forward and reverse checks to see if there is any extra or missing data in the storage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the correction, I will make the modifications promptly.
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
greaaat work! thanks for your contribution!
@@ -50,6 +50,8 @@ constexpr static std::string_view CLEAR = "clear"; | |||
constexpr static std::string_view RESET = "reset"; | |||
constexpr static std::string_view HASH = "hash"; | |||
constexpr static std::string_view LIST_CACHE = "list_cache"; | |||
constexpr static std::string_view VIEW = "view"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
list_base_paths might be more explicit
*json_metrics = json_array.ToString(); | ||
} else if (operation == CHECK) { | ||
const std::string& cache_base_path = req->param(BASE_PATH.data()); | ||
auto* block_file_cache = io::FileCacheFactory::instance()->get_by_path(cache_base_path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handle the error if the path is invalid. otherwise this will crash into nullptr.
be/src/io/cache/block_file_cache.cpp
Outdated
std::vector<std::string> BlockFileCache::check_file_cache_consistency() { | ||
std::unordered_set<AccessKeyAndOffset, KeyAndOffsetHash> confirmed_blocks; | ||
std::vector<InconsistentCacheContext> inconsistent_cache_context; | ||
std::lock_guard<std::mutex> lock(_mutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming advice: lock -> cache_lock for efficient code search.
be/src/io/cache/file_cache_common.h
Outdated
uint32_t type; | ||
|
||
public: | ||
enum : uint32_t { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very compact, that's nice...
be/src/io/cache/file_cache_common.h
Outdated
public: | ||
enum : uint32_t { | ||
NONE = 0, | ||
FILES_INCONSISTENT = 1 << 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the description for each type is a necessary here
be/src/io/cache/file_cache_common.h
Outdated
enum : uint32_t { | ||
NONE = 0, | ||
FILES_INCONSISTENT = 1 << 0, | ||
STORAGE_INCONSISTENT = 1 << 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"INCONSISTENT" is bidirectional. Might 'MISSING_IN_STORAGE' be more explicit?
<< "Expiration Time: " << expiration_time << "\n" | ||
<< "Offset: " << offset << "\n" | ||
<< "Cache Type: " << file_cache_type_to_string(cache_type) << "\n" | ||
<< "Inconsistent Type: " << inconsistent_type.to_string(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Reason:"
FileCacheType cache_type = FileCacheType::NORMAL; | ||
if (!this->parse_filename_suffix_to_cache_type( | ||
fs, offset_it->path().filename().native(), expiration_time, size, | ||
&offset, &is_tmp, &cache_type)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if is_tmp == true, the type will never be assigned (always be the default value FileCacheType::NORMAL
). check the internal of parse_filename_suffix_to_cache_type
for details.
maybe add TMP
as another 'type' in file_cache_type_to_string(be/src/io/cache/file_cache_common.cpp )
. As I see it, there is no counterpart in management structure in memory. Maybe we can deal with TMP like this:
if a tmp file with state 'DOWNLOADING', it's consistent. otherwise, inconsistent.
std::string offset_path = offset_it->path(); | ||
auto* cell = _mgr->get_cell(hash, offset, cache_lock); | ||
|
||
if (!cell || cell->is_deleted) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_deleted works as a flag, the file is not deleted yet
continue; | ||
} | ||
if (key_prefix_it->path().filename().native().size() != KEY_PREFIX_LENGTH) { | ||
LOG(WARNING) << "Unknown directory " << key_prefix_it->path().native() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no deletion here. this is just a checking tool, don't touch anything!
check(key_it); | ||
} | ||
} else { | ||
std::filesystem::directory_iterator key_it {_cache_base_path, ec}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you sure this check works for both VERSION1 and VERSION2? if not, just warning here "only support version2"
be/src/io/cache/block_file_cache.cpp
Outdated
_storage->check_consistency(this, confirmed_blocks, inconsistent_cache_context, lock); | ||
for (const auto& [hash, offset_to_cell] : _files) { | ||
for (const auto& [offset, cell] : offset_to_cell) { | ||
if (cell.is_deleted || confirmed_blocks.contains({hash, offset})) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_deleted works as a flag, the file is not deleted yet
be/src/io/cache/file_cache_common.h
Outdated
public: | ||
enum : uint32_t { | ||
NONE = 0, | ||
FILES_INCONSISTENT = 1 << 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be 'NOT_LOADED'?
assert_range(2, blocks[0], io::FileBlock::Range(10, 18), io::FileBlock::State::DOWNLOADING); | ||
download(blocks[0]); | ||
mgr._files[key1].erase(10); | ||
mgr.check_file_cache_consistency(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's for DEBUG, forgot to remove :D
std::string dir_path = fs_file_cache_storage->get_path_in_local_cache(key1, 0); | ||
fs::path block_file_path = std::filesystem::path(dir_path) / "20"; | ||
fs::remove(block_file_path); | ||
mgr.check_file_cache_consistency(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
std::ofstream out_file(block_file_path, std::ios::out | std::ios::app); | ||
out_file << data; | ||
out_file.close(); | ||
mgr.check_file_cache_consistency(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
assert_range(2, blocks[0], io::FileBlock::Range(40, 48), io::FileBlock::State::DOWNLOADING); | ||
download(blocks[0]); | ||
blocks[0]->_key.meta.type = io::FileCacheType::INDEX; | ||
mgr.check_file_cache_consistency(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
assert_range(2, blocks[0], io::FileBlock::Range(0, 8), io::FileBlock::State::DOWNLOADING); | ||
download(blocks[0]); | ||
blocks[0]->_key.meta.expiration_time = 0; | ||
mgr.check_file_cache_consistency(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
@@ -5284,6 +5285,155 @@ TEST_F(BlockFileCacheTest, file_cache_path_storage_parse) { | |||
} | |||
} | |||
|
|||
TEST_F(BlockFileCacheTest, check_fs_file_cache_consistency) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: function 'TEST_F' exceeds recommended size/complexity thresholds [readability-function-size]
^
Additional context
be/test/io/cache/block_file_cache_test.cpp:5287: 147 lines including whitespace and comments (threshold 80)
^
b2e2c54
to
590dffa
Compare
run buildall |
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (cache_base_path.empty()) { | ||
st = Status::InvalidArgument("missing parameter: {} is required", VALUE.data()); | ||
} else { | ||
auto* block_file_cache = io::FileCacheFactory::instance()->get_by_path(cache_base_path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!block_file_cache) {
st = Status::InvalidArgument("invalid path: {}", VALUE.data());
return st;
}
run external |
Add a feature to verify the consistency of the file cache, checking whether the disk cache is under the control of the file cache management system.
run buildall |
TeamCity be ut coverage result: |
run buildall |
TeamCity be ut coverage result: |
run p0 |
run buildall |
TeamCity be ut coverage result: |
Add a feature to verify the consistency of the file cache, checking whether the disk cache is under the control of the file cache management system.
What problem does this PR solve?
Occasionally, we found that there have been cases of disk cache data escaping from the management of Doris file cache, causing disk space leaks. To make it easier for debugging, we need a checking tool that compares the contents in the Doris file cache memory management structure with the current disk contents to identify the differences between the two (which are potential problematic data).
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Release note
None
Check List (For Reviewer who merge this PR)