Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the cache configurable by cpp API #950

Open
wants to merge 16 commits into
base: main
Choose a base branch
from
Open

Make the cache configurable by cpp API #950

wants to merge 16 commits into from

Conversation

mgautierfr
Copy link
Collaborator

This fix #946.

Contrary to what proposed in #946, this doesn't introduce a cache config structure but use simple method:

This PR probably have some small conflict with #949, I will resolve them as soon as a PR is merged.

@mgautierfr
Copy link
Collaborator Author

CI seems to be broken for other reasons than this PR.

include/zim/archive.h Outdated Show resolved Hide resolved
include/zim/archive.h Outdated Show resolved Hide resolved
include/zim/archive.h Outdated Show resolved Hide resolved
Comment on lines 565 to 573
/** Set the size of the dirent lookup cache.
*
* Contrarly to other `set_<foo>_cache_max_size`, this method is useless once
* the lookup cache is created.
* The lookup cache is created at first access to a entry in the archive.
* So this method must be called before any access to content (including metadata).
* It is best to call this method first, just after the archive creation.
*/
void set_dirent_lookup_cache_max_size(size_t new_size);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it better to through an exception if this method is called when it's too late?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. Another option would be to add a new argument to Archive constructor and remove this function.

src/fileimpl.cpp Show resolved Hide resolved
test/archive.cpp Outdated Show resolved Hide resolved
test/archive.cpp Outdated Show resolved Hide resolved
src/fileimpl.cpp Outdated Show resolved Hide resolved
src/fileimpl.h Outdated Show resolved Hide resolved
Copy link
Collaborator

@veloman-yunkan veloman-yunkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have #949 merged first. I will review the rebased&fixed-up version of this PR again after conflict resolution.

@mgautierfr
Copy link
Collaborator Author

PR is now rebased&fixed-up and conflict are resolved.

@mgautierfr mgautierfr force-pushed the cache_config branch 2 times, most recently from fe9ee9d to 4a1881a Compare February 12, 2025 14:38
Copy link
Collaborator

@veloman-yunkan veloman-yunkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my feedback was restricted to the test code, which means that generally the PR is good to merge.

test/lrucache.cpp Outdated Show resolved Hide resolved
test/archive.cpp Outdated Show resolved Hide resolved
test/archive.cpp Outdated Show resolved Hide resolved
@mgautierfr
Copy link
Collaborator Author

Last push updates with your comments.
I have directly modified the commits to be ready to merge as soon as you agree (which I hope is soon)

Only test/lru_cache.cpp and test/archive.cpp files have been modified.

Comment on lines +307 to +314
if (REF_IT->isRedirect()) { \
ASSERT_EQ(REF_IT->getRedirectEntryIndex(), test_entry.getRedirectEntryIndex()); \
} \
auto ref_item = REF_IT->getItem(true); \
auto test_item = test_entry.getItem(true); \
ASSERT_EQ(ref_item.getClusterIndex(), test_item.getClusterIndex()); \
ASSERT_EQ(ref_item.getBlobIndex(), test_item.getBlobIndex()); \
ASSERT_EQ(ref_item.getData(), test_item.getData()); \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the item equivalence testing be in an else branch of the if statement? Shouldn't your tests fail otherwise? Why don't they fail then?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's because the test archive doesn't contain redirect entries.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

´getItem(true)´ resolve redirect(s) and return the final item.
We load the data even if the entry is a redirect to force loading of all cluster.
Else the test fails with withns category at line 366 because only one cluster is present.

test/archive.cpp Outdated
Comment on lines 364 to 376
ASSERT_ARCHIVE_EQUIVALENT_IT_LIMIT(ref_it, range.end(), archive, 50)
EXPECT_EQ(archive.get_dirent_cache_current_size(), 30);
EXPECT_EQ(archive.get_cluster_cache_current_size(), 2); // Only 2 clusters in the file

// Reduce cache size
archive.set_dirent_cache_max_size(10);
archive.set_cluster_cache_max_size(1);

EXPECT_EQ(archive.get_dirent_cache_current_size(), 10);
EXPECT_EQ(archive.get_cluster_cache_current_size(), 1);

ASSERT_ARCHIVE_EQUIVALENT_IT_LIMIT(ref_it, range.end(), archive, 50)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fragment of the test scenario is somewhat obscure and may be misleading. The ASSERT_ARCHIVE_EQUIVALENT_IT_LIMIT() macro has a side effect - it advances its iterator argument. Therefore the second invocation of that macro on the same iterator object continues the iteration (or performs no iteration at all if the end has already been reached before that) and it's not obvious if that is intentional or not. Please add comments explaining what is being tested or change the test accordingly.

Copy link
Collaborator Author

@mgautierfr mgautierfr Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is intentional, we want to change the cache size as we are iterating on it.
I will add a comment.

Comment on lines +380 to +389
// Clean cache
// (More than testing the value, this is needed as we want to be sure the cache is actually populated later)
archive.set_dirent_cache_max_size(0);
archive.set_cluster_cache_max_size(0);

EXPECT_EQ(archive.get_dirent_cache_current_size(), 0);
EXPECT_EQ(archive.get_cluster_cache_current_size(), 0);

// Increase the cache
Copy link
Collaborator

@veloman-yunkan veloman-yunkan Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With an ASSERT_ARCHIVE_EQUIVALENT() easy/concise check at your service, doesn't it make sense to check that things still work after decreasing the cache size to 0?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this test is to test than changing the cache size has an effect. Not that it doesn't change the returned value. ASSERT_ARCHIVE_EQUIVALENT is "just" a nice to have to be sure, but not the main purpose of the test.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this test is to test than changing the cache size has an effect.

That's prone to misinterpretation and I also wanted to ask if the commit's description was wrong. Yes, we want to be sure that setting the cache size actually changes the cache size, but I think that we also want to test that changing the cache size on a live archive doesn't break its operation (i.e. in that sense, it has no observable effect other than on performance).

test/archive.cpp Outdated Show resolved Hide resolved
@mgautierfr mgautierfr force-pushed the cache_config branch 2 times, most recently from 55616af to f3ea99f Compare February 17, 2025 15:59
@mgautierfr
Copy link
Collaborator Author

mgautierfr commented Feb 17, 2025

I have added a small comment about no reset the ref_it.
Please note the test will kind of become obsolete with PR #956 which introduce a global cluster cache. So I don't it worth have a perfect test here.

This make fail the test as it appears we create the dirent lookup cache
at initialization.
This will be fixed in next commits.
There is no reason this setting is done while reading the mimetype.
Let's use a temp dirent lookup (without cache) to access
titleOrdered v1 and 'C' namespace.
Reading and writing in the same time to a unique_ptr is not thread safe.
We introduce a atomicBool to know if the unique_ptr has been initialized.

Fix #945
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make cache configuration available from cpp api.
2 participants