
About cache limited by memory #947

Open
mgautierfr opened this issue Jan 20, 2025 · 6 comments

@mgautierfr
Collaborator

Today, the internal cache system is limited by the number of items in the cache.
And this is a cache per zim file, so opening several zim files increases the total number of items that can be kept.
On top of that, limiting by number of items is kind of useless for the user, as we don't know the memory used by each item.

From a lot of discussion, it seems that we all agree it would be better to limit the cache with a memory limit.

This issue tries to list all the changes this implies in libzim.

We have to compute the size of each item.

For dirents, it is the sum of:

  • a fixed size (the dirent structure, depending on the architecture)
  • a size known at initialization (depending on the length of the url and title)

For clusters, it is the sum of:

  • a fixed size (the cluster structure, depending on the architecture)
  • a size known at initialization (the number of offsets, depending on the number of blobs in the cluster)
  • a variable size ((pre)allocated buffers to store uncompressed data)
  • a variable size (internal memory allocated by the decompressor)

The variable parts of clusters mean that the memory held by items in the cache may increase after they have been added, without any modification of the cache itself.
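
For discussion only, here is a minimal sketch of how such an estimation could look; the classes and field names are illustrative assumptions, not the real libzim types:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical size estimation mirroring the lists above.
struct DirentLike {
  std::string url;
  std::string title;

  std::size_t estimatedSize() const {
    // fixed part (the structure itself) + parts known at initialization
    return sizeof(*this) + url.capacity() + title.capacity();
  }
};

struct ClusterLike {
  std::vector<std::uint32_t> offsets;  // one offset per blob, known at initialization
  std::vector<char> uncompressedData;  // may grow while blobs are read
  std::size_t decompressorMemory = 0;  // tracked or estimated separately

  std::size_t estimatedSize() const {
    return sizeof(*this)
         + offsets.capacity() * sizeof(std::uint32_t)
         + uncompressedData.capacity()  // variable part
         + decompressorMemory;          // variable part
  }
};
```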

We have to create a global cache.

As we want to limit the cache memory size of the whole process, we want only one cache for the whole process.
(Or maybe only one limit shared by all caches)

There are two unifications to do:

  • From one cache per zim file to one global cache
  • From one cache per data type (dirent vs cluster) to one cache

So we have to extend the key from a u32 (entry_index_type or cluster_index_type) to a tuple (u8[16], u8, u32) (zim_uuid, type, index).
So the key size goes from 4 bytes to 21 bytes (likely 24 bytes with padding).
As the dirent cache exists to avoid re-parsing a buffer of only (16 bytes + title + url), and the cache key will have to be hashed and compared each time we want to access an item, we may want to investigate a bit here.

If, in the future, we implement a url->dirent cache (as suggested in #438), the key itself will have to change again.
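
For illustration, here is a minimal sketch of what such an extended key (and a hash for it) could look like; these are assumptions for discussion, not the actual libzim types:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical composite cache key: (zim_uuid, type, index).
// sizeof(GlobalCacheKey) is typically 24 bytes once padding is included.
struct GlobalCacheKey {
  uint8_t  zimUuid[16];  // identifies the zim file
  uint8_t  type;         // e.g. 0 = dirent, 1 = cluster (illustrative values)
  uint32_t index;        // entry_index_type or cluster_index_type

  bool operator==(const GlobalCacheKey& o) const {
    return type == o.type && index == o.index
        && std::memcmp(zimUuid, o.zimUuid, sizeof(zimUuid)) == 0;
  }
};

// A simple FNV-1a hash over the key bytes (illustrative, not the libzim hash).
struct GlobalCacheKeyHash {
  std::size_t operator()(const GlobalCacheKey& k) const {
    uint64_t h = 14695981039346656037ull;
    auto mix = [&h](uint8_t byte) { h = (h ^ byte) * 1099511628211ull; };
    for (uint8_t b : k.zimUuid) mix(b);
    mix(k.type);
    for (int i = 0; i < 4; ++i) mix(static_cast<uint8_t>(k.index >> (8 * i)));
    return static_cast<std::size_t>(h);
  }
};
```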

Cache eviction

There are three moments when we want to remove entries from the cache:

  • When we destroy a zim reader, we want to remove all its entries from the cache.
  • When we want to add an entry and the cache is full
  • When the cache memory increases because the user is reading from a cluster

The first point is pretty easy. The other points need us to define an algorithm to select which items to remove.
A basic LRU cache eviction, removing items until enough memory has been freed, may be enough. Or not.
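
As a minimal sketch of such a memory-bounded LRU eviction, assuming each cached item reports an estimated size (all names are illustrative, this is not the libzim implementation):

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <unordered_map>
#include <utility>

template <typename Key, typename Value, typename Hash = std::hash<Key>>
class MemoryBoundedLru {
 public:
  explicit MemoryBoundedLru(std::size_t maxBytes) : maxBytes_(maxBytes) {}

  // Insert an item with its estimated size (assumes the key is not already present).
  void put(const Key& key, Value value, std::size_t itemSize) {
    items_.emplace_front(key, std::move(value), itemSize);  // front = most recently used
    index_[key] = items_.begin();
    usedBytes_ += itemSize;
    evictIfNeeded();
  }

 private:
  struct Entry {
    Key key; Value value; std::size_t size;
    Entry(Key k, Value v, std::size_t s)
      : key(std::move(k)), value(std::move(v)), size(s) {}
  };

  // Drop least recently used items until we are back under the limit.
  void evictIfNeeded() {
    while (usedBytes_ > maxBytes_ && !items_.empty()) {
      const Entry& victim = items_.back();
      usedBytes_ -= victim.size;
      index_.erase(victim.key);
      items_.pop_back();
    }
  }

  std::size_t maxBytes_;
  std::size_t usedBytes_ = 0;
  std::list<Entry> items_;  // most recently used at the front
  std::unordered_map<Key, typename std::list<Entry>::iterator, Hash> index_;
};
```

A get() would additionally move the accessed entry to the front of the list; the open question above is whether this simple policy is good enough once item sizes vary a lot.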


Out of scope:

This issue is only about limiting the memory usage.
Our cache system also contains other caches which are (mainly) causing issues other than memory.
FastDirentLookup and the yet-to-be-implemented xapian preloading (#617) are consuming time at zim creation. #946 may still be necessary to configure them.

This will of course not limit the memory usage of the whole process. Opening too many zims and reading from them means we have to keep at least several clusters in memory (not necessarily in cache). On low-memory devices this can be a problem. This is not new and pretty rare. It is just that limiting the cache will not magically fix it.

@mgautierfr
Collaborator Author

After discussion with @kelson42, @rgaudin and @benoit74, here is what we agree on and what I propose.

We concentrate on the cluster cache only for now (the question of the preloading cache configuration would be handled in #946). We assume that the dirent cache is pretty small in comparison to the cluster cache, so we ignore it.

  • We create a global cluster cache that all zim readers will share.
  • We compute the memory footprint of each cluster [1] at runtime and we track the total memory footprint of all clusters in the cache.
  • We limit the cluster cache by used memory instead of by the number of stored clusters.
  • This cache is still a simple lru_cache; we drop the least recently used cluster(s) from the cache to free some memory.
  • When a reader is destructed, we remove all its clusters from the cache (see the sketch after this list).
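
As a minimal sketch of that last cleanup point, assuming a global map keyed by (zim uuid, cluster index); all names are illustrative, and locking and LRU bookkeeping are ignored:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

using ZimUuid = std::array<std::uint8_t, 16>;

struct CachedCluster {
  std::size_t trackedSize;  // footprint recorded when the cluster was inserted
  // ... offsets, (un)compressed data, decompression context, ...
};

// Global cluster cache keyed by (zim uuid, cluster index). An ordered map keeps
// all clusters of one zim file contiguous, so they are easy to drop together.
std::map<std::pair<ZimUuid, std::uint32_t>, std::shared_ptr<CachedCluster>> g_clusterCache;
std::size_t g_trackedBytes = 0;

// Called from the reader's destructor: remove every cluster of that zim file.
void dropReaderClusters(const ZimUuid& uuid) {
  auto it = g_clusterCache.lower_bound({uuid, 0});
  while (it != g_clusterCache.end() && it->first.first == uuid) {
    g_trackedBytes -= it->second->trackedSize;
    it = g_clusterCache.erase(it);
  }
}
```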

[1]
To compute this memory we use:

  • Size of the structure
  • Size of the offset array
  • Size of the compressed data
  • Size of the uncompressed data / 2 (we assume that, on average, all clusters are half uncompressed)
  • Size of the decompression context (either by providing our own allocation function which will track memory, or by using a fixed value after tracing/computing the allocations made by the decompression context)

The idea is to use a fixed size (computed at the time we add the cluster to the cache) and not to have to recompute it when we simply use the cluster.
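
As a sketch, the footprint formula above could look like this (the decompression-context constant is a placeholder, not a measured value):

```cpp
#include <cstddef>

// Placeholder, not a measured value; see the discussion about zstd below.
constexpr std::size_t DECOMPRESSION_CONTEXT_ESTIMATE = 8 * 1024 * 1024;

// Rough per-cluster footprint following the list above.
std::size_t estimateClusterFootprint(std::size_t structSize,
                                     std::size_t offsetArraySize,
                                     std::size_t compressedSize,
                                     std::size_t uncompressedSize,
                                     std::size_t decompressorSize
                                       = DECOMPRESSION_CONTEXT_ESTIMATE) {
  return structSize
       + offsetArraySize
       + compressedSize
       + uncompressedSize / 2  // assume clusters are, on average, half uncompressed
       + decompressorSize;
}
```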

We change the configuration of the cluster cache from a number of cached clusters to a "maximum" memory limit.
This limit will have a default value of 512MB. The "user" will be able to change this value either by env variable or by the C++ API when #946 is done.
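
For illustration only, this could look like the sketch below; the constant and the env variable name are hypothetical, and the comments further down lean towards exposing the setting through the C++ API only:

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical default limit for the global cluster cache.
constexpr std::size_t DEFAULT_CLUSTER_CACHE_SIZE = 512u * 1024u * 1024u;  // 512MB

std::size_t clusterCacheLimit() {
  // Hypothetical env variable name, for illustration only.
  if (const char* env = std::getenv("ZIM_CLUSTERCACHESIZE")) {
    return static_cast<std::size_t>(std::strtoull(env, nullptr, 10));
  }
  return DEFAULT_CLUSTER_CACHE_SIZE;
}
```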

Please note that:

  • We will not compute and track the exact size of the clusters, but the average over a large number of clusters should be correct.
  • Even if this tracked memory is not exact, we are anyway ignoring the memory used by dirents and xapian databases. So in any case, this limitation is more a hint than a strong limit.

@kelson42
Contributor

* We create a global cluster cache that all zim readers will share.

OK to me, but what seems mandatory to me is a global monitoring of memory consumption, not a global cache. If doing a global cache - which is "disconnected" from each ZIM reader - brings a lot of challenges, then this should be considered carefully.

* We compute the memory footprint of each cluster [1] at runtime and we track total memory footprint of all clusters in cache.

* We limit the cluster cache by used memory instead of number of stored clusters.

* This cache is still a simple lru_cache; we drop the least recently used cluster(s) from the cache to free some memory.

* When a reader is destructed, we remove all its clusters from the cache.

OK to me if it delivers a reasonably functioning cache, i.e.: (1) it efficiently prevents memory exhaustion, (2) it brings the value we expect from a cache: delivering often-requested content immediately.

[1] To compute this memory we use:

* Size of the structure

* Size of the offset array

* Size of the compressed data

* Size of the uncompressed data / 2 (we assume that, on average, all clusters are half uncompressed)

* Size of the decompression context (either by providing our own allocation function which will track memory, or by using a fixed value after tracing/computing the allocations made by the decompression context)

The idea is to use a fixed size (computed at the time we add the cluster to the cache) and not to have to recompute it when we simply use the cluster.

OK to me if it allows proper tracking of the cache size. I cannot judge exactly what is implied by what is written.

We change the configuration of the cluster cache from a number of cached clusters to a "maximum" memory limit. This limit will have a default value of 512MB. The "user" will be able to change this value either by env variable or by the C++ API when #946 is done.

No env variable, only API. This is a feature only for the devs using libzim.

Please note that:

* We will not compute and track the exact size of the clusters, but the average over a large number of clusters should be correct.

This is 100% new to me, I would have expected the exact size of the cluster. What is the worst case scenario here? By how many percent could this approximation be wrong?

@mgautierfr
Collaborator Author

OK to me, but what seems mandatory to me is a global monitoring of memory consumption, not a global cache. If doing a global cache - which is "disconnected" from each ZIM reader - brings a lot of challenges, then this should be considered carefully.

This is the whole purpose of this issue and the main reason I didn't want to do it right away without thinking twice about it.
We need a global cache. A global monitoring with "local" caches is useless (or at least, it would produce absolutely wrong behavior).

This is 100% new to me, I would have expected the exact size of the cluster.

This is due to:

To compute this memory we use:

  • Size of the structure
  • Size of the offset array
  • Size of the compressed data
  • Size of the uncompressed data / 2 (we assume that, on average, all clusters are half uncompressed)
  • Size of the decompression context (either by providing our own allocation function which will track memory, or by using a fixed value after tracing/computing the allocations made by the decompression context)

What is the worst case scenario here?

The worst case scenario is the one where we are always reading the last blob of every cluster in the cache.

By how many percent could this approximation be wrong?

Difficult to say right now, but:
In this case, we have counted half the size when computing the size in the cache, but we are storing the full size. With an average of 2MB per cluster, we have an extra 1MB per cluster. If compressed data is roughly 10% of uncompressed data:

1MB / (2MB * 1.1) => ~45%
The real percentage will be:
1MB / (2MB * 1.1 + structure size + memory allocated by the decompression context) => less than 45%

The memory allocated by the decompression context is not a known value. It depends on the algorithm and the compression level used. But for zstd level 19 (our default), the window is 8MB. So the allocated memory is at least 8MB.

So 1MB / (2MB * 1.1 + structure size + 8MB + other decompression context memory) => less than 10%
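
To make the back-of-the-envelope numbers reproducible, here is a tiny check using the assumptions stated above (not measured values):

```cpp
#include <cstdio>

int main() {
  // Assumptions from the discussion: 2MB average cluster, compressed data
  // ~10% of uncompressed data, zstd level 19 window of at least 8MB.
  const double uncompressed = 2.0;                 // MB
  const double compressed   = 0.1 * uncompressed;  // MB
  const double zstdWindow   = 8.0;                 // MB (lower bound)

  const double untracked = uncompressed / 2.0;     // only half is counted in the cache
  const double realUse   = uncompressed + compressed + zstdWindow;  // + structure size

  std::printf("worst-case tracking error: %.1f%%\n", 100.0 * untracked / realUse);  // ~9.8%
  return 0;
}
```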

@kelson42
Contributor

@mgautierfr We need to stop talking about any memory needed to run the software. If we talk about the memory cache size, this is only the cache size, not the memory allocated to fill it. I don't understand your computation, but basically it seems that between the cache size measured/tracked and the real cache size used in memory, the error can be max 10%. Do I understand right?

@mgautierfr
Collaborator Author

We need to stop talking about any memory needed to run the software. If we talk about the memory cache size, this is only the cache size, not the memory allocated to fill it.

I am not speaking about the memory needed to run the software. I am speaking about the memory that we will store (indirectly or not) in the cache.
What is the memory cache size if it is not the memory allocated for what we store in it?

I don't understand your computation, but basically it seems that between the cache size measured/tracked and the real cache size used in memory, the error can be max 10%. Do I understand right?

This value depends on the algorithm used to compress the clusters and on the cluster size. They are decided at zim file creation, so, at reading time, we cannot guarantee anything.
The 10% is a back-of-the-envelope calculation for our default algorithm: zstd level 19.
I cannot answer if the zim file is created with a different algorithm/level/cluster size.

@kelson42
Contributor

kelson42 commented Jan 27, 2025

I don't understand your computation, but basically it seems that between the cache size measured/tracked and the real cache size used in memory, the error can be max 10%. Do I understand right?

This value depends on the algorithm used to compress the clusters and on the cluster size. They are decided at zim file creation, so, at reading time, we cannot guarantee anything. The 10% is a back-of-the-envelope calculation for our default algorithm: zstd level 19. I cannot answer if the zim file is created with a different algorithm/level/cluster size.

OK, but considering the ZIM file landscape we have out there... there is almost zero chance that we are wrong by more than 10%, which is acceptable to me.
