Add a cache for samples #7497

Draft
wants to merge 27 commits into
base: master


sakertooth
Contributor

@sakertooth sakertooth commented Sep 14, 2024

Adds an in-memory cache for samples. Callers can fetch data from the cache using two overloads: one for audio files, and another for Base64 strings. The cache stores weak pointers to the samples and returns shared pointers to the callers. In the future, the cache will probably need to store sample thumbnails and possibly other per-sample metadata alongside the weak pointers. That should be fairly easy to do: store a struct/class holding the data we need instead of just the weak pointer to the buffer.

For audio files, we check the last write time for the file to see if it needs updating each time it is fetched, and update it if necessary. For Base64 strings, we just compare the contents.

SampleLoader now loads audio data by querying the cache rather than creating the buffers manually. As such, the SampleLoader::create* functions were renamed to SampleLoader::load*.

Memory usage drops significantly when a project uses duplicate samples (htop readings went from 28.5% to 3.6% for me with a project that had a number of sample clips, each ~3 minutes long), and projects load faster. There is still some delay because of the waveform drawing, but this should be addressed soon; as a result, the speedup is more noticeable when you zoom all the way out before loading a project.

Should supersede #7058 I believe.

Member

@messmerd messmerd left a comment


Gave it a quick look-over.

Also,

  • Sample's move constructor and move assignment operator should be marked noexcept
  • Sample::s_interpolationMargins should be renamed to Sample::InterpolationMargins since it is public
  • sample_rate_t could be used instead of int for the sample rate (See this commit for what needed to be changed: 10162ec)
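
For context on the first point: standard containers such as `std::vector` only move elements during reallocation when the move constructor is `noexcept` (otherwise they fall back to copying to preserve the strong exception guarantee). A minimal sketch, using a hypothetical `Clip` class as a stand-in for `Sample`:

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for Sample; only the move operations matter here.
class Clip
{
public:
    Clip() = default;
    explicit Clip(std::vector<float> frames) : m_frames(std::move(frames)) {}

    Clip(const Clip&) = default;
    Clip& operator=(const Clip&) = default;

    // User-provided move operations are not noexcept unless marked as such.
    Clip(Clip&& other) noexcept : m_frames(std::move(other.m_frames)) {}
    Clip& operator=(Clip&& other) noexcept
    {
        m_frames = std::move(other.m_frames);
        return *this;
    }

    std::size_t size() const { return m_frames.size(); }

private:
    std::vector<float> m_frames;
};

// Compile-time checks that the promise is actually visible to the standard library.
static_assert(std::is_nothrow_move_constructible_v<Clip>);
static_assert(std::is_nothrow_move_assignable_v<Clip>);
```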

include/FileSystemHelpers.h (outdated, resolved)
include/SampleLoader.h (outdated, resolved)
include/SampleDatabase.h (outdated, resolved)
@sakertooth
Contributor Author

sakertooth commented Nov 13, 2024

Sample's move constructor and move assignment operator should be marked noexcept
Sample::s_interpolationMargins should be renamed to Sample::InterpolationMargins since it is public
sample_rate_t could be used instead of int for the sample rate (See this commit for what needed to be changed: 10162ec)

Thanks, I will make another PR to address these issues separately. However, about the int -> sample_rate_t change, I never really understood why we have those types when basic types like int will work just as well. We already have to convert the sample rate to an int when processing the project file anyway. I can't really see the benefit that using sample_rate_t brings, and it adds more complexity than int does (not in terms of how long the type name is... well, maybe, but more so in having to know what the actual type is that we are dealing with, on top of adding more code that we don't really need in the codebase).

@sakertooth sakertooth changed the title Add a database for samples Add a cache for samples Nov 13, 2024
@sakertooth
Contributor Author

Any updates @messmerd?

include/SampleCache.h (outdated, resolved)
@messmerd
Member

This PR looks pretty good to me code-wise. I'll test it next.

include/SampleCache.h (outdated, resolved)
include/SampleCache.h (outdated, resolved)
src/core/SampleCache.cpp (outdated, resolved)
src/core/SampleCache.cpp (outdated, resolved)
sakertooth and others added 2 commits November 21, 2024 12:48
Co-authored-by: Dalton Messmer <[email protected]>
Co-authored-by: Dalton Messmer <[email protected]>
src/core/SampleCache.cpp (outdated, resolved)
@Spacemagehq

I'm testing, and I'm finding no issues or bugs so far on Windows 11. I placed 20 samples randomly throughout LMMS, and memory and storage seem fine and unaffected. The CPU meter inside LMMS does not go up by much, so it is not heavy on the CPU.

Co-authored-by: Dalton Messmer <[email protected]>
@sakertooth
Contributor Author

sakertooth commented Feb 8, 2025

@messmerd, I believe I understand the benefit of using UUIDs better now. If the file path changes, then we would only need to update the mapping from UUIDs to file paths in one place, while all the other clients don't have to worry about anything and can continue using the UUID, which hasn't changed.

As of right now, my PR handles this using the "last modified time", which has two problems. One, this can create a lot of dead entries if they are not actively "garbage collected". Two, since clients still store the file path, if it changes they would still have to manage the sample themselves, possibly everywhere. This is a problem in #7366, where we have to check whether the Thumbnail needs to be updated or a new one created, because we store the file paths directly and they can become invalid at any point on the file system.

It seems like the overall idea here is to map UUIDs to assets/resources, which can be a sample file, a sample Base64 string, a project file, a preset file, etc. Assets can then specify whether they are loaded from disk or from something else, such as a string in the case of Base64 samples. Each asset can be updated whenever the asset manager/cache decides it should be (on file system changes, or possibly something else). This will not only bolster the sample caching implementation, since it truly centralizes everything, but will also help simplify and improve asset management (which may be needed for features like showing a popup to the user to load missing assets when loading the project, among other things).
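
A minimal sketch of that indirection, under stated assumptions: `AssetId` is an opaque string standing in for a real UUID, and `AssetRegistry`, `FileSource`, and `InlineSource` are hypothetical names invented for illustration. Nothing like this exists in the PR; it only shows why clients holding an ID are unaffected when a path changes.

```cpp
#include <cassert>
#include <filesystem>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <variant>

namespace fs = std::filesystem;

// Assumption: an opaque string stands in for a real UUID type.
using AssetId = std::string;

// An asset is either loaded from disk or carried inline (e.g. a Base64 sample).
struct FileSource { fs::path path; };
struct InlineSource { std::string base64; };
using AssetSource = std::variant<FileSource, InlineSource>;

// Hypothetical registry: the single place where an ID is tied to its source.
class AssetRegistry
{
public:
    // Registers a new asset or re-points an existing ID (for example after a file moved).
    void set(const AssetId& id, AssetSource source)
    {
        assert(!id.empty());
        m_sources.insert_or_assign(id, std::move(source));
    }

    // Returns the current source, or std::nullopt for an unknown ID
    // (a caller could use that to prompt for a missing asset).
    std::optional<AssetSource> find(const AssetId& id) const
    {
        const auto it = m_sources.find(id);
        if (it == m_sources.end()) { return std::nullopt; }
        return it->second;
    }

private:
    std::map<AssetId, AssetSource> m_sources;
};
```

Clients would store only the `AssetId`; when a file is relocated, a single `set` call updates the mapping and every client sees the new path on its next `find`.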

I am going to try to implement some of these ideas in a new PR (actually I might just do it here instead).

@messmerd
Member

messmerd commented Feb 8, 2025

@sakertooth Yep, that's exactly it.

This PR doesn't make any changes to the project file as far as I'm aware, so it won't introduce any backwards incompatible changes if we merge it now. And since this PR is very useful as is, I think we should merge it then explore the UUID / resource manager idea in a follow-up PR.

@sakertooth
Contributor Author

I'll at least move to using QFileSystemWatcher in the current implementation to fix some of the problems mentioned by actively keeping the table in sync with the file system. Other than that I think this is safe to merge. I agree that the UUID asset idea might need to be explored in a different PR since its implementation is far more lengthy than what I have here, and I remember you were planning to do this already to some extent.

@messmerd messmerd mentioned this pull request Feb 9, 2025
@sakertooth
Contributor Author

sakertooth commented Feb 9, 2025

I read up on the docs for QFileSystemWatcher:

The act of monitoring files and directories for modifications consumes system resources. This implies there is a limit to the number of files and directories your process can monitor simultaneously. On all BSD variants, for example, an open file descriptor is required for each monitored file. Some system limits the number of open file descriptors to 256 by default. This means that addPath() and addPaths() will fail if your process tries to add more than 256 files or directories to the file system monitor. Also note that your process may have other file descriptors open in addition to the ones for files being monitored, and these other open descriptors also count in the total. macOS uses a different backend and does not suffer from this issue.

I was a bit scared off from making the switch because of this and had to weigh the pros and cons, so I'll probably keep the timestamp checking to be safe, as it works well enough. A counterargument to worrying about dead paths in the cache is that they are rare and have static duration, so it is really not as big a problem as I made it out to be. The main issue is exposing file path information that has to be kept in sync everywhere else in the codebase, but we will deal with this later, as already discussed.

One change I should make, though, is to update preexisting entries when they are fetched a second time and are still valid on the file system, instead of creating completely new ones. This should keep the number of invalid entries in the cache down by a fair amount.
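
To make the difference concrete, here is a small sketch comparing the two keying strategies. Timestamps are plain integers and both table classes are hypothetical, chosen only to keep the example free of file I/O; the PR's actual map and key types differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>

// Assumption: timestamps are plain integers so the sketch needs no real files.
using Timestamp = long long;

// Keyed by (path, timestamp): every modification adds a new entry,
// and the entry for the previous timestamp lingers as a dead one.
class VersionedTable
{
public:
    void record(const std::string& path, Timestamp lastWriteTime)
    {
        assert(!path.empty());
        m_entries.insert(std::make_pair(path, lastWriteTime));
    }

    std::size_t size() const { return m_entries.size(); }

private:
    std::set<std::pair<std::string, Timestamp>> m_entries;
};

// Keyed by path only: a modification overwrites the existing entry in place.
class PathTable
{
public:
    void record(const std::string& path, Timestamp lastWriteTime)
    {
        assert(!path.empty());
        m_entries[path] = lastWriteTime;
    }

    std::size_t size() const { return m_entries.size(); }
    Timestamp lastWriteTime(const std::string& path) const { return m_entries.at(path); }

private:
    std::map<std::string, Timestamp> m_entries;
};
```

Recording three successive write times for the same path leaves three entries in the first table but only one, holding the latest timestamp, in the second.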

@sakertooth sakertooth marked this pull request as draft February 9, 2025 05:36
@sakertooth sakertooth marked this pull request as ready for review February 9, 2025 09:27
Comment on lines +37 to +48
    const auto it = std::find_if(s_audioFileMap.begin(), s_audioFileMap.end(),
        [&](const auto& entry) { return entry.first.path == PathUtil::pathFromQString(path); });

    auto lastWriteTime = fs::last_write_time(PathUtil::pathFromQString(path));

    if (it == s_audioFileMap.end() || it->first.lastWriteTime != lastWriteTime)
    {
        const auto buffer = std::make_shared<SampleBuffer>(path);
        const auto key = AudioFileEntry{PathUtil::pathFromQString(path), lastWriteTime};
        s_audioFileMap[std::move(key)] = buffer;
        return buffer;
    }
Member


Suggested change
- const auto it = std::find_if(s_audioFileMap.begin(), s_audioFileMap.end(),
-     [&](const auto& entry) { return entry.first.path == PathUtil::pathFromQString(path); });
- auto lastWriteTime = fs::last_write_time(PathUtil::pathFromQString(path));
- if (it == s_audioFileMap.end() || it->first.lastWriteTime != lastWriteTime)
- {
-     const auto buffer = std::make_shared<SampleBuffer>(path);
-     const auto key = AudioFileEntry{PathUtil::pathFromQString(path), lastWriteTime};
-     s_audioFileMap[std::move(key)] = buffer;
-     return buffer;
- }
+ const auto qPath = PathUtil::pathFromQString(path);
+ const auto it = std::find_if(s_audioFileMap.begin(), s_audioFileMap.end(),
+     [&](const auto& entry) { return entry.first.path == qPath; });
+ auto lastWriteTime = fs::last_write_time(qPath);
+ if (it == s_audioFileMap.end() || it->first.lastWriteTime != lastWriteTime)
+ {
+     const auto buffer = std::make_shared<SampleBuffer>(path);
+     const auto key = AudioFileEntry{qPath, lastWriteTime};
+     s_audioFileMap[std::move(key)] = buffer;
+     return buffer;
+ }

Moved the PathUtil::pathFromQString(path) outside the lambda so it doesn't call it every time the lambda is called.

Contributor Author


I would use fsPath instead personally, but yeah I'll move it out.

@sakertooth sakertooth marked this pull request as draft February 12, 2025 13:21
3 participants