Very slow Indexing of Shared Files and a possible solution #281

Open
rolandinus opened this issue Sep 2, 2024 · 0 comments
Comments

@rolandinus

When a file is updated in the index (e.g., renamed), the share names for all users with access to this file are updated, even if the user has not changed the share name. This process is extremely slow when there are many users with access to the file.
It's likely related to Issue #256, which would be resolved if this process were faster. There are reports in the Nextcloud forums which seem to be related.

Current Behavior

Updating a single file triggers share name updates for all users with access.
Renaming a folder updates all files in all subfolders.
On large systems with many files and users, this can lead to an indexing queue that takes an excessive amount of time to complete (e.g., a week).

Details
I identified the performance bottleneck in the following function in the FilesService:

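private function getPathFromViewerId(int $fileId, string $viewerId): string {
    // Look up the file by ID inside the viewer's own folder (the slow call).
    $viewerFiles = $this->rootFolder->getUserFolder($viewerId)
        ->getById($fileId);
    // ... (rest of the function omitted)
}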

Specifically, the ->getById($fileId) call is causing the slowdown.

Proposed Solution:
I tried using the file path of the owner as a guess for other users with access, since this is the default in most cases.
Checking the guess with nodeExists($path) in each user's folder is approximately 50-100 times faster than calling getPathFromViewerId.
(If the file is already in the index, the current share names might be a better first guess.)
If the guessed path is not valid for a user, fall back to the current method.

This approach should work well since the fulltext index only stores one access path per user anyway.
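
As a rough illustration of the guess-then-fall-back idea described above, here is a standalone sketch. It is not the test implementation mentioned below and it does not use the real Nextcloud classes: UserFolderLike, InMemoryUserFolder, getIdAtPath, findPathById and resolveViewerPath are hypothetical names made up for this example. In the app, the corresponding calls would presumably be nodeExists(), get($path)->getId() and getById() on the user folder. The extra check that the node found at the guessed path really has the expected file ID is an assumption added here, since nodeExists alone cannot tell whether a different file happens to live at the same path.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the few folder operations the idea needs.
interface UserFolderLike {
    public function nodeExists(string $path): bool;
    // Hypothetical helper: ID of the node at $path, or null if none.
    public function getIdAtPath(string $path): ?int;
    // Stands in for the slow ID-based lookup (getById in the issue).
    public function findPathById(int $fileId): ?string;
}

// In-memory fake so the sketch runs without a Nextcloud instance.
final class InMemoryUserFolder implements UserFolderLike {
    public int $slowLookups = 0;

    /** @var array<string, int> */
    private array $pathToId;

    /** @param array<string, int> $pathToId */
    public function __construct(array $pathToId) {
        $this->pathToId = $pathToId;
    }

    public function nodeExists(string $path): bool {
        return isset($this->pathToId[$path]);
    }

    public function getIdAtPath(string $path): ?int {
        return $this->pathToId[$path] ?? null;
    }

    public function findPathById(int $fileId): ?string {
        $this->slowLookups++; // count how often the expensive path is taken
        $path = array_search($fileId, $this->pathToId, true);
        return $path === false ? null : (string) $path;
    }
}

// Try the guessed path first; only fall back to the slow lookup on a miss.
function resolveViewerPath(UserFolderLike $folder, int $fileId, string $guessedPath): ?string {
    if ($folder->nodeExists($guessedPath) && $folder->getIdAtPath($guessedPath) === $fileId) {
        return $guessedPath;
    }
    return $folder->findPathById($fileId);
}

$ownerPath = '/Projects/report.odt';

// Alice sees the share at the same path as the owner: the guess is valid.
$alice = new InMemoryUserFolder(['/Projects/report.odt' => 42]);
$alicePath = resolveViewerPath($alice, 42, $ownerPath);

// Bob moved the share and has an unrelated file at the guessed path.
$bob = new InMemoryUserFolder(['/Shared/report.odt' => 42, '/Projects/report.odt' => 7]);
$bobPath = resolveViewerPath($bob, 42, $ownerPath);
```

In this sketch the slow lookup runs only for Bob, whose guessed path points at a different file. Whether the ID comparison is cheap enough in a real setup would need to be measured.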

I have created a test implementation of the proposed solution. From an initial test, it seems to work fine and it is a lot faster.

Questions for Maintainers
Are there any potential side effects or edge cases to consider?
Do you have an idea for a better approach? I am happy to come up with a different solution if someone can give me a hint.

I am happy to create a pull request, but first I would like to have some feedback.
