Fix libraries performance problem #34523

Merged on May 20, 2024 (1 commit)

Commits on May 20, 2024

  1. fix: libraries performance problem

    This is an attempt to fix a performance problem on the libraries home page. When you go to Studio home and click the Libraries tab on prod, the page loads quickly for admins but is extremely slow for course instructors (> 12 seconds), which leads to timeouts. The load time grows with the number of libraries assigned to the instructor.
    
    The Python code for the request to load libraries for a particular user goes through all existing libraries and, for each library, checks all of the user's roles. This results in a complexity of O(l*r), where l is the number of libraries and r is the number of the user's roles. This PR improves the complexity to O(l).
    
    The BulkRoleCache and RoleCache classes were using a Python set to store all roles for a particular user. A user can have a large number of roles, and finding a role by iterating through that set is slow (O(n)). Most roles don't share the same course id, however. So if we have the course id of the role we're looking for, we can use a dict keyed by course id whose values hold the related roles. The number of roles per course id is small, so looking up a user's roles for a specific course id is effectively O(1).
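
As a rough illustration of the difference described above, here is a minimal sketch that contrasts scanning a flat set with a lookup in a dict keyed by course id. The `Role` tuple and all function names are hypothetical stand-ins for illustration, not the classes or helpers used in this PR.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Role(NamedTuple):
    # Hypothetical stand-in for a stored role record.
    role: str
    course_id: Optional[str]


def has_role_flat(roles, role, course_id):
    """Scan every role the user holds: O(r) per lookup."""
    return any(r.role == role and r.course_id == course_id for r in roles)


def group_roles_by_course_id(roles):
    """Build the dict once: course id -> set of roles on that course."""
    grouped = defaultdict(set)
    for r in roles:
        grouped[r.course_id].add(r)
    return dict(grouped)


def has_role_grouped(grouped, role, course_id):
    """One dict lookup, then a scan of the few roles on that course."""
    return any(r.role == role for r in grouped.get(course_id, ()))
```

With the grouped form, checking l libraries costs one dict lookup per library plus a scan of that library's handful of roles, rather than a scan of all r roles each time.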
    
    The BulkRoleCache now caches and stores user roles in a data structure like this:
        {
            user_id_1: {
                course_id_1: {role1, role2, role3},  # Set of roles associated with course_id_1
                course_id_2: {role4, role5, role6},  # Set of roles associated with course_id_2
                [ROLE_CACHE_UNGROUPED_ROLES_KEY]: {role7, role8}  # Set of roles not tied to any specific course or library. For example, Global Staff roles.
            },
            user_id_2: { ... }  # Similar structure for another user
        }
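
A nested structure of this shape could be built along the following lines. This is only a sketch: the input row format, the `build_bulk_cache` name, and the string value given to `ROLE_CACHE_UNGROUPED_ROLES_KEY` are assumptions made for illustration, and roles are reduced to plain strings.

```python
from collections import defaultdict

# Placeholder value: the commit names this constant but does not show its value.
ROLE_CACHE_UNGROUPED_ROLES_KEY = "ungrouped"


def build_bulk_cache(rows):
    """Group roles per user, then per course id.

    rows: iterable of (user_id, course_id_or_None, role_name) tuples
    (a hypothetical input format).
    """
    cache = defaultdict(lambda: defaultdict(set))
    for user_id, course_id, role in rows:
        # Roles without a course id go under the shared "ungrouped" key.
        key = course_id if course_id is not None else ROLE_CACHE_UNGROUPED_ROLES_KEY
        cache[user_id][key].add(role)
    # Convert to plain dicts so a missing key raises instead of being created.
    return {user_id: dict(by_course) for user_id, by_course in cache.items()}
```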
    
    While this changes the data structure used to store roles under the hood and adds the new property `roles_by_course_id` to the RoleCache, the RoleCache also stores the roles in the previous data structure, a flat set, when it is initialized. That set lives in the `_roles` property and is accessible via `all_roles_set`, which preserves backwards compatibility.
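
The dual storage could look roughly like the sketch below. Only the names `roles_by_course_id`, `_roles`, and `all_roles_set` come from this commit message; the class name, the `Role` tuple, the constructor signature, the `has_role` method, and the constant's value are assumptions for illustration and do not reproduce the real RoleCache.

```python
from typing import NamedTuple, Optional

# Placeholder value: the commit names this constant but does not show its value.
ROLE_CACHE_UNGROUPED_ROLES_KEY = "ungrouped"


class Role(NamedTuple):
    # Hypothetical stand-in for a stored role record.
    role: str
    course_id: Optional[str]


class RoleCacheSketch:
    """Keeps roles both as a flat set and grouped by course id."""

    def __init__(self, roles):
        # Previous structure: flat set, kept for backwards compatibility.
        self._roles = set(roles)
        # New structure: course id -> set of roles.
        self._roles_by_course_id = {}
        for role in self._roles:
            key = role.course_id if role.course_id is not None else ROLE_CACHE_UNGROUPED_ROLES_KEY
            self._roles_by_course_id.setdefault(key, set()).add(role)

    @property
    def all_roles_set(self):
        return self._roles

    @property
    def roles_by_course_id(self):
        return self._roles_by_course_id

    def has_role(self, role, course_id):
        # Hypothetical lookup that only touches the roles on one course.
        key = course_id if course_id is not None else ROLE_CACHE_UNGROUPED_ROLES_KEY
        return any(r.role == role for r in self._roles_by_course_id.get(key, ()))
```

Callers that still iterate over the flat set keep working through `all_roles_set`, while new lookups can go through `roles_by_course_id`.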
    
    We are now storing roles twice in the RoleCache (once in each of the two data structures), so role storage takes roughly twice as much memory, but only for the duration of a request.
    jesperhodge committed May 20, 2024
    Commit e524349