feat: massStoreRunAsynchronous() #4326

Open
wants to merge 4 commits into
base: master

Commits on Sep 18, 2024

  1. feat(server): Asynchronous server-side background task execution

    This patch implements the whole support ecosystem for server-side
    background tasks, in order to help lessen the load (and blocking) of API
    handlers in the web-server for long-running operations.
    
    A **Task** is represented by two things in strict co-existence: a
    lightweight, `pickle`-able implementation in the server's code (a
    subclass of `AbstractTask`) and a corresponding `BackgroundTask`
    database entity, which resides in the "configuration" database (shared
    across all products).
    A Task is created by API request handlers and then the user is
    instructed to retain the `TaskToken`: the task's unique identifier.
    The server then dispatches execution of the object to a background
    worker process, and keeps the status synchronised via the database.
    Even in a service cluster deployment, load balancing will not interfere
    with users' ability to query a task's status.
    
    While normal users can only query the status of a single task (which is
    usually done automatically by client code, not by the user manually
    executing something), product administrators, and especially server
    administrators, have the ability to query an arbitrary set of tasks using
    the available filters, with a dedicated API function (`getTasks()`) for
    this purpose.
    
    Tasks can be cancelled only by `SUPERUSER`s, at which point a special
    binary flag is set in the status record.
    However, to prevent complicating inter-process communication,
    cancellation is supposed to be implemented by `AbstractTask` subclasses
    in a co-operative way.
    The execution of tasks in a process and a `Task`'s ability to
    "communicate" with its execution environment is achieved through the new
    `TaskManager` instance, which is created for every process of a server's
    deployment.
    
    Unfortunately, tasks can die gracelessly if the server is terminated
    (either internally, or even externally).
    For this reason, the `DROPPED` status indicates that the server
    terminated before or during a task's execution, so the task was unable
    to produce results.
    The server was refactored significantly around the handling of subprocesses
    in order to support various server shutdown scenarios.
    
    Servers will start `background_worker_processes` number of task handling
    subprocesses, which are distinct from the already existing "API
    handling" subprocesses.
    By default, if unconfigured, `background_worker_processes` is equal to
    `worker_processes` (the number of API processes to spawn), which is
    equal to `$(nproc)` (CPU count in the system).
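
The fallback chain for the two worker counts can be sketched as follows. This is only an illustration of the defaults described above: the function name and its signature are hypothetical, and the real server reads these values from its own configuration.

```python
import os
from typing import Optional, Tuple


def resolve_worker_counts(
        worker_processes: Optional[int] = None,
        background_worker_processes: Optional[int] = None) -> Tuple[int, int]:
    """Return (api_workers, background_workers) using the described defaults.

    Hypothetical helper, not part of the server's code.
    """
    # An unconfigured API worker count falls back to the CPU count,
    # which is what `$(nproc)` would usually report.
    if worker_processes is None:
        worker_processes = os.cpu_count() or 1
    # An unconfigured background worker count falls back to the API count.
    if background_worker_processes is None:
        background_worker_processes = worker_processes
    return worker_processes, background_worker_processes
```

With nothing configured, both counts come out equal; setting only `worker_processes` carries that value over to the background workers as well.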
    
    This patch includes a `TestingDummyTask` demonstrative subclass of
    `AbstractTask` which counts up to an input number of seconds, and each
    second it gracefully checks whether it is being killed.
    The corresponding testing API endpoint, `createDummyTask()` can specify
    whether the task should simulate a failing status.
    This endpoint can only be used from the project's unit tests, where it
    is used extensively.
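
A minimal sketch of what such a co-operatively cancellable counting task might look like is given here. It does not reproduce `TestingDummyTask` or the `AbstractTask` interface: the class, the `TaskCancelled` exception, and the `is_cancel_requested` callback are all assumptions made for illustration, standing in for however the real task consults its `TaskManager`.

```python
import time


class TaskCancelled(Exception):
    """Hypothetical signal raised when the task notices its cancel flag."""


class DummyCountingTask:
    """Illustrative stand-in for an AbstractTask subclass (not the real one).

    It holds only plain values, so instances stay trivially pickle-able.
    """

    def __init__(self, timeout_s, should_fail=False, tick_s=1.0):
        self.timeout_s = timeout_s
        self.should_fail = should_fail
        self.tick_s = tick_s

    def run(self, is_cancel_requested, sleep=time.sleep):
        """Count up to timeout_s ticks, checking the cancel flag before each."""
        for elapsed in range(self.timeout_s):
            # Co-operative cancellation: nobody interrupts the task from
            # outside, it has to look at the flag itself and stop.
            if is_cancel_requested():
                raise TaskCancelled(f"cancelled after {elapsed} tick(s)")
            sleep(self.tick_s)
        if self.should_fail:
            raise RuntimeError("simulated failure")
        return self.timeout_s
```

The design point is that cancellation costs no extra inter-process machinery: the flag lives in the status record, and the task decides when it is safe to stop.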
    
    This patch does not include "nice" or "ergonomic" facilities for admins
    to manage the tasks, and so far, only the server-side of the
    corresponding API calls are supported.
    whisperity committed Sep 18, 2024
    a75244f
  2. feat(cmd): Implemented a CLI for task management

    This patch extends `CodeChecker cmd` with a new sub-command,
    `serverside-tasks`, which lets users and administrators deal with
    querying the status of running server-side tasks.
    
    By default, the CLI queries the information of the task(s) specified by
    their token(s) in the `--token` argument from the server using
    `getTaskInfo(token)`, and shows this information in either verbose
    "plain text" (available if precisely **one** task was specified), "table"
    or JSON formats.
    
    In addition to `--token`, it also supports 19 more parameters, each of
    which corresponds to a filter option in the `TaskFilter` API type.
    If any filter in addition to `--token` is specified, the CLI calls
    `getTasks(filter)` instead.
    This mode is only available to administrators.
    The resulting, more detailed information structs are printed in "table"
    or JSON formats.
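
The choice between the two query modes could be sketched like this. Only the API names `getTaskInfo` and `getTasks` come from the commit message; the `client` object, the use of a plain `dict` in place of the Thrift `TaskFilter` type, the `tokens` key, and the format names are assumptions made to keep the example self-contained.

```python
def query_tasks(client, tokens, extra_filters):
    """Pick the API call depending on whether extra filters were given."""
    if extra_filters:
        # Any filter besides --token switches to the admin-only getTasks().
        # A dict stands in for TaskFilter here; "tokens" is a made-up key.
        task_filter = dict(extra_filters, tokens=list(tokens))
        return client.getTasks(task_filter)
    # Otherwise every token is looked up individually.
    return [client.getTaskInfo(token) for token in tokens]


def allowed_output_formats(task_count, used_filters):
    """List output formats; plain text needs exactly one token-queried task."""
    formats = ["table", "json"]
    if task_count == 1 and not used_filters:
        formats.insert(0, "plaintext")
    return formats
```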
    
    Apart from querying the current status, two additional flags are
    available, irrespective of which query method is used to obtain a list
    of "matching tasks":
    
      * `--kill` will call `cancelTask(token)` for each task.
      * `--await` will block execution until the specified task(s) terminate
        (in one way or another).
    
    `--await` is implemented by calling the new **`await_task_termination`**
    library function, which is designed to be reusable by other clients
    later.
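
A polling loop in the spirit of `--await` might look like the sketch below. It is not the signature or behaviour of the real `await_task_termination`: the function name, the `get_status` callback, the poll interval, and every status name other than `DROPPED` are assumptions.

```python
import time

# Only DROPPED is named in the commit messages; the rest are assumed names.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "DROPPED"}


def wait_for_termination(get_status, token, poll_interval_s=5.0,
                         max_polls=None, sleep=time.sleep):
    """Poll get_status(token) until it reports a terminal status.

    Raises TimeoutError if max_polls is set and the task is still
    not finished after that many status queries.
    """
    polls = 0
    while True:
        status = get_status(token)
        if status in TERMINAL_STATUSES:
            return status
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(
                f"task {token} still {status} after {polls} polls")
        sleep(poll_interval_s)
```

Because every iteration is a short, independent request, a client waiting this way does not depend on a single long-lived connection staying healthy.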
    whisperity committed Sep 18, 2024
    d73c1da
  3. refactor(server): massStoreRun() as a background task

    Separate the previously blocking execution of `massStoreRun()`, which
    was done in the context of the "API handler process", into a
    "foreground" and a "background" part, exploiting the previously
    implemented background task library support.
    
    The foreground part remains executed in the context of the API handler
    process, and deals with receiving and unpacking the to-be-stored data,
    saving configuration and checking constraints that are cheap to check.
    The foreground part can report issues synchronously to the client.
    
    Everything else that was part of the previous `massStoreRun()` pipeline,
    as implemented by the `mass_store_run.MassStoreRun` class, becomes a
    background task. This includes the parsing of the uploaded reports and
    the storing of data to the database.
    This background task, implemented using the new library, executes in a
    separate background worker process, and can not communicate directly
    with the user.
    Errors are logged to the `comments` fields.
    
    The `massStoreRun()` API handler will continue to work as previously,
    and block while waiting for the background task to terminate.
    In case of an error, it synchronously reports a `RequestFailed` exception,
    passing the `comments` field (into which the background process had
    written the exception details) to the client.
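
The error hand-over from the background task to the blocking handler could be sketched as follows. The `RequestFailed` class here is a local stand-in rather than the Thrift-generated type, and the function name and the `COMPLETED` status name are assumptions.

```python
class RequestFailed(Exception):
    """Local stand-in for the API's RequestFailed exception type."""


def report_store_outcome(status, comments):
    """Raise RequestFailed with the task's comments unless it completed."""
    if status == "COMPLETED":
        return
    # The background worker cannot talk to the user directly, so the only
    # error detail available here is what it wrote into `comments`.
    raise RequestFailed(comments or f"Task ended with status {status}")
```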
    
    Because most of the exceptions previously raised in `MassStoreRun` can
    no longer "escape" as `RequestFailed`s, some parts of the API have been
    deprecated and removed.
    Namely, `ErrorCode.SOURCE_FILE` and `ErrorCode.REPORT_FORMAT` are no
    longer sent over the API.
    This does not break existing behaviour and does not cause an
    incompatibility with clients: in cases where the request exceptions were
    raised earlier, now a different type of exception is raised, but the
    error message still precisely explains the problem as it did previously.
    whisperity committed Sep 18, 2024
    d42164b

Commits on Sep 20, 2024

  1. feat: massStoreRunAsynchronous()

    Even though commit d915473 introduced
    socket-level TCP keepalive support into the server's implementation,
    this was observed multiple times to not be enough to
    **deterministically** fix the issues with the `CodeChecker store` client
    hanging indefinitely when the server takes a long time processing the
    to-be-stored data.
    The underlying reasons are not entirely clear and the issue only pops
    up sporadically, but we did observe a few similar scenarios (such as
    multi-million report storage from analysing LLVM and then storing
    between datacentres) where it almost reliably reproduces.
    The symptoms (even with a configured `keepalive`) generally include the
    server not becoming notified about the client's disconnect, while the
    client process is hung on a low-level system call `read(4, ...)`, trying
    to get the Thrift response of `massStoreRun()` from the HTTP socket.
    Even if the server finishes the storage processing "in time" and sends
    the Thrift reply, it never reaches the client, so the client never
    stops waiting and keeps either the terminal or, worse, a CI script
    occupied, blocking execution.
    
    This is the "more proper solution" foreshadowed in
    commit 15af7d8.
    
    Implemented the server-side logic to spawn a `MassStoreRun` task and
    return its token, giving the `massStoreRunAsynchronous()` API call full
    force.
    
    Implemented the client-side logic to use the new `task_client` module
    and the same logic as
    `CodeChecker cmd serverside-tasks --await --token TOKEN...`
    to poll the server for the task's completion and status.
    whisperity committed Sep 20, 2024
    e30fcc2