feat: `massStoreRunAsynchronous()` #4326

This patch implements the whole support ecosystem for server-side background tasks, in order to help lessen the load (and blocking) of API handlers in the web-server for long-running operations. A **Task** is represented by two things in strict co-existence: a lightweight, `pickle`-able implementation in the server's code (a subclass of `AbstractTask`) and a corresponding `BackgroundTask` database entity, which resides in the "configuration" database (shared across all products). A Task is created by API request handlers and then the user is instructed to retain the `TaskToken`: the task's unique identifier. Following, the server will dispatch execution of the object into a background worker process, and keep status synchronisation via the database. Even in a service cluster deployment, load balancing will not interfere with users' ability to query a task's status. While normal users can only query the status of a single task (which is usually automatically done by client code, and not the user manually executing something); product administrators, and especially server administrators have the ability to query an arbitrary set of tasks using the potential filters, with a dedicated API function (`getTasks()`) for this purpose. Tasks can be cancelled only by `SUPERUSER`s, at which point a special binary flag is set in the status record. However, to prevent complicating inter-process communication, cancellation is supposed to be implemented by `AbstractTask` subclasses in a co-operative way. The execution of tasks in a process and a `Task`'s ability to "communicate" with its execution environment is achieved through the new `TaskManager` instance, which is created for every process of a server's deployment. Unfortunately, tasks can die gracelessly if the server is terminated (either internally, or even externally). For this reason, the `DROPPED` status will indicate that the server has terminated prior to, or during a task's execution, and it was unable to produce results. The server was refactored significantly around the handling of subprocesses in order to support various server shutdown scenarios. Servers will start `background_worker_processes` number of task handling subprocesses, which are distinct from the already existing "API handling" subprocesses. By default, if unconfigured, `background_worker_processes` is equal to `worker_processes` (the number of API processes to spawn), which is equal to `$(nproc)` (CPU count in the system). This patch includes a `TestingDummyTask` demonstrative subclass of `AbstractTask` which counts up to an input number of seconds, and each second it gracefully checks whether it is being killed. The corresponding testing API endpoint, `createDummyTask()` can specify whether the task should simulate a failing status. This endpoint can only be used from, but is used extensively, the unit testing of the project. This patch does not include "nice" or "ergonomic" facilities for admins to manage the tasks, and so far, only the server-side of the corresponding API calls are supported.

This patch extends `CodeChecker cmd` with a new sub-command, `serverside-tasks`, which lets users and administrators deal with querying the status of running server-side tasks. By default, the CLI queries the information of the task(s) specified by their token(s) in the `--token` argument from the server using `getTaskInfo(token)`, and shows this information in either verbose "plain text" (available if precisely **one** task was specified), "table" or JSON formats. In addition to `--token`, it also supports 19 more parameters, each of which correspond to a filter option in the `TaskFilter` API type. If any filters in addition to `--token` is specified, it will exercise `getTasks(filter)` instead. This mode is only available to administrators. The resulting, more detailed information structs are printed in "table" or JSON formats. Apart from querying the current status, two additional flags are available, irrespective of which query method is used to obtain a list of "matching tasks": * `--kill` will call `cancelTask(token)` for each task. * `--await` will block execution until the specified task(s) terminate (in one way or another). `--await` is implemented by calling the new **`await_task_termination`** library function, which is implemented with the goal of being reusable by other clients later.

Separate the previously blocking execution of `massStoreRun()`, which was done in the context of the "API handler process", into a "foreground" and a "background" part, exploiting the previously implemented background task library support. The foreground part remains executed in the context of the API handler process, and deals with receiving and unpacking the to-be-stored data, saving configuration and checking constraints that are cheap to check. The foreground part can report issues synchronously to the client. Everything else that was part of the previous `massStoreRun()` pipeline, as implemented by the `mass_store_run.MassStoreRun` class becomes a background task, such as the parsing of the uploaded reports and the storing of data to the database. This background task, implemented using the new library, executes in a separate background worker process, and can not communicate directly with the user. Errors are logged to the `comments` fields. The `massStoreRun()` API handler will continue to work as previously, and block while waiting for the background task to terminate. In case of an error, it synchronously reports a `RequestFailed` exception, passing the `comments` field (into which the background process had written the exception details) to the client. Due to the inability for most of the exceptions previously caused in `MassStoreRun` to "escape" as `RequestFailed`s, some parts of the API had been deprecated and removed. Namely, `ErrorCode.SOURCE_FILE` and `ErrorCode.REPORT_FORMAT` are no longer sent over the API. This does not break existing behaviour and does not cause an incompatibility with clients: in cases where the request exceptions were raised earlier, now a different type of exception is raised, but the error message still precisely explains the problem as it did previously.

Even though commit d915473 introduced a socket-level TCP keepalive support into the server's implementation, this was observed multiple times to not be enough to **deterministically** fix the issues with the `CodeChecker store` client hanging indefinitely when the server takes a long time processing the to-be-stored data. The underlying reasons are not entirely clear and the issue only pops up sporadically, but we did observe a few similar scenarios (such as multi-million report storage from analysing LLVM and then storing between datacentres) where it almost reliably reproduces. The symptoms (even with a configure `kepalive`) generally include the server not becoming notified about the client's disconnect, while the client process is hung on a low-level system call `read(4, ...)`, trying to get the Thrift response of `massStoreRun()` from the HTTP socket. Even if the server finishes the storage processing "in time" and sent the Thrift reply, it never reaches the client, which means it never exits from the waiting, which means it keeps either the terminal or, worse, a CI script occupied, blocking execution. This is the "more proper solution" foreshadowed in commit 15af7d8. Implemented the server-side logic to spawn a `MassStoreRun` task and return its token, giving the `massStoreRunAsynchronous()` API call full force. Implemented the client-side logic to use the new `task_client` module and the same logic as `CodeChecker cmd serverside-tasks --await --token TOKEN...` to poll the server for the task's completion and status.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: `massStoreRunAsynchronous()` #4326

feat: `massStoreRunAsynchronous()` #4326

Commits on Sep 18, 2024

Commits on Sep 20, 2024

feat: massStoreRunAsynchronous() #4326

Are you sure you want to change the base?

feat: massStoreRunAsynchronous() #4326

Commits on Sep 18, 2024

Commits on Sep 20, 2024

feat: `massStoreRunAsynchronous()` #4326

feat: `massStoreRunAsynchronous()` #4326