Wrong running time #58
benchpress/src/core/Run_proc.ml, lines 9 to 41 (at commit 2d84e9d)
I'm not sure what the clean solution is. Maybe @Gbury can chime in about using cgroups to ensure that subprocesses get dedicated CPU resources? I think François Bobot does this in his benchmark tool.
This is all a bit complicated, but I'd say:
I am trying another solution given by a colleague ;)
This patch does multiple things to try to improve the stability of running times:

- Introduce CPU affinity support in benchpress. This is configured on the command line with the new `--cpus` option, which replaces `-j` and allows specifying a list of CPUs or CPU ranges in taskset format, such as `0-3,7-12,15` (strides are not supported). When the `--cpus` option is provided, provers run in parallel on the provided CPUs, with at most one prover running on a given CPU at once. Note that the `--cpus` setting *only* concerns the prover runs (and some glue code around one specific prover run): it does *not* otherwise restrict the CPUs used by benchpress itself. It is recommended to use an external method such as the `taskset` utility to give the toplevel benchpress process a CPU affinity that does not overlap with the prover CPUs provided in `--cpus`.

  When using the `--cpus` setting, be aware that *there may still be other processes on the system using these CPUs*! It is thus recommended to use one of the existing techniques to isolate these CPUs from the rest of the system. I know of two ways to do this: the `isolcpus` kernel parameter, which is deprecated but slightly easier to use, and cpusets, which are not deprecated but harder to use.

  To use `isolcpus`, set the CPUs to use for benchmarking as the isolated CPUs on the kernel command line and reboot (OK, maybe not that simple for one-shot benchmarking, but fairly easy for a machine that is mostly used for benchmarking). There is no need to set the CPU affinity of the `benchpress` binary: it will never be scheduled on the isolated CPUs, and neither will any other process (unless manually required to).

  To use cpusets (the solution I employed on our benchmark machine at OCamlPro), create a `system` cpuset that contains only the CPUs that will *not* be used by benchpress, and move all processes to that cpuset (this can be done on a running system; consult the cpuset documentation). Then create another cpuset that contains the CPUs to use for benchpress, including the CPUs for the `benchpress` binary itself (in practice I use the root cpuset, which contains all the CPUs), and run `benchpress` in that cpuset. Do not forget to use `taskset` to prevent the `benchpress` binary from using the CPUs destined for the provers. In practice, I move the shell from which I run benchpress into that second cpuset.

- Input files are copied into RAM (through `/dev/shm`) before running the provers, and the prover reads the copy in RAM. This ensures we don't benchmark the time it takes to load the problem from disk, and helps minimise noise under high disk contention (such as when tens of prover instances try to read their input at the same time). We are still limited by RAM bandwidth, but what can you do.

- Standard output and standard error of the prover are written to temporary files in RAM instead of pipes. This ensures the prover never gets stuck waiting for benchpress to read from the pipes (although this shouldn't be an issue given the large default pipe buffer sizes on Linux), and more importantly minimises context switches between the prover and benchpress.

With these changes we observe much less noise in benchmark results internally at OCamlPro, and this all but fixes sneeuwballen#58 (that said, it would still be nice to add support for getting execution times through getrusage).
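To make the `--cpus` format concrete, here is a minimal sketch of expanding a taskset-style CPU list into explicit CPU ids. The helper name `parse_cpus` is hypothetical (it is not benchpress's actual parser); like the option described above, it does not support strides.

```python
# Hypothetical sketch: expand a taskset-format CPU list such as
# "0-3,7-12,15" into an explicit list of CPU ids.
def parse_cpus(spec: str) -> list[int]:
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")          # a range like "7-12"
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))            # a single CPU like "15"
    return cpus

print(parse_cpus("0-3,7-12,15"))
# -> [0, 1, 2, 3, 7, 8, 9, 10, 11, 12, 15]
```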
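The mechanics around one prover run (copy the input into RAM, pin the child to one CPU, write its stdout/stderr to files in RAM rather than pipes) can be sketched as follows. This is an illustration, not benchpress's code: `cat` stands in for the prover, the file names are placeholders, and `os.sched_setaffinity` is Linux-only.

```python
# Sketch (assumptions: Linux, /dev/shm mounted; `cat` stands in for a prover).
import os
import shutil
import subprocess
import tempfile

# A placeholder input problem.
src = tempfile.NamedTemporaryFile(mode="w", delete=False, suffix=".smt2")
src.write("(assert true)\n")
src.close()

# 1. Copy the input into RAM; the prover will read this copy.
ram_input = shutil.copy(src.name, "/dev/shm/")

# 2. stdout/stderr go to temp files in RAM, not pipes, so the child
#    never blocks on the parent reading.
out = open("/dev/shm/prover.stdout", "w+b")
err = open("/dev/shm/prover.stderr", "w+b")

# 3. Pin the child to a single CPU (CPU 0 here) before exec.
proc = subprocess.Popen(
    ["cat", ram_input],
    stdout=out, stderr=err,
    preexec_fn=lambda: os.sched_setaffinity(0, {0}),
)
proc.wait()

out.seek(0)
print(out.read().decode(), end="")
# -> (assert true)
```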
I am using Benchpress to track regressions in Alt-Ergo but sometimes I got regressions that I cannot reproduce manually. My guess is that the way Benchpress computes running times is not appropriate for this purpose and the current times can be wrong if the server is overloaded. Thus, I tried to run Benchpress with less jobs to prevent this behavior, but still I have non-replicable regressions.
I could solve my problem by adding the running time without IO operations (the CPU time?) in the database for each tests.
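For reference, the child's accumulated CPU time (user + system, excluding time spent blocked on IO or waiting for a CPU) can be read via `getrusage` after waiting on the child. A minimal sketch using Python's stdlib `resource` module, with a throwaway busy-loop child:

```python
# Sketch: measure a child's CPU time via getrusage(RUSAGE_CHILDREN),
# which reflects CPU consumed rather than wall-clock elapsed time.
import resource
import subprocess

# A stand-in "prover" that burns a little CPU.
subprocess.run(["python3", "-c", "sum(range(10**6))"], check=True)

ru = resource.getrusage(resource.RUSAGE_CHILDREN)
cpu_time = ru.ru_utime + ru.ru_stime   # user + system CPU seconds
print(f"child CPU time: {cpu_time:.3f}s")
```

Note that `RUSAGE_CHILDREN` aggregates over all waited-for children; a per-run measurement would instead use the `rusage` returned by `wait4` for that specific child.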
I would be glad to add the feature by myself ;)