
Wrong running time #58

Open
Halbaroth opened this issue Mar 15, 2023 · 4 comments

Comments

@Halbaroth
Contributor

I am using Benchpress to track regressions in Alt-Ergo, but sometimes I get regressions that I cannot reproduce manually. My guess is that the way Benchpress computes running times is not appropriate for this purpose: the reported times can be wrong if the server is overloaded. I tried running Benchpress with fewer jobs to prevent this behavior, but I still see non-replicable regressions.

I could solve my problem by adding the running time without IO operations (the CPU time?) to the database for each test.

I would be glad to add this feature myself ;)

@Halbaroth
Contributor Author

Halbaroth commented Mar 15, 2023

let run cmd : Run_proc_result.t =
  let start = Ptime_clock.now () in
  (* call process and block *)
  let p =
    try
      let oc, ic, errc = Unix.open_process_full cmd (Unix.environment ()) in
      close_out ic;
      (* read out and err *)
      let err = ref "" in
      let t_err = Thread.create (fun e -> err := CCIO.read_all e) errc in
      let out = CCIO.read_all oc in
      Thread.join t_err;
      let status = Unix.close_process_full (oc, ic, errc) in
      object
        method stdout = out
        method stderr = !err
        method errcode = int_of_process_status status
        method status = status
      end
    with e ->
      object
        method stdout = ""
        method stderr = "process died: " ^ Printexc.to_string e
        method errcode = 1
        method status = Unix.WEXITED 1
      end
  in
  let errcode = p#errcode in
  Log.debug
    (fun k -> k "(@[run.done@ :errcode %d@ :cmd %a@]" errcode Misc.Pp.pp_str cmd);
  (* Compute time used by the command *)
  let rtime = Ptime.diff (Ptime_clock.now ()) start |> Ptime.Span.to_float_s in
  let utime = 0. in

Does Ptime_clock.now () return the elapsed wall-clock time? As an experiment, I will replace Ptime_clock.now () with Sys.time, which returns the CPU time. That should fix my issue. If it does, I will also add an extra field to the DB to keep the total CPU time.
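As a side note (a sketch, not benchpress code): `Sys.time` only measures the calling process, but `Unix.times` additionally reports the CPU time of reaped children through its `tms_cutime`/`tms_cstime` fields, which is closer to what is wanted here:

```ocaml
(* Sketch: measuring a child's CPU time from the parent.
   [Sys.time] counts only the parent's own CPU time, but
   [Unix.times] also exposes the accumulated CPU time of
   *waited-for* children in [tms_cutime] / [tms_cstime]. *)
let run_and_time cmd =
  let before = Unix.times () in
  let status = Sys.command cmd in  (* forks, runs, waits for the child *)
  let after = Unix.times () in
  let child_cpu =
    (after.Unix.tms_cutime -. before.Unix.tms_cutime)
    +. (after.Unix.tms_cstime -. before.Unix.tms_cstime)
  in
  (status, child_cpu)
```

Note that child CPU time is only accounted once the child has been reaped, so such a measurement would fit after `Unix.close_process_full` in the code above.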

@c-cube
Member

c-cube commented Mar 15, 2023

Sys.time will return the CPU time for the current process (i.e. benchpress), not for the solver. Ptime_clock.now returns the POSIX wall clock, which indeed might vary if the server is overloaded.

I'm not sure what the clean solution is. Maybe @Gbury can chime in about using cgroups to ensure that subprocesses get dedicated CPU resources? I think François Bobot does this in his benchmark tool.

@Gbury
Collaborator

Gbury commented Mar 15, 2023

This is all a bit complicated, but I'd say:

  • to ensure a set amount of CPU resources, the best bet would probably be to set the CPU affinity (see, e.g. https://unix.stackexchange.com/questions/73/how-can-i-set-the-processor-affinity-of-a-process-on-linux ), but that would also require setting the affinity of all the other processes running on the machine if one wants to ensure that the prover processes are alone on each core (or finding another solution?)
  • One of the motivations for using something other than wall-clock time to measure regressions is that some colleagues told us the OS might decide to batch IOs, which could result in arbitrary delays for some prover instances on systems that are under load (or that do a lot of IO, e.g. when starting more than a dozen provers at the same time). One idea was to use CPU time to identify regressions, but maybe there would be other solutions (stagger the starting times of processes, preload/put the input problem files in the cache, etc.)
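For illustration, the affinity idea in the first bullet could be approximated by prefixing the prover command with `taskset` (the `pin` helper below is a hypothetical sketch, not benchpress code, and assumes the `taskset` utility is installed):

```ocaml
(* Sketch: pin a prover command to a single CPU by prefixing it
   with taskset(1). [cpu] is the zero-based index of the CPU the
   command should run on; the returned string is a shell command. *)
let pin ~cpu cmd =
  Printf.sprintf "taskset -c %d %s" cpu cmd

(* e.g. pin ~cpu:3 "alt-ergo problem.smt2"
   builds "taskset -c 3 alt-ergo problem.smt2" *)
```

This only pins the prover itself; as discussed above, other processes on the machine can still be scheduled on that CPU unless it is isolated separately.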

@Halbaroth
Contributor Author

I am trying another solution given by a colleague ;)

bclement-ocp added a commit to bclement-ocp/benchpress that referenced this issue Jun 7, 2023
This patch does multiple things to try to improve stability of running
times:

 - Introduce cpu affinity support in benchpress. This is configured
   on the command line with the new `--cpus` option, which replaces `-j`
   and allows specifying a list of cpus or cpu ranges in taskset format,
   such as `0-3,7-12,15` (strides are not supported).

   When the `--cpus` option is provided, provers will be run in parallel
   on the provided cpus, with at most one prover running on a given cpu
   at once.

   Note that the `--cpus` setting *only* concerns the prover runs (and
   some glue code around one specific prover run): it does *not*
   otherwise restrict the cpus used by benchpress itself. It is
   recommended to use an external method such as the `taskset` utility
   to assign to the toplevel benchpress process a CPU affinity that does
   not overlap with the prover CPUs provided in `--cpus`.

   When using the `--cpus` setting, users should be aware that *there
   may still be other processes on the system using these cpus*
   (obviously)! It is thus recommended to use one of the existing
   techniques to isolate these CPUs from the rest of the system. I know
   of two ways to do this: the `isolcpus` kernel parameter, which is
   deprecated but is slightly easier to use, and cpusets, which are not
   deprecated but harder to use.

   To use `isolcpus`, simply set the cpus to use for benchmarking as the
   isolated CPUs on the kernel cmdline and reboot (OK, maybe not that
   simple for one-shot benchmarking, but fairly easy for a machine that
   is mostly used for benchmarking). There is no need to set the CPU
   affinity for the `benchpress` binary: it will never be scheduled on
   the isolated CPUs, and neither will any other processes (unless
   manually required to).

   To use cpusets (which is the solution I employed on our benchmark
   machine at OCamlPro), you should create a `system` cpuset that
   contains only the CPUs that will *NOT* be used by benchpress, and
   move all processes to that cpuset (this can be done on a running
   system, consult the cpuset documentation). Then, create another
   cpuset that contains the CPUs to use for benchpress, including the
   CPUs to use for the `benchpress` binary (in practice I use the root
   cpuset that contains all the CPUs), and run `benchpress` in that
   cpuset. You must not forget to use `taskset` to prevent the
   `benchpress` binary from using the CPUs destined for the provers. In
   practice, I move the shell that I use to run benchpress to that
   second cpuset.

 - Input files are copied into RAM (through `/dev/shm`) before running
   the provers, and the input to the prover is the copy in RAM.  This
   ensures we don't benchmark the time it takes to load the problem from
   disk, and helps minimise noise in situations of high disk contention
   (such as when tens of prover instances are trying to read their input
   at the same time). We are still limited by the RAM bandwidth but…
   what can you do.

 - Standard output and standard error of the prover are written to
   temporary files in RAM instead of using pipes. This ensures that the
   prover never gets stuck waiting for benchpress to read from the pipes
   (although this shouldn't be an issue due to huge default pipe buffer
   sizes on Linux), but more importantly minimises context switches
   between the prover and benchpress.

With these changes we observe much less noise in benchmark results
internally at OCamlPro, and this all but fixes sneeuwballen#58 (that said, it would
still be nice to add support for getting execution times through
getrusage).
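The `/dev/shm` copy described above could look roughly like this in OCaml (the helper name and file-naming scheme are mine, not the actual patch's code):

```ocaml
(* Sketch: copy a problem file into RAM-backed /dev/shm so the
   prover reads its input from memory rather than from disk.
   Returns the path of the copy; the caller is expected to
   delete it after the prover run. *)
let copy_to_shm path =
  let dst =
    Filename.concat "/dev/shm"
      (Printf.sprintf "bp-%d-%s" (Unix.getpid ()) (Filename.basename path))
  in
  (* read the whole source file *)
  let contents =
    let ic = open_in_bin path in
    let n = in_channel_length ic in
    let s = really_input_string ic n in
    close_in ic;
    s
  in
  (* write the copy into /dev/shm *)
  let oc = open_out_bin dst in
  output_string oc contents;
  close_out oc;
  dst
```

The prover is then invoked on the returned path instead of the original one, so disk latency no longer shows up in the measured running time.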
bclement-ocp added a commit to bclement-ocp/benchpress that referenced this issue Jun 7, 2023
This helps with sneeuwballen#58