This release improves thread load balancing on CPUs with a large number of cores. The worker threads now process smaller sieve intervals, which improves the performance of short computations (≤ 10 seconds). On a 4th Gen AMD EPYC 9R14 CPU with 192 threads, counting the primes up to 10^12 now runs 10% faster (1.187 seconds) and counting the primes up to 10^11 runs 70% faster (0.115 seconds).
ChangeLog
ParallelSieve.cpp: Tune thread load balancing.