Commit

Pre-sieve using primes <= 163
kimwalisch committed Nov 10, 2024
1 parent 4628dee commit 1af0bc4
Showing 10 changed files with 6,675 additions and 3,853 deletions.
7 changes: 4 additions & 3 deletions ChangeLog
@@ -1,10 +1,11 @@
-Changes in version 12.6, 08/11/2024
+Changes in version 12.6, 10/11/2024
===================================

* CpuInfo.cpp: Correctly detect Intel Arrow Lake CPU cache
topology on Windows and Linux.
-* PreSieve.cpp: Use static pre-sieve lookup tables, this avoids
-initialization overhead to generate these lookup tables.
+* PreSieve.cpp: Increased pre-sieving from primes <= 100 to
+primes <= 163. Memory usage of pre-sieve lookup tables has been
+reduced from 210 kilobytes to 123 kilobytes.

Changes in version 12.5, 22/10/2024
===================================
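The memory figures in the new ChangeLog entry can be reproduced with a little arithmetic. The sketch below assumes that each table byte covers 30 numbers (matching the "8 flags each 30 numbers" layout described in doc/ALGORITHMS.md) and that a table whose primes are all coprime to 30 repeats every p1 * p2 * ... * pk bytes, so its size is the product of its primes. The prime groupings are copied from the old and new PreSieve.hpp comments; the function names are hypothetical and not part of primesieve.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using PrimeGroups = std::vector<std::vector<uint64_t>>;

// Size in bytes of one pre-sieve table, assuming 1 byte = 30 numbers.
// For primes coprime to 30 the crossed-off pattern repeats every
// p1 * p2 * ... * pk bytes, so that product is the table size.
uint64_t tableBytes(const std::vector<uint64_t>& primes)
{
  uint64_t bytes = 1;
  for (uint64_t p : primes)
  {
    assert(p % 2 != 0 && p % 3 != 0 && p % 5 != 0);
    bytes *= p;
  }
  return bytes;
}

// Total size of a set of tables.
uint64_t totalBytes(const PrimeGroups& tables)
{
  uint64_t sum = 0;
  for (const auto& primes : tables)
    sum += tableBytes(primes);
  return sum;
}

// Old layout: buffer[0..7], primes < 100.
PrimeGroups oldLayout()
{
  return {{7, 67, 71}, {11, 41, 73}, {13, 43, 59}, {17, 37, 53},
          {19, 29, 61}, {23, 31, 47}, {79, 97}, {83, 89}};
}

// New layout: preSieveTable[0..15], primes <= 163.
PrimeGroups newLayout()
{
  return {{7, 23, 37}, {11, 19, 31}, {13, 17, 29}, {41, 163},
          {43, 157}, {47, 151}, {53, 149}, {59, 139},
          {61, 137}, {67, 131}, {71, 127}, {73, 113},
          {79, 109}, {83, 107}, {89, 103}, {97, 101}};
}
```

Under these assumptions totalBytes(oldLayout()) is 214,712 bytes (about 210 KiB) and totalBytes(newLayout()) is 126,330 bytes (about 123 KiB), in line with the ChangeLog. Pairing a small prime with a large one keeps every new table small; the largest, { 97, 101 }, is 9,797 bytes.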
4 changes: 2 additions & 2 deletions cmake/auto_vectorization.cmake
@@ -1,10 +1,10 @@
-# The andBuffers() function in PreSieve.cpp is important for
+# The AND_PreSieveTables() function in PreSieve.cpp is important for
# performance and therefore it is important that this function is
# auto-vectorized by the compiler. For GCC & Clang we can enable
# auto vectorization using -ftree-vectorize.

# GCC/Clang enable auto-vectorization with -O2 and -O3, but for -O2
-# GCC uses the "very-cheap" cost model which prevents our andBuffers()
+# GCC uses the "very-cheap" cost model which prevents our AND_PreSieveTables()
# function from getting auto vectorized. But compiling with e.g.
# "-O2 -ftree-vectorize -fvect-cost-model=dynamic" fixes this issue.
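The comment asks for AND_PreSieveTables() to be auto-vectorized, but its body is not part of this diff. As an illustration only, the hypothetical andTables() below shows the kind of loop auto-vectorizers generally handle well: a counted loop over contiguous bytes with no branches and no dependency between iterations. Whether a given GCC or Clang version actually vectorizes it depends on the flags, such as the "-O2 -ftree-vectorize -fvect-cost-model=dynamic" combination above; GCC's -fopt-info-vec option reports which loops were vectorized.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not primesieve's AND_PreSieveTables():
// writes the bitwise AND of four byte arrays into `out`.
// Each iteration is independent of the others, which is the
// shape that lets the compiler process many bytes per SIMD op.
void andTables(const uint8_t* a,
               const uint8_t* b,
               const uint8_t* c,
               const uint8_t* d,
               uint8_t* out,
               std::size_t n)
{
  assert(a && b && c && d && out);
  for (std::size_t i = 0; i < n; i++)
    out[i] = a[i] & b[i] & c[i] & d[i];
}
```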

2 changes: 1 addition & 1 deletion doc/ALGORITHMS.md
@@ -72,7 +72,7 @@ efficiently uses the CPU's multi level cache hierarchy.
### Optimizations used in primesieve

* Uses a bit array with 8 flags each 30 numbers for sieving
-* Pre-sieves multiples of small primes < 100
+* Pre-sieves multiples of small primes <= 163
* Compresses the sieving primes in order to improve cache efficiency [[5]](#references)
* Starts crossing off multiples at the square
* Uses a modulo 210 wheel that skips multiples of 2, 3, 5 and 7
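The first bullet (8 flags per 30 numbers) works because only 8 residues modulo 30 are coprime to 2, 3 and 5: 1, 7, 11, 13, 17, 19, 23 and 29. A minimal sketch of the resulting number-to-bit mapping follows. It assumes byte i covers the numbers 30*i to 30*i + 29 and that bit k flags the k-th of those residues in ascending order; primesieve's actual bit order is not shown in this commit, so treat the order as an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: byte i covers [30*i, 30*i + 30) and bit k flags
// residues[k]. The ascending order is an assumption for illustration.
constexpr uint64_t residues[8] = {1, 7, 11, 13, 17, 19, 23, 29};

// Bit index (0..7) of n inside its byte, or -1 if n shares a factor
// with 30 (such numbers have no flag in the sieve array at all).
int bitIndex(uint64_t n)
{
  uint64_t r = n % 30;
  for (int k = 0; k < 8; k++)
    if (residues[k] == r)
      return k;
  return -1;
}

// Byte of the sieve array that holds the flag for n.
uint64_t byteIndex(uint64_t n)
{
  return n / 30;
}

// Number represented by bit k of byte i.
uint64_t numberAt(uint64_t i, int k)
{
  assert(k >= 0 && k < 8);
  return 30 * i + residues[k];
}
```

For example, 97 = 3 * 30 + 7 lands in byte 3, bit 1, while 25 has no flag because it is a multiple of 5.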
46 changes: 28 additions & 18 deletions include/primesieve/PreSieve.hpp
@@ -1,26 +1,36 @@
///
/// @file PreSieve.hpp
-/// @brief Pre-sieve multiples of small primes < 100 to speed up the
-/// sieve of Eratosthenes. The idea is to allocate several
-/// arrays (buffers_) and remove the multiples of small primes
-/// from them at initialization. Each buffer is assigned
-/// different primes, for example:
+/// @brief Pre-sieve multiples of small primes <= 163 to speed up the
+/// sieve of Eratosthenes. We use 16 static lookup tables from
+/// which the multiples of small primes have been removed
+/// upfront. Each preSieve lookup table is assigned different
+/// primes used for pre-sieving:
///
-/// buffer[0] removes multiplies of: { 7, 67, 71 } // 32 KiB
-/// buffer[1] removes multiplies of: { 11, 41, 73 } // 32 KiB
-/// buffer[2] removes multiplies of: { 13, 43, 59 } // 32 KiB
-/// buffer[3] removes multiplies of: { 17, 37, 53 } // 32 KiB
-/// buffer[4] removes multiplies of: { 19, 29, 61 } // 32 KiB
-/// buffer[5] removes multiplies of: { 23, 31, 47 } // 32 KiB
-/// buffer[6] removes multiplies of: { 79, 97 } // 8 KiB
-/// buffer[7] removes multiplies of: { 83, 89 } // 7 KiB
+/// preSieveTable[0] = { 7, 23, 37 }
+/// preSieveTable[1] = { 11, 19, 31 }
+/// preSieveTable[2] = { 13, 17, 29 }
+/// preSieveTable[3] = { 41, 163 }
+/// preSieveTable[4] = { 43, 157 }
+/// preSieveTable[5] = { 47, 151 }
+/// preSieveTable[6] = { 53, 149 }
+/// preSieveTable[7] = { 59, 139 }
+/// preSieveTable[8] = { 61, 137 }
+/// preSieveTable[9] = { 67, 131 }
+/// preSieveTable[10] = { 71, 127 }
+/// preSieveTable[11] = { 73, 113 }
+/// preSieveTable[12] = { 79, 109 }
+/// preSieveTable[13] = { 83, 107 }
+/// preSieveTable[14] = { 89, 103 }
+/// preSieveTable[15] = { 97, 101 }
///
-/// Then whilst sieving, we perform a bitwise AND on the
-/// buffers_ arrays and store the result in the sieve array.
-/// Pre-sieving provides a speedup of up to 30% when
-/// sieving the primes < 10^10 using primesieve.
+/// The total size of these 16 preSieveTables is 123
+/// kilobytes. Whilst sieving, we perform a bitwise AND of all
+/// preSieveTables and store the result in the sieve array.
+/// Pre-sieving provides a speedup of up to 30% when sieving
+/// the primes < 10^10 using primesieve.
///
/// Copyright (C) 2024 Kim Walisch, <[email protected]>
/// Copyright (C) 2022 @zielaj, https://github.com/zielaj
///
/// This file is distributed under the BSD License. See the COPYING
/// file in the top level directory.
@@ -38,7 +48,7 @@ class PreSieve
{
public:
static void preSieve(Vector<uint8_t>& sieve, uint64_t segmentLow);
-static uint64_t getMaxPrime() { return 97; }
+static uint64_t getMaxPrime() { return 163; }
};

} // namespace
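The PreSieve.hpp comment describes the technique: each table has the multiples of its primes removed in advance, and pre-sieving a segment is a bitwise AND of all tables into the sieve array. The sketch below illustrates that idea under stated assumptions: byte i covers 30 numbers, bit k flags the k-th residue coprime to 30 in ascending order, and a segment starts at a multiple of 30. Because a table for primes p1, ..., pk repeats every p1 * ... * pk bytes, each table is read from offset (segmentLow / 30) % table.size() and wraps around. buildTable() and preSieveSegment() are hypothetical names; primesieve ships its tables precomputed (static) and uses its own Vector type rather than std::vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed layout: byte i covers [30*i, 30*i + 30); bit k flags residue[k].
constexpr uint64_t residue[8] = {1, 7, 11, 13, 17, 19, 23, 29};

// Builds one pre-sieve table with the multiples of `primes` removed.
// The pattern repeats every p1 * p2 * ... * pk bytes (primes coprime to 30).
std::vector<uint8_t> buildTable(const std::vector<uint64_t>& primes)
{
  uint64_t size = 1;
  for (uint64_t p : primes)
  {
    assert(p % 2 != 0 && p % 3 != 0 && p % 5 != 0);
    size *= p;
  }

  std::vector<uint8_t> table(size, 0xff);
  for (uint64_t i = 0; i < size; i++)
    for (int k = 0; k < 8; k++)
      for (uint64_t p : primes)
        if ((30 * i + residue[k]) % p == 0)
          table[i] &= static_cast<uint8_t>(~(1u << k));

  return table;
}

// Pre-sieves `sieve`, whose byte 0 covers the numbers starting at
// segmentLow, by ANDing the matching window of every table into it.
void preSieveSegment(std::vector<uint8_t>& sieve,
                     uint64_t segmentLow,
                     const std::vector<std::vector<uint8_t>>& tables)
{
  assert(segmentLow % 30 == 0);
  std::fill(sieve.begin(), sieve.end(), uint8_t(0xff));

  for (const auto& table : tables)
  {
    // Offset inside the periodic table that corresponds to segmentLow.
    uint64_t pos = (segmentLow / 30) % table.size();
    for (uint64_t i = 0; i < sieve.size(); i++)
    {
      sieve[i] &= table[pos];
      if (++pos == table.size())
        pos = 0;
    }
  }
}
```

For simplicity this sketch also clears the bits of the small primes themselves (7 is a multiple of 7), so a real sieve has to account for the primes <= getMaxPrime() separately; how primesieve handles them is not shown in this diff.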