Algorithms
Raiting:
7

How to find prime numbers


imageThis article discusses the algorithms’ sieves for finding prime numbers. We consider in detail the classic Sieve of Eratosthenes, particularly its implementation in the popular programming languages, parallelization and optimization, and then we will describe a modern and fast Sieve of Atkin.

Here is a picture of the Sieve of Eratosthenes sculpture that was created by abstract expressionist Mark Di Suvero, constructed on the campus at Stanford University.

Introduction


The number can be called a prime if it has exactly two distinct divisors: figure one and itself. Numbers that have a greater number of divisors are called composite numbers. Thus, if we know how to factorize numbers then we can and check the numbers for simplicity. For example, something like this:

function isprime(n){
if(n==1) // 1 - not a prime number
return false;
/ / go over the possible divisors from 2 to sqrt (n)
for(d=2; d*d<=n; d++){
/ / if it divides evenly, then the composite
if(n%d==0)
return false;
}
/ / if there are no nontrivial divisors, then a simple
return true;
}
(Hereinafter is a JavaScript-like pseudo code unless something different will be mentioned.)
The test running time obviously is O(n½), i.e., it grows exponentially with respect to bit-length of n. This test is called the trial division.

It is quite unexpectedly that there are several ways to check the number simplicity without finding its divisors. If a polynomial factoring algorithm is a distant dream (the encryption security of RSA is based on it), then the developed test in 2004 for simplicity of AKS [1] works for polynomial-time.

Now, if we want to find all primes within a fairly wide range, the first impulse will probably be to test each number from the interval individually. Fortunately, if we have enough memory, we can use faster (and simpler) algorithms’ sieves. We will discuss two of them in this article: the classic sieve of Eratosthenes, which was known to the ancient Greeks, and the sieve of Atkin, the most perfect modern algorithm.

Sieve of Eratosthenes


The ancient Greek mathematician Eratosthenes suggested the following algorithm for finding all primes that not exceeding a given number n. Let us take an array S of length n and fill it with figures one (mark them as uncrossed). Now we will consistently see the elements of S [k], starting with k = 2. If S [k] = 1, then fill in with zeros (cross out or sow) all subsequent table cells, which numbers are multiple of k. As result is an array in which cells contain 1 if and only if the number of cell is a prime number.

A lot of time could be saved by noting that a composite number is less than n, at least one of the divisors does not exceed , the sowing process sufficiently to finish on . This is the Sieve of Eratosthenes animation, taken from Wikipedia:

image

A few more operations could be spared, if for the same reason, we begin crossing out multiples of k, starting with the number k2, but not with 2k.

image

The implementation takes the following form:

function sieve(n){
S = [];
S[1] = 0; // 1- not a prime number
/ / fill it with figures one
for(k=2; k<=n; k++)
S[k]=1;

for(k=2; k*k<=n; k++){
/ / if k is a simple (not crossed out)
if(S[k]==1){
/ / then we cross out multiples of k
for(l=k*k; l<=n; l+=k){
S[l]=0;
}
}
}
return S;
}

The effectiveness of the Sieve of Eratosthenes is caused by extreme simplicity of the inner cycle: it does not contain conditional transitions, as well as "heavy" operations such as multiplication and division.

We estimate the complexity of the algorithm. The first crossing out requires n/2 operations, the second is n/3, the third is n/5, etc. By the Mertens' formula

image

So the Sieve of Eratosthenes will require O(n log log n) operations. Memory consumption makes O(n).

Optimization and parallelization



The first optimization of the sieve suggested Eratosthenes himself: if among all even prime numbers 2 is only prime number, then let us save half the memory and time and write out and sow only odd numbers. The implementation of such algorithm modification would require only some cosmetic changes (code).

More advanced optimization (so-called wheel factorization) relies on the fact that all prime numbers except for 2, 3 and 5 lie in one of the eight following arithmetic progressions: 30k +1, 30k +7, 30k +11, 30k +13 , 30k +17, 30k +19, 30k +23 and 30k +29. In order to find all the prime numbers up to n, we first compute (again using a sieve) all primes to . Now, let us make eight sieves, each of them will include elements of the appropriate arithmetic progression less than n, and sow each of them in a separate stream. It is all done: we reduced the memory consumption and CPU load (in 4 times compared with the basic algorithm), as well we parallelized the algorithm.

Increasing the step progression and the number of sieves (e.g. we will need 48 sieves for 210 step progression that will save another 4% of resources) in parallel with the growth of n, we can increase the speed of the algorithm in the log log n times.

Segmentation


What do we do if despite all our tricks, there still is a RAM cram and the algorithm is swapping? We can replace a large sieve into a sequence of small sieves, and sow each separately. As above, we have to pre-prepare a list of primes up to , which takes O(n ½-ε) an additional memory. We do not need to store the prime numbers that were found in the process of sowing sieves. We will just give them to the output stream.

Let us do not make the sieves too small, they should not be less than O(n ½-ε) elements, because we will lose more and more in the productivity.

Sieve of Eratosthenes and one-liners


The set filtration by the data (for example,
primes = primes.select { |x| x == prime || x % prime != 0 } in Ruby), or the use of aka list comprehensions (for example,
let pgen (p:xs) = p : pgen [x|x <- xs, x `mod` p > 0] in Haskell) cause element wise checking divisibility. As a result the algorithm increases at least up to (this is the number of filtrations) that multiplied by (the minimum number of elements in the filtering set), where is the number of primes that does not exceed n, i.e. up to O(n3/2-ε) operations.

One-liner
(n: Int) => (2 to n) |> (r => r.foldLeft(r.toSet)((ps, x) => if (ps(x)) ps -- (x * x to n by x) else ps)) in Scala, it is near the Eratosthenes algorithm, because it avoids checking for divisibility. However, the constructing complexity of differential sets is proportional to the size of the larger of them, so the result is O(n3/2-ε) operations.

In general, the Sieve of Eratosthenes is difficult to implement effectively in the functional paradigm of invariant variables. If a functional language (e.g. OCaml) allows that it is worth to break the standard specifications and create an alterable array. In [3] is being discussed how to implement correctly the Sieve of Eratosthenes in Haskell using the technique of lazy crossings out.

Sieve of Eratosthenes and PHP


Let us write the algorithm of Eratosthenes in PHP. We will get something like this:

define("LIMIT",1000000);
define("SQRT_LIMIT",floor(sqrt(LIMIT)));

$S = array_fill(2,LIMIT-1,true);

for($i=2;$i<=SQRT_LIMIT;$i++){
if($S[$i]===true){
for($j=$i*$i; $j<=LIMIT; $j+=$i){
$S[$j]=false;
}
}
}

Here are two problems. The first one is connected with the system data types, and it is characteristic for many modern languages. The fact is that the Sieve of Eratosthenes is implemented most effectively in tehYaP, where we can declare a homogeneous array, sequentially located in memory. The calculation of the cell address S [i] takes a couple of processor commands. The array in PHP is actually a heterogeneous vocabulary, i.e. it is indexed by random strings or numbers, and it can contain different types of data. In this case, finding S [i] becomes a tedious task.

The second problem: the arrays in PHP take too much memory. For example, each element of $S consumes 128 bytes from the code above on a 64-bit system. As it was discussed above, it is not necessary to hold all sieve at once in the memory, it can be handled in parts.

In order to solve these problems is sufficient to choose a more appropriate data type - string!

define ("LIMIT", 1000000);
define ("SQRT_LIMIT", floor (sqrt (LIMIT)));

define("LIMIT",1000000);
define("SQRT_LIMIT",floor(sqrt(LIMIT)));

$S = str_repeat("\1", LIMIT+1);

for($i=2;$i<=SQRT_LIMIT;$i++){
if($S[$i]==="\1"){
for($j=$i*$i; $j<=LIMIT; $j+=$i){
$S[$j]="\0";
}
}
}

Now, every element takes exactly one byte, while the running time was decreased approximately in 3 times. Here is a script for measuring the speed.

Sieve of Atkin


A. O. L. Atkin and Daniel J. Bernstein proposed a new sowing method for the composite numbers in 1999, called the Sieve of Atkin. It is based on the following theorem.

Theorem. Let n - natural number, which is not divisible by any perfect square. Then:

1. If n is presented in the form 4k +1, then it is just if and only if, when the number of natural solutions of the equation 4x2+y2 = n is odd.
2. If n is presented in the form 6k +1, then it is just if and only if, when the number of natural solutions of the equation 3x2+y2 = n is odd.
3. If n is presented in the form of 12k-1, then it just then and only if, when the number of natural solutions of the equation 3x2−y2 = n for which x > y is odd.
The theorem proving can be found in [4].

The elementary number theory implies that all prime numbers more than 3 have the form 12k +1 (case 1), 12k +5 (again 1), 12k +7 (case 2) or 12k +11 (case 3).

In order to initialize the algorithm fill the sieve S with zeros. Now, for each pair (x, y) where are incremented values in the cells S [4x2+y2], S[3x2+y2], and if x > y, then in S [3x2−y2] as well. After computing the numbers of cells of the form 6k ± 1, containing odd numbers, they are prime numbers or divided into squares of primes.

As a final step, we go sequentially through supposedly prime numbers and cross out multiples of their squares.

It is clear from the description that the complexity of the Sieve of Atkin is proportional to n, rather than n log log n as in the algorithm of Eratosthenes.

The author's optimized implementation in C is presented in the form of primegen, and a simplified version is in Wikipedia.

So we can reduce the asymptotic complexity in log log n once and the memory consumption up to O(n½+o(1)), as in the Sieve of Eratosthenes using wheel factorization and segmentation.

Here is some info about log log


In fact the factor log log n grows very slowly. For example, log log 1010000 ≈ 10.Therefore, it can be understood as a constant, and the complexity of the algorithm of Eratosthenes is linear from a practical point of view, unless finding prime numbers is not a key function in the project. There can be used the basic version of the Sieve of Eratosthenes. However, it can be worth doing that when finding prime numbers at long intervals (starting from 232), because optimizations and the Sieve of Atkin can significantly improve its performance.

Literature


[1] Agrawal M., Kayal N., Saxena N. PRIMES is in P. - Annals of Math. 160 (2), 2004. - P.781-793.
[2] Vasilenko O. N. Number-theoretic algorithms in cryptography. - M., MCCME, 2003. – P.328.
[3] O'Neill M. E. The Genuine Sieve of Eratosthenes. - 2008.
[4] Atkin A. O. L., Bernstein D. J. Prime sieves using binary quadratic forms. - Math.Comp. 73 (246), 2003. - P. 1023-1030.
ZimerMan 9 january 2012, 14:29
Vote for this post
Bring it to the Main Page
 

Comments

0 JR February 10, 2012, 1:30
What about Hexile Sieving?
0 ZimerMan February 10, 2012, 13:06
Sorry, but for me Hexile Sieving is totally new... I can't say anything about it. Do you have any links where it describes?
0 JR February 29, 2012, 0:19
The url is : http://arxiv.org/pdf/1202.5948v1.pdf

0 abhishek0w March 27, 2012, 18:14
nice concepts....that...i learn .....very hlepfull......thanks
0 Jeremyv December 13, 2012, 20:00
Just used the string rather than array to solve Project Euler 10 in a couple seconds. The array was maxing out default memory size and I was looking for a more efficient sieve when I found your site. Although my sieve isn't more complex than the array I was using before, it does handing significantly larger numbers! Now I'll try a Sieve of Atkin approach.
0 LavonnaKitz November 6, 2013, 4:04
Wonderful learn, I simply handed this onto a colleague who was performing a little analysis on that. And he really purchased me lunchtime as a result of I discovered it for him smile So allow me to rephrase that: thank you for supper!
0 Carl A Bohn June 4, 2014, 4:15
I have studied prime number properties for almost 20 years. And after working through all the various division methods of locating prime numbers including the basic divide and conquer method to the E-sieve method, I have developed an algorithm that once having a clear understanding where prime numbers could occur, I can apply the E-sieve method by addition not by multiplication. The premise is to conserve all possible calculations for further use. The severe impediment of any divide and conquer method is you toss out all but the last calculation, upwards to thousands of such calculations just to determine one candidates primality. The method I have developed is based on a set pattern in which all prime numbers MUST occur - eliminating the so-called random nature of prime numbers. The method is so simple but would take a semester of graduate study to present it with any level of understanding. This method addresses the complications of large arrays, multi file management and data encoding for space conservation. And, all my code is in VB 6.0 on a lap top and is only 100 or so lines of code. And with minor expansion of the code can very easily determine prime numbers upwards to 100,000 places (base 10). I just don't have a mini-main frame to run up the program at full throttle.
0 kleop June 4, 2014, 9:18
Did you write an article about it? Perhaps, you could describe it in a few words and post it here.

Leave a Reply

B
I
U
S
Help
Avaible tags
  • <b>...</b>highlighting important text on the page in bold
  • <i>..</i>highlighting important text on the page in italic
  • <u>...</u>allocated with tag <u> text shownas underlined
  • <s>...</s>allocated with tag <s> text shown as strikethrough
  • <sup>...</sup>, <sub>...</sub>text in the tag <sup> appears as a superscript, <sub> - subscript
  • <blockquote>...</blockquote>For  highlight citation, use the tag <blockquote>
  • <code lang="lang">...</code>highlighting the program code (supported by bash, cpp, cs, css, xml, html, java, javascript, lisp, lua, php, perl, python, ruby, sql, scala, text)
  • <a href="http://...">...</a>link, specify the desired Internet address in the href attribute
  • <img src="http://..." alt="text" />specify the full path of image in the src attribute