# Structure and randomness of primes

Are the primes scattered over the numerical axis like wind-dispersed seeds? Of course not: simplicity is not a matter of chance, but the result of elementary arithmetic. A number is simple if and only if no smaller positive integer except one does not divide it whole.

But this is not the end of the story. The distribution of prime numbers looks random, with uneven discontinuities and clusters that look rather chaotic. If there is some scheme, it is incomprehensible. In fact, simple numbers look random enough to be played with them in dice. Create a list of successive prime numbers (say, starting with 11, 13, 17, 19, ...) and divide them by modulo 7. In other words, divide each prime number by 7 and save only the remainder. The result is a sequence of integers from the set {1, 2, 3, 4, 5, 6}, which looks almost like the result of several shots of the right bone.

Having worked with a larger sample (the first million prime numbers is greater than ), I calculated the number of primes with each of the six possible residues modulo 7 (also known as the six possible congruence classes modulo 7). I also feigned a million throws of a hexagonal bone. Can you say, looking at these two results, what of them what?

1 2 3 4 5 6

166 787 166 569 166 714 166 573 166 665 166 692

120 -98 47 -94 -2 25

1 2 3 4 5 6

166 768 166 290 166 412 166 638 167 282 166 610

101 -377 -255 -29 615 -57

In each table, the first line means the number of results in each of the six classes. The second line shows the difference , where - the average value of . In both cases, it seems that the numbers are distributed fairly evenly, without obvious displacements. The first table shows the remainders of prime numbers from division by modulus 7. Their distribution is slightly flatter in comparison with the simulated bone and with smaller deviations from the mean; the standard deviation of the two samples is 84 and 346, respectively. Judging by these tables, it seems that any of the processes can be used to provide the randomness required in a conventional dice game.

But the randomness condition is not only the uniform distribution of the results in the admissible interval. In addition, individual events in the sequence must be independent of each other. One roll of the dice should not affect the result of the next throw. To verify independence, one can look at successive events. How many times does one unit, or two, or three, go for the unit, and so on? We will write the number of all 36 possible pairs in the matrix . If the process is really random, all 36 pairs should have the same frequency, if small statistical fluctuations are not taken into account. You can turn the matrix into a color "heat map", in which cells with an amount above the average will be displayed in warm shades of pink and red, and below the average - are filled with colder shades of blue. (The insertion quantities are not the actual number , but the normalized variable is , where is again the average value, in our case .) Here is the heat map of the simulated right bone:

*Figure 1.*

There is nothing interesting. Almost all quantities are so close to the average value that the cells of the matrix appear to be neutral gray, only a few turned out to be pale pink or blue. This is exactly what you expect when successive dice rolls do not correlate with each other, and all possible pairs are equally likely.

Now we pass to the corresponding matrix of successive prime numbers modulo 7:

*Figure 2.*

Wow! It looks like we left the Country of random numbers, here the old gray movie turned into Technicolor. On the thermal map, a blue band appeared along the main diagonal (from the left top to the right bottom), meaning that successive pairs of primes having the same value of the remainder of the division by 7 are very unlikely. In other words, pairs occur less often than it would in a really random sequence. The off-diagonal (directly above the main diagonal) has a lighter blue color. This means that the pairs , where , appear a little less frequently than the average frequency. For example, and have slightly negative values of the normalized frequency. On the other hand, the sub-diagonal (under the main diagonal) is completely pink and red; such pairs as or , where , arise with a frequency above the average. Far from the diagonal, in the upper right and lower left corner we see a pastel chess pattern.

If you prefer dealing with numbers, not with colored squares, here's the matrix of values:

pairs of consecutive primes mod 7

1 2 3 4 5 6

1 15656 24376 33891 29964 33984 28916

2 37360 15506 22004 32645 25095 33959

3 25307 41107 14823 22747 32721 30009

4 32936 26183 37129 14681 21852 33791

5 24984 34207 26231 41154 15560 24529

6 30543 25190 32636 25382 37453 15488

Deviations from uniformity can be called anything, but not "insignificant." In the third row, for example, you can see that if you just met 3 in the sequence of prime numbers mod 7, then the next number is much more likely to be 2, and not yet another 3. If you were playing a game with bones out of prime numbers, then such a shift would have a huge impact on the result. Bones on prime numbers are "charged" bones!

These remarkably strong correlations in pairs were discovered by Robert J. Lemke-Oliver and Kannan Soundararajan of Stanford University. They discussed them in preprint , published on arXiv in March this year. For me, the most surprising thing about this discovery was that no one for many years noticed these structures. They are pretty obvious if you know where to look from.

I think you can not blame Euclid, because he did not find them: in antiquity ideas of chance and probability were not developed. But what about Gauss? He was a connoisseur of prime numbers, and he created his own lists of thousands of primes. In my youth he wrote : "one of my first projects was to observe the decreasing frequency of prime numbers, for which I calculated several thousand prime numbers ..." Moreover, Gauss practically invented the idea of congruence classes and modular arithmetic. But obviously, he never suspected that in the congruences of pairs of successive prime numbers something strange could lurk.

In the 1850s, the Russian mathematician Pafnutii Lvovich Chebyshev pointed to a slight distortion in prime numbers. The division of odd prime numbers modulo 4 divides them into two subsets. All primes in the series 5, 13, 17, 29, 37, ... are congruent to 1 modulo 4; in series 3, 7, 11, 19, 23, 31, ... are congruent to 3 modulo 4. Chebyshev noted that primes in the second category appear to be more numerous. For example, among the first 10 000 odd primes there are 4 943 congruent 1 and 5 057 congruent 3. But this effect is small compared to the irregularities seen in pairs of consecutive primes.

In our time, several authors have seen a hint of the phenomenon of successive prime numbers. Lemke-Oliver and Soundararajan mention three such testimonies. (See the list of reference materials at the end of the article.) In the 1950s and 60s, Stanislav Knapowski and Pal Turan explored various aspects of the remainder of the prime division by modulo m. In an article posthumously published in 1977, they considered the division of successive prime numbers modulo 4 with residues 1 or 3. They "assumed" that successive pairs with the same remainder and pairs with differing remnants "are not equally probable." In 2002, Chun-Min Co. investigated the series of consecutive prime numbers (not just their pairs) and created the elaborated fractal patterns based on their varying frequencies. Then in 2011 Avner Ash and his colleagues published a detailed analysis of "Frequencies of consecutive pairs of prime numbers", including several matrices, in which the diagonal decrease is clearly visible.

Given these precedents, were Lemke-Oliver and Soundararajan actually discoverers of correlations of consecutive primes? In my opinion, the answer should be positive. Although others have seen these patterns before, they did not describe them in such a way that they were fixed in the consciousness of the mathematical community. In fact, when Lemke-Oliver and Soundararajan announced their findings, the reaction was a surprise bordering on mistrust. Erica Clararaich in the article for Quanta quoted the reaction of the Oxford number theorist James Maynard:

When Soundararajan first told Maynard about the opening of the pair, Maynard said: "I believed him only half." As soon as I returned to my office, I conducted a numerical experiment to see for myself. "<Tgsrbq>Among the smallest prime numbers (x is less than about 600), the jumping champion is usually 2, but then wins 6 and dominates a fairly large interval of the numerical axis. Approximately near number 6 is inferior to the title of 30, which eventually loses 210. Odlyzhko with colleagues estimated that this last transition occurs approximately at . The numbers in this sequence of jumping champions - 2, 6, 30, 210, ... - are primaries ; The n-th primitive is the product of the first n prime numbers.

Obviously, this was a general reaction. Evelyn Lamb in an article for Nature quotes Soundararajan: "Everyone to whom we talked about this, eventually wrote our computer program to see for yourself. "

I did the same! Over the past few weeks, I've written a bunch of lines of code to analyze the division of prime numbers modulo m. My article is an account of my attempts to understand where these patterns come from. My methods are more computational and visual than mathematical, I can not prove anything. Lemke-Oliver and Soundararajan have applied a more rigorous and analytical approach, at the end of this article I will tell a little more about their results.

If you want to start your own investigation, you can take as a basis my code. It is written in the Julia programming language, is packaged in Jupyter notebook and is available on GitHub . (It so happened that this program became my first non-trivial experiment with Julia.) In another article, I will tell you more about my work with the language.)

In all the examples presented above, the division of primes by modulo 7 is given, but in the number 7 there is nothing special. I chose it simply because the six possible residues {1, 2, 3, 4, 5, 6} correspond to the faces of the ordinary cubic bone. Other modules give similar results. Lemke-Oliver and Soundararajan have spent most of the analysis on dividing prime numbers modulo 3, in which there are only two classes of congruences: a prime number greater than 3 must have a remainder of 1 or 2 when dividing by 3. Here is the matrix of pairs for the first million prime numbers :

1 2

1 218578 281453

2 281454 218514

Figure 3.

The pattern is quite minimalistic, but we still recognize: elements outside the diagonal for sequences and more than the elements on the diagonal for and

Figure 4.

Clearly visible is the blue bar on the main diagonal, although in other places of the matrix the pattern is slightly weakened and blurred.

I found that the correlations between consecutive prime numbers appear most clearly when the module itself is a prime number, and yet not too small. Look at the heat cards for consecutive prime numbers mod 13 and mod 17:

Figure 5.

Figure 6.

What about module 31?

Figure 7.

From these matrices you will get an excellent pattern for the kilt or mosaic floor, right? And in all of them we see interesting patterns. Diagonal bands are allocated not only on the main diagonal, but also in the entire matrix. Such strips also create a chessboard pattern. Along any row or column, cells are usually alternately red and blue. A more inconspicuous feature is the approximate two-sided symmetry along the opposite diagonal (going from the left to the bottom right). If you add a square along this line, then the cells connected together will be approximately the same color. (This fact was noticed by Ash and his co-authors.)

For further analysis, I decided to focus on consecutive prime numbers mod 19. The module is large enough to result in clearly differentiated strips, but not so large as to make the matrix huge.

Figure 8.

How to understand the meaning of what we see? We begin by observing that all prime numbers in our sample are odd, and therefore all the intervals between these simple ones are even numbers. For any given simple the following candidates for the rank of prime number become . Is this somehow related to the chessboard pattern? If the steps between prime numbers are to be a multiple of 2, this can definitely create correlations between every second cell of any column or row. (Indeed, the correlations with the cell must be clearly seen once, all the even elements must equal zero if the module is an even number.) You can only fill an exact cell with a "cyclic return" of the edge of the matrix to an odd boundary.)

The diagonal bands in the matrix assume a strong correlation between all pairs, separated by a certain numerical interval. For example, the darkest blue diagonal and the brightest red diagonal consist of cells separated by six elements along the j-axis. The first line contains cells 1 and 7, then 2 and 8, 3 and 9, and so on. It seemed to me that this relationship is easier to perceive if the matrix is "bent", so that the diagonals become columns. The idea is to apply a circular shift to each line, all values in the line will move to the left, and those that "fall" from the left edge will be inserted to the right. The first line is shifted to zero positions, the next line is moved to one position, and so on. (Is there a name for such a transformation? I'll just call it a "bend".)

Then I wrote the code for this conversion, but as a result I did not get exactly what I expected:

Figure 9.

What are these zigzags along the back diagonal? I assumed that I made a mistake and shifted one extra element. And the root of the problem really turned out to be this, but the "bug" is not in the algorithm, but in the data itself. The matrices shown by me in all the previous drawings are only partial, they discard the empty congruence classes. In particular, the matrix for prime numbers modulo 19 ignores all prime numbers congruent to 0 modulo 19 on the logical basis that there are no such primes. In the end, if and , then can not be simple because it is divisible by 19. However, the string and column for are the real part of the matrix. If you add them, our color-coded display will look like this:

Figure 10.

The presence of a null row and a column makes the definition of the transformation a bend more accurate: for each line apply a cyclic shift left at places. The resulting curved matrix also looks neater:

Figure 11.

What do these vertical bands tell us? In the original matrix, the element was the frequency with which follows . Here the cell color denotes the frequency with which follows . In other words, each column combines elements with the same mod 19 interval between two prime numbers. For example, the leftmost column contains all pairs, separated by an interval of the length of <bright> red column () in takes into account all cases where consecutive prime numbers are separated from one another .

Color coding gives us a qualitative idea of which intervals appear more or less often. For a more accurate quantitative measurement, we can sum along the columns and display the results in a bar graph:

Figure 12.

Three observations:

The above non-uniformity of even-odd numbers is clearly visible on the graph. With the exception of 0, every even interval is allocated against the background of its odd neighbors.

The interval between prime numbers mod 19, equal to 6, is the most popular of all. Numbers that are multiples of 6 (namely 12 and 18) are also frequent, but to a lesser extent.

The interval between prime numbers mod 19, equal to 0, is remarkably unpopular. These are the elements along the main diagonal of the original matrix, so it is not surprising that their sum is at the bottom, but the deficit is stronger than I expected.

I wanted to understand the sources of these patterns. What makes the interval 6 so attractive for pairs of successive prime numbers, and why almost all prime numbers eschew the poor interval 0?

I already have a remote guess about popularity 6. In the 1990s Andrew Odlyzhko, Michael Rubinshtein and Marek Wolfe performed a computational study of "jumping champions" among prime numbers:

<blockquote> An integer D is called a jumping champion if its most frequently occurring difference between consecutive prime numbers ≤ x for some x.

Why do primaries become the preferred intervals between consecutive prime numbers? If is a fairly large prime number, then can not be divided into 2, can not be divided into 2 and 3, can not be divided into 2, 3 and 5, and in general , where is the n-th primitive, is not divided into any of the first n prime numbers. Of course, can still be divided into some large prime number, or there may be another simple the number between and , such that the interval between prime numbers is not guaranteed to be a primitive. But these intervals have an advantage over other applicants.

We can see this justification in action, taking the difference between the consecutive elements of our list of one million eight-digit prime numbers and plotting their frequency:

*Figure 13.*

Again, the interval 6, which totals 13.7% of the total, clearly stands out, large multiple numbers of 6 also stand out among their immediate neighbors. It is worth noting and the general form of distribution: the cluster on the left (with a peak at 6), followed by a gradual decrease. The trend is similar to the Poisson distribution, and in fact it can be considered a correct description.

The color scheme cuts a set of data into fractions of 19 values in each. The blue fraction, which includes intervals between prime numbers from 0 to 18, counts 68 percent of all the intervals present in a sample of a million prime numbers; the gold share adds another 23 percent. The remaining 9 percent are distributed widely and evenly. Not all intervals are shown in the graph: the spectrum extends to 210. (One pair of consecutive prime numbers in the sample has a distance of 210, namely 20,831,323 and 20,831,533.)

It looks like Figure 13 shows most of the pattern of consecutive prime numbers modulo19. I can make the schedule even more informative by making a simple change. We start to shift each part of the 19-elements, until it is aligned with a fraction of 0, going to the columns that are in the same column. Since the second (golden) share moves to the left until column 19 aligns with column 0, and the third (pink) portion connects column 38 to column 0. Physically, this process can be represented as a rotation of the graph around a cylinder with a circumference of 19. It is mathematically a division intervals between prime numbers modulo 19.

*Figure 14.*

If you do not pay attention to too bright colors, Figure 14 is similar to Figure 12: the heights of all the columns are the same. This should not surprise us. In Figure 12, we divide primes by modulo 19, and then take the differences between successive separated prime numbers. In Figure 14, we take the differences between the prime numbers themselves, and then divide these differences modulo 19. Two procedures are similar:

Having studied the colors, we can arrange all the fragments of the puzzle into place. Why are the prime numbers mod 19 so often separated by an interval of 6? Well, in fact mod 19 has little to do with this; the very number 6 in this sample is the most frequent interval between prime numbers. The only other significant contribution to the "third" share, in particular a few pairs of primes with a distance of 44.

The superiority of the first fraction also explains the inequality between even and odd intervals. All intervals in the first fraction are necessarily even, odd intervals (mod 19) begin to appear only in the second share (intervals from 19 to 37) and only for this reason they are less frequent. For eight-digit prime numbers in the sample, more than two-thirds of the consecutive pairs are closer than 19 units, and therefore are in the first fraction. (The median distance between prime numbers is 12. The average interval is 16.68, which is close to the theoretical prediction at 16.72.)

Finally, Figure 14 can tell us something about the rarity of the intervals 0 mod 19. No two consecutive prime numbers can fall into one class of congruence mod 19, unless they are separated by a distance of 38 or a multiple of 38. Therefore, such pairs do not go on stage before the third share, and in it there can not be too many. The sample of a million prime numbers contains 8,384 consecutive pairs with a distance of 38 - less than one percent of the total number. And this is the main reason that the bone of primes so often falls out the same face twice in a row. This is the reason for the appearance of the blue diagonal strip in all matrices.

I find it very interesting that we can explain so much about the pattern of successive prime numbers mod m without going into details and not considering the special properties of prime numbers. In fact, we can recreate a significant part of the pattern without using prime numbers at all.

Two hundred years ago, Gauss and Legendre noticed that in the neighborhood of the number of all integers , which are simple, is about . In 1936, Swedish mathematician Harald Kramer suggested interpreting this share as a probability. The idea was to go through all the integers in order, taking each with probability . Numbers in the accepted set will not be simple except by coincidence, but they will have the same large-scale distribution as the distribution of primes. Here are the first few elements of the list of a million such "prime Cramer numbers", where the random selection process begins with :

*Figure 15.*

The prominent diagonal features look similar, but they are much simpler than in the corresponding graphs of prime numbers. For any prime Cramer number p mod 19, the most likely successor is p + 1 mod 19, and the least probable one is p + 19 mod 19. Between these limiting cases, there is a smooth gradient of frequency or probability with only a few small fluctuations, which most likely can be write off for statistical noise.

In this matrix there is only a chess pattern. We can partially restore its structure, generating a new set of numbers, which I call "semisimple Cramer numbers." They are created by the same probabilistic sifting of integers, but this time we consider only odd numbers as candidates and change the probability to to keep the same density:

*Figure 16.*

That's better! If you exclude from the sequence of even numbers, the minimum interval between semisimple numbers is 2, and this is also the most likely location.

Adding one more change, we will come even closer to imitating to the matrix of true prime numbers. In addition to eliminating all integers multiple of 2, we also get rid of multiples of 3 and change the sampling probability accordingly. We call the resulting numbers "semi-semisimple numbers of Cramer."

*Figure 17.*

Notice that 6 mod 19 is the most probable interval between half-semisimple Cramer numbers, as well as between true prime numbers, and that the same echoes are present on intervals 12 and 18. Indeed, the only obvious difference between this matrix and Figure 10 (the corresponding graph for true prime numbers) is in the column and row for numbers congruent to 0 modulo 19. There can not be any prime numbers among such numbers. If we eliminate them from the Cramer numbers, then the two matrices become almost indistinguishable. Here they are both:

*Figure 18.*

If you look closely, you can find differences - look at the diagonal extension to the southeast of line 1, column 15 - but in general, these modified Cramer numbers surprisingly well simulate prime numbers. In both diagrams, symmetry is also noticeable with respect to the inverse diagonal. And do not forget that these two sets contain in total only 19 percent of their values: Cramer numbers include 189,794 true prime numbers.

I want to add one more twist to this story. All the above examples are based on prime numbers (or their analogs) of eight decimal digits, or in other words, numbers in the neighborhood of . Are the same results valid for primes with large values? Consider a table created from successive pairs of millions of 40-bit prime numbers, taken modulo19. The pattern turns out to be familiar, but more pale:

*Figure 19.*

Let's pass to simple numbers of 400 digits, again divided by modulo 19, and get the colors faded almost beyond recognition:

*Figure 20.*

The blue band on the main diagonal is hardly distinguishable, and the remaining features turn into simple random spots.

So, it seems that for pairs of consecutive prime numbers, size matters. To understand the reasons for this, let's look at a group of differences between consecutive prime numbers in a 40-bit sample:

*Figure 21.*

Compared to the distribution of intervals for eight-digit prime numbers (Figure 13), the spectrum is much wider and flatter. In this form, the graph is truncated at the interval 240; the long "tail" actually stretches far to the right, and the largest gap between consecutive prime numbers is in 1 328. Also, as Odlyzhko and his colleagues predicted, the most frequent interval between 40-bit prime numbers is not 6 but 30.

Because of the wider distribution of intervals, the first share can not dominate the behavior of the system as it does among the eight-digit prime numbers. When we begin to place one of the other mod 19 fractions above (Figure 22 below), the first six or eight shares make a significant contribution. The unevenness of even-odd is still present, but the amplitude of these oscillations is much lower. The leftmost column of the graph, representing intervals congruent to 0 modulo 19, lags behind in growth, but not so much.

*Figure 22.*

Alignment of the spectrum becomes even more pronounced in the sample of a million 400-bit prime numbers:

*Figure 23.*

Now the gaps between the prime numbers stretched to 15 080, creating almost 800 shares of mod 19 (however, only 13 are shown). And here in the array there is a very intriguing comb structure. In general, the columns on numbers multiples of 6 stand out almost twice as compared to the height of the nearest neighbors, showing the continuing influence of the smallest prime divisors 2 and 3. Numbers that are multiples of 6, which are also multiple of 30, reach even greater heights. The values in the sequence 42, 84, 126, 168, 210, ... also make a big contribution: these numbers are multiples of . And notice that 210, which is a multiple of 6 and 30 and 42, is a new interval-champion, again confirming Odlyzhko's prediction.

Despite all this internal structure, when the columns arranged one on another mod 19, the mixture of 800 dol is so accurate that the heights are almost identical. The only thing that remains is a small unevenness of even-odd ones.

*Figure 24.*

And the chronically unpopular class of congruent 0 modulo 19 intervals finally found itself equal. Most of the height of the column is obtained not from a dozen early dots, but from a hundred later, representing intervals between 228 and 15 080 (all of them accumulated in the turquoise region of the graph).

Experiments with large prime numbers allow us to make a plausible assumption: when the size of prime numbers tends to infinity, all traces of correlations gradually fade, and successive pairs of prime numbers will be as random as throws of an ideal bone. But is it? There are several reasons to be skeptical about this hypothesis. First, if we increase the modulus m together with the size of prime numbers, making it comparable in magnitude to the median gap between prime numbers, then correlations will continue to manifest. In my 40-bit sample, the median gap between the prime numbers is 66, so let's look at the matrix of consecutive pairs mod 61. (To limit the statistical noise, I will do calculations with a sample of not one but ten million 40-bit primes.)

*Figure 25.*

The bands are back! And in fact, in addition to the familiar bright red bands at intervals of 6, there are more scattered pink and blue frequencies with a period of 30. I would like to look at the matrix for simple 400-bit numbers, which may have even more complex features, with interacting waves in the periods 6, 30 and 210. Unfortunately, I can not show you this picture. The median gap between 400-bit primes is about 640, so we'll have to assign m a value equal to a prime number in this interval, say 641. To fill the matrix, it takes about one billion consecutive 400-bit numbers, which is more than I'm ready to calculate.

There are other reasons for doubting that correlations disappear completely as the primes increase. The comb structure, so noticeable in Figures 21 and 23, tells us that the rules of divisibility into small prime numbers strongly affect the distribution of large primes mod m, and this influence still does not disappear when the numbers increase. Moreover, even when m is much smaller than the median interval between prime numbers, the blue band remains weakly visible. Here is the matrix for pairs of consecutive 400-bit primes mod 3:

1 2

1 248291 251128

2 251127 249453

The difference in the elements on the diagonals and outside the diagonals is much smaller than with the eight-digit prime numbers (compare with Figure 3), but the deviations still do not look like a random variation.

To get a clearer idea of how correlation varies as a function of the size of prime numbers, I decided to create a sample of primes in the entire range from single-digit to 400-bit numbers. In this project I decided to do better than Gauss: he entered the prime numbers into the tables by chiliads (by groups of 1,000), and I calculated them by myriads (groups of 10,000). To measure correlations among prime numbers mod m, I calculated the mean value of the diagonal elements of the matrix and the average of the elements outside the diagonals, and then took the ratio outside / on the diagonal. If the consecutive prime numbers do not correlate at all, then the ratio should tend to 1.

Figure 26 shows the result for 797 myriads of prime numbers mod 3. The curve is curved upwards, has a sharp initial drop, and then a much flatter segment. Starting from about 100 bits, there are samples with a ratio of less than 1, meaning that the diagonal is more densely populated than the areas outside the diagonal. But even at 400 bits, most of the ratios are still higher than 1. What do we see here? Is the curve slowly approaching the ratio 1, or is there a limiting value, slightly greater than 1? Unfortunately, computational experiments do not allow us to give a definitive answer.

*Figure 26.*

To address this problem, Lemke-Oliver and Sounararajan use other tools. Although they use numerical research

Vote for this post
Bring it to the Main Page |
||

## Comments

## Leave a Reply