Tuesday, April 8, 2008

On the Randomness of Pi



(Note: Before an astute reader points it out, I know that the frequency of each digit is not a true test of the independence of the digits of pi. I am already working on an update to this end.)

As a man with a bit of a mathematical background, I've always been somewhat mystified by the popular obsession with the number pi. The interest is not entirely misplaced[1], and the reason can be evoked with a simple example:

Upon ejecting Adam and Eve from the Garden of Eden, God said to the Angels, all perfect in proportion and hence identical, that a boundary must be made around the Garden so that Mankind might never enter it again. He instructed Michael to grasp with one hand the Tree of Knowledge of Good and Evil, which would form the center of the forbidden Garden, and with his other hand one of the thousand archangels, each of whom would grasp another until a chain of 100 archangels was formed. The last angel, Chamuel, grasping the Sword of Damocles, would then trace with its burning tip a line in the lush undergrowth of the Garden, walking as the chain of angels constrained him, until he returned to his starting point, tracing out a circle around the Garden. God then instructed the archangels to guard this boundary by joining hands as before along the line of demarcation, so that no point of the boundary could ever again be crossed by Mankind. To their dismay, no number of archangels, hands joined thusly, could form such a circle, for when 628 angels had joined hands they found that there was not sufficient space for the final angel. Seeing this, God was silent, the angels dismayed, and Mankind, watching huddled from the wilderness, was filled with dread, for it now understood the world it had been expelled into.


In less metaphorical terms, pi is disturbing because it relates what ought to be two simple things (the radius of a circle, easily perceived, and its circumference, also simple enough to see) in a way that requires a kind of number (irrational, and in fact transcendental) well beyond the ken of day-to-day life.
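The parable's numbers are just C = 2πr: with the 100-angel chain as the radius, the circle is 2π × 100 ≈ 628.3 angel-lengths around, so 628 angels fit with about a third of an angel left over, and because pi is irrational no whole number of angels closes the loop exactly. A minimal check, assuming each angel occupies exactly one unit of length and measuring along the circle itself:

```python
import math

RADIUS = 100  # archangels in the chain from the Tree to Chamuel (assumed: one angel = one unit of length)

circumference = 2 * math.pi * RADIUS   # about 628.32 angel-lengths
angels = math.floor(circumference)     # whole angels that fit hand to hand along the boundary
gap = circumference - angels           # leftover space, a fraction of one angel
print(angels, round(gap, 2))           # 628 0.32
```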

This unfortunately leads to all sorts of misplaced fascination with the exact digits of pi, as though some mystical secret might be encoded in them. Nor is this the sole province of the casually interested. In Contact, a novel by none other than Carl Sagan, a thoroughly trained astronomer who probably should have known better, the digits of pi hide deep within them a message from the creators of the Universe.

So rather than try to talk up how interesting pi is, today I will focus on how uninteresting it is. First, let's have some fun.

At NERSC you can search the first four billion binary digits of pi for character strings. For instance, searching for Jesus gives us:

search string = "jesus"
25-bit binary equivalent = 0101000101100111010110011

search string found at binary index = 514534284
binary pi : 1110001001010001011001110101100110000000000000101011110110000011
binary string: 0101000101100111010110011
character pi : sxgepajxpkkt;gbjesus__bwvawwmn;n:,tyjj
character string: jesus
But a search for Muhammed, the Prophet of Islam, yields:
search string = "muhammed"
40-bit binary equivalent = 0110110101010000000101101011010010100100

string does not occur in first 4 billion binary digits of pi
That might disturb any Muslim readers, except that a search for Islam turns up a match at position 758395516, while a search for Christianity fails.
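The binary strings in the output above can be reproduced with a simple 5-bit code in which a = 1, b = 2, ..., z = 26 (for "jesus": j = 01010, e = 00101, s = 10011, u = 10101). This is my reading of the output, not documentation of the NERSC tool; the leftover codes (0 and 27 to 31) evidently map to the punctuation seen in the "character pi" line, but I don't know which is which, so this sketch, with a hypothetical helper name `to_bits`, handles letters only:

```python
def to_bits(word):
    """Encode a word as 5-bit letter codes, a=1 ... z=26 (a scheme inferred from the output above)."""
    word = word.lower()
    if not (word.isascii() and word.isalpha()):
        raise ValueError("only the letters a-z are handled in this sketch")
    return "".join(format(ord(c) - ord("a") + 1, "05b") for c in word)

# 64-bit window of binary pi printed by the search page for "jesus".
window = "1110001001010001011001110101100110000000000000101011110110000011"

print(to_bits("jesus"))               # 0101000101100111010110011
print(to_bits("muhammed"))            # 0110110101010000000101101011010010100100
print(window.find(to_bits("jesus")))  # 8
```

Both encodings match the "binary equivalent" lines printed by the search, and the encoded "jesus" does sit inside the window of binary pi that the page shows around the match.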

Why can't pi tell us whose religion is right? What is going on with these mixed signals? The answer is that, practically speaking (and probably rigorously), the digits of pi are random. The reason Islam appears and Christianity doesn't is that Christianity has twelve letters while Islam has only five. If each digit of pi is drawn randomly, in a way completely uncorrelated with those around it, and we represent the digits in base 26 (so that we have the alphabet to work with), then the probability of getting any particular string of five characters at a given position is just one twenty-sixth to the fifth power, approximately 8.4e-8. The probability of finding Christianity (or any other 12-letter string, for that matter) is one twenty-sixth to the twelfth power, about 1e-17, roughly ten orders of magnitude smaller. Over the hundreds of millions of positions the search covers, a per-position chance of 8.4e-8 still adds up to many expected matches, while 1e-17 adds up to essentially none. Christianity should have chosen a shorter name.
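To put numbers on this, here is a small sketch (hypothetical helper names) of how many matches the "random digits" model predicts. It uses the 5-bits-per-letter encoding visible in the search output (25 bits for "jesus", 40 for "muhammed") rather than the idealized base 26, so a five-letter match at one position has probability 2^-25 ≈ 3.0e-8 instead of 26^-5 ≈ 8.4e-8; the conclusion is the same. It assumes the search checks every bit offset, and by linearity of expectation the expected count is simply the number of offsets times that probability, even though overlapping windows are not independent.

```python
N_BITS = 4_000_000_000  # the search covers the first four billion binary digits of pi
BITS_PER_CHAR = 5       # inferred from the output: 25 bits for "jesus", 40 for "muhammed"

def expected_hits(word, n_bits=N_BITS):
    """Expected occurrences of word's bit pattern if pi's bits were independent fair coin flips."""
    k = BITS_PER_CHAR * len(word)
    positions = n_bits - k + 1       # every bit offset where the pattern could start
    return positions * 2.0 ** -k     # linearity of expectation: offsets * P(match at one offset)

for word in ["jesus", "islam", "muhammed", "christianity"]:
    print(f"{word:>12}: {expected_hits(word):.3g}")
```

This predicts roughly 119 matches for any five-letter word, about 0.004 for an eight-letter word like "muhammed", and about 3e-9 for "christianity", which fits what the search actually found.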

If you are looking for enlightenment in pi, you are quite probably just as well off looking for it in the digits produced by a high-quality random number generator.

With enough data, we can test the hypothesis that pi's digits are random and put a number on how well the data fit it. Of course, we can never completely rule out the possibility that pi is not random (and it's actually not clear that a random pattern contains no information), but we can at least say how surprising the observed digits would be if our hypothesis were true.

We do this by grabbing a bunch of digits of pi and comparing the expected distribution of digits (all digits occur equally often) with the actual distribution calculated from the data. While it would be fun to count the digits ourselves, people with much greater resources than I have on hand have already counted the digits of pi out to 1,200,000,000,000 places. The results are:

Digit   Count
0       119,999,636,735
1       120,000,035,569
2       120,000,620,567
3       119,999,716,885
4       120,000,114,112
5       119,999,710,206
6       119,999,941,333
7       119,999,740,505
8       120,000,830,484
9       119,999,653,604

Our model for the digits of pi is a discrete probability distribution that produces one of ten results with equal probability. A standard test for whether data are drawn from a hypothesized distribution is Pearson's chi-squared test. The test statistic is given by

$$\chi^2 = \sum_{i=0}^{9} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed count (here, the number of digits with value $i$) and $E_i$ is the count the model expects. For 1,200,000,000,000 digits of pi, the expected count for each digit is just a tenth of that, or 120,000,000,000, and the counts above give $\chi^2 \approx 13.1$. That value can be used with the cumulative distribution function of the chi-squared distribution with nine degrees of freedom (ten digits, minus one because the counts must add up to the total) to ask "what is the probability that we measure this data set or a less likely one if the null hypothesis is true?" In this case the answer (from a digital back-of-the-envelope calculation) is p ≈ 0.16, that is, one minus the cumulative distribution function, which itself comes to about 0.84. A p-value that large is nowhere near the usual 0.05 threshold for rejecting the null hypothesis, so the counts are entirely consistent with the digits being uniformly distributed.
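The calculation above can be reproduced in a few lines of Python. This is a sketch of my own, not the original back-of-the-envelope tool: it computes Pearson's statistic from the table and gets the tail probability from the textbook closed forms for a chi-squared distribution with an integer number of degrees of freedom, so it needs only the standard library. If SciPy is installed, `scipy.stats.chisquare(COUNTS)` should give the same statistic and p-value in one call.

```python
import math

# Digit counts for the first 1,200,000,000,000 decimal digits of pi, from the table above.
COUNTS = [
    119999636735, 120000035569, 120000620567, 119999716885, 120000114112,
    119999710206, 119999941333, 119999740505, 120000830484, 119999653604,
]

def chi_squared(observed, expected):
    """Pearson's statistic: sum over categories of (O_i - E_i)**2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf(x, k):
    """P(X >= x) for X chi-squared with k degrees of freedom (k a positive integer)."""
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{j=0}^{k/2-1} (x/2)**j / j!
        term = total = 1.0
        for j in range(1, k // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd k: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_{r=1}^{(k-1)/2} x**(r-1/2) / (1*3*...*(2r-1))
    total, term = 0.0, math.sqrt(x)
    for r in range(1, (k - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

n = sum(COUNTS)                      # 1,200,000,000,000 digits in total
expected = [n / 10] * 10             # 120,000,000,000 per digit under the uniform model
stat = chi_squared(COUNTS, expected)
p = chi2_sf(stat, len(COUNTS) - 1)   # 9 degrees of freedom
print(f"chi^2 = {stat:.2f}, p = {p:.2f}")  # chi^2 = 13.13, p = 0.16
```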

Of course, the digits of a number like 3.0123456789012345678901234567890... (with 0123456789 repeating forever) are uniformly distributed too. To really test the randomness of pi, we would have to test whether all pairs of digits appear uniformly, then all triples, then all quartets, and so on. I leave this as an exercise for the reader, saying only that no result I know of has ever suggested that there is hidden structure in the digits of pi.
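The pairs-then-triples test described above can be sketched as a block version of the same chi-squared statistic (sometimes called a serial test). This minimal version, with a hypothetical function name, splits a digit string into non-overlapping blocks of k digits, so that under the null hypothesis the block counts behave like the single-digit counts did and the statistic can be compared with a chi-squared distribution with 10**k − 1 degrees of freedom. It is run here on the too-regular 0123456789... number from the text; running it on pi means supplying the digits yourself (from a file, or from an arbitrary-precision library such as mpmath), which is left out.

```python
from collections import Counter
from itertools import product

def block_chi_squared(digits, k):
    """Pearson chi-squared for non-overlapping k-digit blocks against a uniform model over all 10**k blocks."""
    blocks = [digits[i:i + k] for i in range(0, len(digits) - k + 1, k)]
    counts = Counter(blocks)
    expected = len(blocks) / 10 ** k
    # Every possible block is a category, including blocks that never occur (Counter returns 0 for those).
    return sum((counts["".join(b)] - expected) ** 2 / expected
               for b in product("0123456789", repeat=k))

# Decimal digits of the "too regular" number 3.0123456789 0123456789 ... from the text.
fake = "0123456789" * 1000

print(block_chi_squared(fake, 1))  # 0.0: single digits look perfectly uniform
print(block_chi_squared(fake, 2))  # 95000.0: only 5 of the 100 possible pairs ever appear
```

Single digits give a statistic of exactly 0, a perfect fit, while pairs give 95,000 on 99 degrees of freedom, an overwhelming rejection: only 01, 23, 45, 67 and 89 ever occur.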

While it's never been mathematically proven that pi is normal (that is, that in the long run every block of k digits appears with frequency 10^-k, so every digit, pair, triple and so on is equally common), there are some recent results connected to the Bailey-Borwein-Plouffe formula which suggest that it's very likely (although not in a probabilistic sense). Maybe we will be lucky enough to live to see the day it's proven normal, and we can all move on to spending our valuable time obsessing over the golden ratio.
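For reference, the Bailey-Borwein-Plouffe formula is

$$\pi = \sum_{k=0}^{\infty} \frac{1}{16^k}\left(\frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6}\right)$$

and its striking property is that it lets you compute the n-th hexadecimal digit of pi without computing any of the digits before it. Below is a minimal sketch of that digit-extraction trick; the function name `bbp_hex_digit` is mine, not a standard API, and because it uses ordinary floating point it is only trustworthy for modest n.

```python
def bbp_hex_digit(n):
    """Hex digit of pi at position n after the point (n = 0 is the first), by BBP digit extraction."""

    def frac_series(j):
        # Fractional part of sum_{k>=0} 16**(n-k) / (8k + j).
        s = 0.0
        # Terms with k <= n: modular exponentiation keeps the numbers small, since only the fraction matters.
        for k in range(n + 1):
            denom = 8 * k + j
            s = (s + pow(16, n - k, denom) / denom) % 1.0
        # Terms with k > n: a rapidly shrinking tail.
        k = n + 1
        while True:
            term = 16.0 ** (n - k) / (8 * k + j)
            if term < 1e-17:
                break
            s += term
            k += 1
        return s % 1.0

    x = (4 * frac_series(1) - 2 * frac_series(4) - frac_series(5) - frac_series(6)) % 1.0
    return int(x * 16)

# Pi in hexadecimal begins 3.243F6A8885A308D3...
print("".join(format(bbp_hex_digit(i), "X") for i in range(16)))  # 243F6A8885A308D3
```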


Some interesting links:
Pi at Wikipedia
Visualize Pi
Search Pi for Strings

Finally, if you like this post, you may enjoy reading my regular blog located at Dorophone.

[1] ...or unwanted, for that matter; any interest in things other than NASCAR and the foibles of the Olsen Twins is welcome. Note to self: celebrity model destruction derby.
