Jump to: ↓ Some plots | ↓ Table of Contents
Euclid's algorithm "might be called the grandaddy of all algorithms", as Knuth aptly put it [5]. Countless modern-day students know the joy of performing the algorithm by hand, essentially in the form set down by Euclid in his Elements over two thousand years ago. Analyses of the algorithm go back as far as the 18th century, when it was understood (at least implicitly) that consecutive Fibonacci numbers give rise to the worst-case in terms of the number of divisions performed, relative to the size of the inputs. Average-case analyses and finer points received surprisingly little attention until the latter half of the last century. (See Knuth [5] and Shallit [9] for superlatively detailed accounts of the algorithm's fascinating history.) Only in recent decades has Euclid's algorithm been put into a general framework for the analysis of algorithms (by Baladi and Vallée [1] et al.), in which certain algorithms are viewed as dynamical systems: the mathematics is as beautiful as it is deep. Although the current state-of-the-art paints a fairly detailed picture with respect to the distributional analysis of Euclid's algorithm, it still tantalises us with open questions, and these questions remain focal points of some exciting research programmes.
Indeed, one of our goals here is to shed some light (numerically) on a certain "obscure" constant that arises in the analysis of Euclid's algorithm. Our main goal, though, is simply to demonstrate how fun and instructive it can be to study introductory discrete mathematics and statistics alongside basic Python. We'll code Euclid's algorithm, apply it to almost five billion pairs of integers to produce some data, then tabulate and visualise the results (the above animation is just one example). Along the way, we'll discuss the Fibonacci numbers and touch on dynamic programming, among other things like random walks... Libraries we'll use include NumPy, SciPy, Matplotlib, and pandas.
There are various ways of analysing Euclid's algorithm. Here is one example. Let
In the above animation, the distribution of
Jump to: ↑ § 1. Introduction | ↓ Table of Contents
Is this a random walk? See One-dimensional analysis: mean and error term for the answer.
See One-dimensional analysis: mean and error term for the context of this instance of square-root-cancellation.
See One-dimensional analysis: variance for more about the next two plots.
See One-dimensional analysis: distribution for an explanation of the animation below.
See Two-dimensional analysis: error terms & subdominant constant in the variance for an explanation of the plots below.
Jump to: ↑ Some plots | ↓ Libraries
§ 1... Introduction [Maths/Visuals]
............... Some plots [Visuals]
§ 2... Libraries [Code]
§ 3... A quick recap of the GCD and Euclid's algorithm [Maths]
............... Code for Euclid's algorithm [Code]
§ 4... Worst-case analysis [Maths/Code]
............... The Fibonacci numbers and dynamic programming [Code]
§ 5... Constants [Maths/Code]
§ 6... One-dimensional analysis [Maths]
............... Average-case analysis [Maths]
............... Error term, variance, and distribution [Maths]
§ 7... Two-dimensional analysis [Maths]
............... Average-case analysis [Maths]
............... Variance and distribution [Maths]
§ 8... Code for analysing Euclid's algorithm [Code]
§ 9... Generating and exporting/importing the raw data [Code]
§ 10... Numerical investigation & data visualisation
............... One-dimensional analysis: distribution [Maths/Code/Visuals]
............... One-dimensional analysis: mean and error term [Maths/Code/Visuals]
............... One-dimensional analysis: variance [Maths/Code/Visuals]
............... Two-dimensional analysis: error terms & subdominant constant in the variance [Maths/Code/Visuals]
............... Two-dimensional analysis: distribution [Maths/Code/Visuals]
References
Jump to: Table of Contents | ↓ A quick recap of the GCD and Euclid's algorithm
# We'll use the following libraries (by the end of this notebook).
import numpy as np # Naturally, we'll need to do some numerics.
import pandas as pd # We'll want to put data in data frames to import and export it, and to display it nicely.
from timeit import default_timer as timer # We'll want to see how long certain computations take.
import scipy.stats as stats # We'll want to plot normal curves and so on.
from scipy.optimize import curve_fit # We'll want to do some curve-fitting.
from scipy.special import zeta # For certain constants involving the Riemann zeta function.
import matplotlib.pyplot as plt # We'll want to plot our data.
from matplotlib import animation # We'll even make animations to display our data.
from matplotlib import rc # For animations.
from IPython.display import HTML # For displaying and saving animations.
from matplotlib.animation import PillowWriter # To save animated gifs.
Jump to: Table of Contents | ↑ Libraries | ↓ Code for Euclid's algorithm
Proposition 3.1. Let
Definition 3.2. The integer
The divisors of
Note that, for all
How can we compute
Euclid's algorithm. If
with the property that
But
and
Definition 3.3. Given
Jump to: Table of Contents | ↑ A quick recap of the GCD and Euclid's algorithm | ↓ Worst-case analysis
# Let's code Euclid's algorithm in a few different ways.
# No special libraries are required for this code block.
# If we just want gcd(a,b), this is about the most straightforward way.
def gcd(a,b):
if b == 0:
return abs(a)
return gcd(b,a%b)
# As a side note, the definition of gcd extends in an obvious way to the gcd of an n-tuple of integers, n >= 2.
# Namely, gcd(a_1,...,a_n) is the unique nonnegative integer with the following property.
# For any integer d, d divides every a_i if and only if d divides gcd(a_1,...,a_n).
# Thus, gcd(a_1,...,a_n) = gcd(a_1,gcd(a_2,...,a_n)), and we can use recursion to compute the gcd of an n-tuple, as below.
def gcdn(*ntuple): # arbitrary number of arguments
if len(ntuple) == 2:
return gcd(ntuple[0], ntuple[1])
return gcd(ntuple[0],gcdn(*ntuple[1:]))
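# For example, gcdn(12, 18, 30) returns 6, since gcd(12, gcd(18, 30)) = gcd(12, 6) = 6.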
# If we want gcd(a,b) and the number of steps (T(a,b)), we can use this.
def gcd_steps(a,b,i): # Input i = 0
if b == 0:
return abs(a), i # Output gcd(a,b) and i = T(a,b)
i += 1
return gcd_steps(b,a%b,i)
# If we want the sequence of remainders
# [r_0,r_1,r_2,...,r_n] (a = r_0, b = r_1, r_{n + 1} = 0, r_1 > r_2 > ... > r_n > 0),
# we can use this.
def rem(a,b,r): # Input r = []
r.append(a)
if b == 0:
return r
return rem(b,a%b,r) # gcd(a,b) = abs(r[-1]), gcd_steps = len(r) - 1
# If we want to record both remainders and quotients [q_1,...,q_n], we can use this.
def quot_rem(a,b,q,r): # Input r = [] and q = []
r.append(a)
if b == 0:
return q, r
(Q, R) = divmod(a,b)
q.append(Q)
return quot_rem(b,R,q,r) # rem(a,b,r) = quot_rem(a,b,q,r)[1]
# If we want to write out the entire process, we can use this.
def write_euclid(a,b):
q, r = quot_rem(a,b,[],[])
R = r[2:] # The next few lines are largely to facilitate the formatting of the output (alignment etc.)
R.append(0)
A, B, Q = r, r[1:], q
GCDcol = [f'gcd({a},{b})', f'gcd({b},{R[0]})']
for i in range(len(R) - 1):
GCDcol.append(f'gcd({R[i]},{R[i + 1]})')
for i in range(len(B)): # For display, put parentheses around negative integers.
if B[i] < 0:
B[i] = f'({B[i]})'
for i in range(len(Q)): # For display, put parentheses around negative integers.
if Q[i] < 0:
Q[i] = f'({Q[i]})'
pm = [] # Next few lines are so we display a = bq - r instead of a = bq + -r in the case where r < 0.
for i in range(len(R)):
if R[i] < 0:
R[i] = -R[i]
pm.append('-')
else:
pm.append('+')
colA = max([len(str(i)) for i in A]) # For alignment.
colB = max([len(str(i)) for i in B])
colQ = max([len(str(i)) for i in Q])
colR = max([len(str(i)) for i in R])
colGCDl = max([len(i) for i in GCDcol[:-2]])
colGCDr = max([len(i) for i in GCDcol[1:]])
if len(A) > 1:
for i in range(len(R)):
print(f'{A[i]:>{colA}} = {Q[i]:>{colQ}}*{B[i]:>{colB}} {pm[i]} {R[i]:>{colR}} \t ∴ {GCDcol[i]:>{colGCDl}} = {GCDcol[i + 1]:<{colGCDr}}')
print(f'\ngcd({a},{b}) = {abs(r[-1])}, T({a},{b}) = {len(r) - 1}\n')
# Try out some of the above code.
write_euclid(1011,69)
1011 = 14*69 + 45 ∴ gcd(1011,69) = gcd(69,45)
69 = 1*45 + 24 ∴ gcd(69,45) = gcd(45,24)
45 = 1*24 + 21 ∴ gcd(45,24) = gcd(24,21)
24 = 1*21 + 3 ∴ gcd(24,21) = gcd(21,3)
21 = 7* 3 + 0 ∴ gcd(21,3) = gcd(3,0)
gcd(1011,69) = 3, T(1011,69) = 5
write_euclid(-1011,-69)
-1011 = 14*(-69) - 45 ∴ gcd(-1011,-69) = gcd(-69,-45)
-69 = 1*(-45) - 24 ∴ gcd(-69,-45) = gcd(-45,-24)
-45 = 1*(-24) - 21 ∴ gcd(-45,-24) = gcd(-24,-21)
-24 = 1*(-21) - 3 ∴ gcd(-24,-21) = gcd(-21,-3)
-21 = 7* (-3) + 0 ∴ gcd(-21,-3) = gcd(-3,0)
gcd(-1011,-69) = 3, T(-1011,-69) = 5
Jump to: Table of Contents | ↑ Code for Euclid's algorithm | ↓ The Fibonacci numbers and dynamic programming
First let's record a result that will be useful in the sequel.
Proposition 4.1. We have the following for
(a)
(b) If
(c) For all positive integers
Proof. (a) Simply note that $b \mid a$ if and only if there exists an integer $q$ such that $a = qb + 0$.

(b) If $1 \le b < a$, then the first two divisions of the Euclidean algorithm for $\gcd(b,a)$ are $b = 0a + b$ and $a = qb + r$, $0 \le r < b$. The second division is the first one done in the Euclidean algorithm for $\gcd(a,b)$.

(c) If (3.1) is the remainder sequence given by Euclid's algorithm for $\gcd(a,b)$, then for any positive integer $d$,

$$da = dr_0, \quad db = dr_1 > dr_2 > \cdots > dr_n > dr_{n + 1} = 0$$

is the remainder sequence given by Euclid's algorithm for $\gcd(da,db)$ (the quotients $q_i$ in (3.2) won't change).
# Let's illustrate the invariance of the number of divisions under (a,b) -> (da,db).
# Take (a,b) as in the previous code block and d = 67.
write_euclid(67*1011,67*69)
67737 = 14*4623 + 3015 ∴ gcd(67737,4623) = gcd(4623,3015)
4623 = 1*3015 + 1608 ∴ gcd(4623,3015) = gcd(3015,1608)
3015 = 1*1608 + 1407 ∴ gcd(3015,1608) = gcd(1608,1407)
1608 = 1*1407 + 201 ∴ gcd(1608,1407) = gcd(1407,201)
1407 = 7* 201 + 0 ∴ gcd(1407,201) = gcd(201,0)
gcd(67737,4623) = 201, T(67737,4623) = 5
What are the smallest possible values of
# Some very naive code to answer the question: what are the smallest possible values of a and b if
# a >= b >= 1 and T(a,b) = n?
# Basically a very inefficient way of computing Fibonacci numbers, the code really starts to get slow for n = 16.
# But the first few n are enough to help us see a pattern.
def W(n):
a, b = 1, 1
while gcd_steps(a,b,0)[1] < n:
a += 1
for b in range(1,a+1):
if gcd_steps(a,b,0)[1] == n:
return a, b
return a, b
print(W(1), W(2), W(3), W(4), W(5), W(6), W(7), W(8), W(9), W(10))
(1, 1) (3, 2) (5, 3) (8, 5) (13, 8) (21, 13) (34, 21) (55, 34) (89, 55) (144, 89)
It isn't difficult to see that for a given
write_euclid(55,34)
55 = 1*34 + 21 ∴ gcd(55,34) = gcd(34,21)
34 = 1*21 + 13 ∴ gcd(34,21) = gcd(21,13)
21 = 1*13 + 8 ∴ gcd(21,13) = gcd(13,8)
13 = 1* 8 + 5 ∴ gcd(13,8) = gcd(8,5)
8 = 1* 5 + 3 ∴ gcd(8,5) = gcd(5,3)
5 = 1* 3 + 2 ∴ gcd(5,3) = gcd(3,2)
3 = 1* 2 + 1 ∴ gcd(3,2) = gcd(2,1)
2 = 2* 1 + 0 ∴ gcd(2,1) = gcd(1,0)
gcd(55,34) = 1, T(55,34) = 8
These are the Fibonacci numbers!
Definition 4.2. Let
Proposition 4.3. Let
Proof. (a) Since $f_{n + 2} = f_{n + 1} + f_{n}$, we have $\gcd(f_{n + 2},f_{n + 1}) = \gcd(f_{n + 1},f_{n})$. Since $\gcd(f_1,f_0) = \gcd(1,0) = 1$, that $\gcd(f_{n + 1},f_n) = 1$ for all $n \ge 0$ now follows by mathematical induction.

Also, since $f_{n + 2} = f_{n + 1} + f_n$ and $0 \le f_n < f_{n + 1}$ for $n \ge 2$, we have $T(f_{n + 2},f_{n + 1}) = T(f_{n + 1}, f_n) + 1$ for $n \ge 2$. Since $T(f_{1 + 2},f_{1 + 1}) = T(2,1) = 1$, that $T(f_{n + 2},f_{n + 1}) = n$ for all $n \ge 1$ now follows by mathematical induction.

(b) We've verified the result for $n = 1$ and $n = 2$, so let $n \ge 2$ and suppose the result holds for $n$. Let $T(a,b) = n + 1$. Since this is greater than $1$, $b \nmid a$ (as noted above), and $a = qb + r$ for some $q$ and $r$ with $q \ge 1$ and $1 \le r < b$. Now, $T(b,r) = n$ and so, by the inductive hypothesis, $b \ge f_{n + 2}$ and $r \ge f_{n + 1}$. Thus, $a = qb + r \ge f_{n + 2} + f_{n+1} = f_{n + 3}$. In conclusion, $a \ge f_{n + 3}$ and $b \ge f_{n + 2}$, and the result holds for all $n \ge 2$, by mathematical induction.
Remark 4.4. For
# Let's test this out for small n.
# In the table below, row a column b contains T(a,b).
# Below the diagonal corresponds to a > b.
# Note that, for instance, the first time 4 appears below the diagonal is in row 8, column 5: 8 = f_6 and 5 = f_5.
test_dict = {}
for a in range(14):
test_dict[a] = {}
for b in range(14):
test_dict[a][b] = gcd_steps(a,b,0)[1]
pd.DataFrame.from_dict(test_dict, orient='index')#.astype('int')
0 1 2 3 4 5 6 7 8 9 10 11 12 13
0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 1 2 2 2 2 2 2 2 2 2 2 2 2
2 0 1 1 3 2 3 2 3 2 3 2 3 2 3
3 0 1 2 1 3 4 2 3 4 2 3 4 2 3
4 0 1 1 2 1 3 3 4 2 3 3 4 2 3
5 0 1 2 3 2 1 3 4 5 4 2 3 4 5
6 0 1 1 1 2 2 1 3 3 3 4 4 2 3
7 0 1 2 2 3 3 2 1 3 4 4 5 5 4
8 0 1 1 3 1 4 2 2 1 3 3 5 3 6
9 0 1 2 1 2 3 2 3 2 1 3 4 3 4
10 0 1 1 2 2 1 3 3 2 2 1 3 3 4
11 0 1 2 3 3 2 3 4 4 3 2 1 3 4
12 0 1 1 1 1 3 1 4 2 2 2 2 1 3
13 0 1 2 2 2 4 2 3 5 3 3 3 2 1
Jump to: Table of Contents | ↑ Worst-case analysis | ↓ Constants
# Let's take a detour and compute some Fibonacci numbers.
# Computing f_n naively takes time exponential in n.
# E.g. to compute f_5 naively we do this:
# f_5 = f_4 + f_3
# = f_3 + f_2 + f_2 + f_1
# = f_2 + f_1 + f_1 + f_0 + f_1 + f_0 + 1
# = f_1 + f_0 + 1 + 1 + 0 + 1 + 0 + 1
# = 1 + 0 + 1 + 1 + 0 + 1 + 0 + 1
# = 5
# We can think of this as a tree with f_5 at the top,
# and where each parent node has two children until the node corresponds to f_1 or f_0.
# We end up having to sum a sequence of 1's and 0's containing f_n ones and f_{n-1} zeros,
# and we know f_n is of order phi^n, where phi is the golden ratio.
def naive_fib(n):
if n == 0:
return 0
if n == 1:
return 1
if n > 1:
return naive_fib(n - 1) + naive_fib(n - 2)
if n < 0:
return naive_fib(-n) # Why not?
naive_time = []
for n in range(31):
start = timer()
naive_fib(n)
end = timer()
naive_time.append(end - start)
# Of course it is very silly to compute fib(n) from scratch if you've just computed f_0,f_1,...,f_{n-1}.
# Let's compute f_n the way a human might do it: by starting at the beginning.
FIB = []
list_time = []
for n in range(101):
start = timer()
if (n == 0 or n == 1):
FIB.append(n)
else:
FIB.append(FIB[n - 1] + FIB[n - 2])
end = timer()
list_time.append(end - start)
# Now if we want to compute f_n but don't want to have to maintain an ever-expanding list
# of all previous Fibonacci numbers, we can do the following.
# This algorithm computes f_n in O(n) time for n >= 1
# (a unit of time being the time taken to do an addition and update a couple of variables).
# There are only about n additions needed in this algorithm.
# This is a classic example of dynamic programming: we build the answer bottom-up,
# with a and b holding the two most recently computed values (a space-saving form of tabulation).
# The previous code (generating a list of all Fibonacci numbers up to f_n) is tabulation proper;
# a top-down recursive version that caches its results would be memoization (a sketch appears after the timing plot below).
def fib(n):
a, b = 0, 1 # a and b will be placeholders for fib(n - 2) and fib(n - 1)
if n == 0:
return a
if n == 1:
return b
if n < 0:
return fib(-n) # Why not?
for k in range(n-1):
c = a + b # c will be fib(n) ultimately.
a = b # First, k = 0, c = 0 + 1 = 1, a -> 1, b -> 1.
b = c # Then, k = 1, c -> 1 + 1 = 2, a -> 1, b -> 2. And so on.
return c
dynamic_time = []
for n in range(101):
start = timer()
fib(n)
end = timer()
dynamic_time.append(end - start)
fig, ax = plt.subplots()
fig.suptitle('Time to compute nth Fibonacci number')
x = [n for n in range(101)]
ax.plot(x[:31],naive_time,'r.', label='Naive recursion')
ax.plot(x,list_time,'b.', label=f'List all $f_k$ for $k \leq n$')
plt.plot(x,dynamic_time,'g+', label='Dynamic algorithm')
ax.set_xlabel(r'$n$')
ax.set_ylabel('units of time')
ax.grid(True)
ax.legend()
plt.show()
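As an aside, here is a minimal sketch of the top-down counterpart mentioned above: the same recursion as naive_fib, but with results cached so that each Fibonacci number is computed at most once. This is what is usually meant by memoization. (It is purely illustrative and isn't used elsewhere in this notebook.)
# A top-down, memoized Fibonacci (illustrative sketch only; not used elsewhere in this notebook).
from functools import lru_cache
@lru_cache(maxsize=None)
def memo_fib(n):
    if n < 0:
        return memo_fib(-n) # Mirror the convention used in naive_fib and fib above.
    if n < 2:
        return n
    return memo_fib(n - 1) + memo_fib(n - 2) # Each f_k is computed once; later calls hit the cache.
# memo_fib(100) agrees with fib(100): both return 354224848179261915075.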
We can show that if
For any root
# Let's try this out.
# In fact, phi^n/sqrt(5) < f_n when n is odd, and phi^n/sqrt(5) > f_n when n is even.
phi, psi = np.roots([1,-1,-1])
def fib_closed_form(n):
return (phi**(n) - psi**(n))/(phi - psi)
for n in range(11):
print(f'n = {n}, f_n = {fib(n)}, phi^n/(phi - psi) = {phi**(n)/(phi - psi):.3f}..., nearest int = {int(np.rint(phi**(n)/(phi - psi)))}')
n = 0, f_n = 0, phi^n/(phi - psi) = 0.447..., nearest int = 0
n = 1, f_n = 1, phi^n/(phi - psi) = 0.724..., nearest int = 1
n = 2, f_n = 1, phi^n/(phi - psi) = 1.171..., nearest int = 1
n = 3, f_n = 2, phi^n/(phi - psi) = 1.894..., nearest int = 2
n = 4, f_n = 3, phi^n/(phi - psi) = 3.065..., nearest int = 3
n = 5, f_n = 5, phi^n/(phi - psi) = 4.960..., nearest int = 5
n = 6, f_n = 8, phi^n/(phi - psi) = 8.025..., nearest int = 8
n = 7, f_n = 13, phi^n/(phi - psi) = 12.985..., nearest int = 13
n = 8, f_n = 21, phi^n/(phi - psi) = 21.010..., nearest int = 21
n = 9, f_n = 34, phi^n/(phi - psi) = 33.994..., nearest int = 34
n = 10, f_n = 55, phi^n/(phi - psi) = 55.004..., nearest int = 55
We are now ready to establish the worst-case result for the number of divisions in Euclid's algorithm. The following proposition implies that for
Proposition 4.5. For
Note that
Proof. First of all, if $m \ge 1$ then $f_{n + 1} \le m < f_{n + 2}$ for some $n \ge 1$. Thus, in view of (4.1), we have (for $m \ge 2$ in the left inequality)

$$\frac{\log(m - 1)}{\log \phi} - 0.328 < n < \frac{\log(m + 1)}{\log \phi} + 0.673. \tag{$*$}$$

To see this in more detail, note that $f_{k} = \lfloor \phi^k/(\phi - \psi)\rfloor$ if $k$ is even, while $f_{k} = \lceil \phi^k/(\phi - \psi)\rceil$ if $k$ is odd; also, $\phi - \psi = \sqrt{5}$. Thus, supposing first that $n$ is even,

$$\frac{\phi^{n+1}}{\sqrt{5}} < f_{n+1} \le m < f_{n + 2} < \frac{\phi^{n+2}}{\sqrt{5}}.$$

Taking logarithms, we see that

$$n + 1 < \frac{\log m + \frac{1}{2}\log 5}{\log \phi} < n + 2,$$

leading (after a calculation) to

$$ \frac{\log m}{\log \phi} - 0.3277\ldots < n < \frac{\log m}{\log \phi} + 0.6722\ldots .$$

Similarly, if $n$ is odd and $m \ge 2$, then

$$ \frac{\log(m-1)}{\log \phi} - 0.3277\ldots < n < \frac{\log(m + 1)}{\log \phi} + 0.6722\ldots .$$

Combining gives $(*)$.

(a) Now, let $N \ge 2$ be given, and let $n \ge 2$ be such that $f_{n + 1} \le N < f_{n + 2}$. By Proposition 4.3(b), for $1 \le b \le a \le N$, we have $T(a,b) \le n - 1$, with equality attained when $a = f_{n + 1}$ and $b = f_{n}$. That is,

$$\max\{T(a,b) : 1 \le b \le a \le N\} = n - 1,$$

and the result follows by $(*)$.
We see that, for a given
Jump to: Table of Contents | ↑ The Fibonacci numbers and dynamic programming | ↓ § 6. One-dimensional analysis
We will presently begin our analysis of Euclid's algorithm (beyond the worst-case). Several constants will arise: let us record them here at the outset, for ease of reference.
The Euler-Mascheroni constant:
The Riemann zeta function
There is also the logarithmic derivative (
The reciprocal of Lévy's constant:
Porter's constant, also called the Lochs-Porter constant:
Note also that
We also have:
Note also that
Hensley's constant:
Finally, we have mysterious constants
(and our numerics suggest that
#CONSTANTS
# Constants needed to define other subsequent constants...
euler_mascheroni = 0.57721566490153286060 # Approximate value of Euler-Mascheroni constant.
zeta_der_2 = -0.93754825431584375370 # Approximate value of zeta'(2)
zeta_dder_2 = 1.98928 # Approximation to zeta''(2)
# The constants we actually need...
# For the mean in the one- and two-dimensional analyses
# This is the reciprocal of Levy's constant. We'll name it here in honour of Dixon.
lambda_dixon = 2*np.log(2)/zeta(2)
# Porter's constant
porter_constant = ((np.log(2))/zeta(2))*(3*np.log(2) + 4*euler_mascheroni - 4*zeta_der_2/zeta(2) - 2) - (1/2)
# For the mean in the two-dimensional analysis (subdominant constants)
# Norton's refinement of the mean. (Counting all pairs, not just a > b.)
nu_norton = -1 + lambda_dixon*(2*euler_mascheroni + 1.5*np.log(2) - 1.5 - zeta_der_2/zeta(2))
# Norton's constant for case of coprime pairs. (Counting all coprime pairs, not just a > b.)
nu_norton_coprime = nu_norton - lambda_dixon*zeta_der_2/zeta(2)
# For the variance in the two-dimensional analysis
# Hensley's constant in the variance, as given by Lhote.
eta_hensley = 0.5160524
# Subdominant constants in the variance for the two-dimensional analysis
# Our "guesstimate" of the subdominant constant in a certain variance
kappa_var = -0.1
delta_kappa = np.around(eta_hensley*zeta_der_2/zeta(2) + (lambda_dixon**2)*((zeta_dder_2/zeta(2)) - (zeta_der_2/zeta(2))**2),3)
# Our "guesstimate" of the subdominant constant in another variance
kappa_var_coprime = np.around(kappa_var - delta_kappa,3)
print(f"{'gamma':>16} = {euler_mascheroni}" +'\n' +
f"{'zeta(2)':>16} = {zeta(2)}" + '\n' +
f"{'der_zeta(2)':>16} = {zeta_der_2}" + '\n' +
f"{'dder_zeta(2)':>16} = {zeta_dder_2}" + '\n' +
f"{'lambda':>16} = {lambda_dixon}" + '\n' +
f"{'C_P':>16} = {porter_constant}" + '\n' +
f"{'nu':>16} = {nu_norton}" + '\n' +
f"{'nu_1':>16} = {nu_norton_coprime}" + '\n' +
f"{'eta':>16} = {eta_hensley}" + '\n' +
f"{'kappa - kappa_1':>16} = {delta_kappa}" + '\n' +
f"{'kappa':>16} = {kappa_var} (?)" + '\n' +
f"{'kappa_1':>16} = {kappa_var_coprime} (?)")
gamma = 0.5772156649015329
zeta(2) = 1.6449340668482264
der_zeta(2) = -0.9375482543158438
dder_zeta(2) = 1.98928
lambda = 0.8427659132721945
C_P = 1.4670780794339755
nu = 0.06535142592303722
nu_1 = 0.5456951227978781
eta = 0.5160524
kappa - kappa_1 = 0.334
kappa = -0.1 (?)
kappa_1 = -0.434 (?)
Jump to: Table of Contents | ↑ Constants | ↓ Average-case analysis
As we will see, "one-dimensional analysis" is "harder" than "two-dimensional analysis". We may study
Jump to: Table of Contents | ↑ § 6. One-dimensional analysis | ↓ Error term, variance, and distribution | ↓↓ Numerics
To formulate results precisely, given a positive integer
is the average number of divisions performed in Euclid's algorithm for
Heilbronn [3] considered (exercise to show the equivalence)
and showed that
Theorem 6.1. [Porter, 1975]. For all
A result of this kind had been predicted by Knuth [5], based on a heuristic and numerics.
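As a quick sanity check (purely illustrative: $a = 10007$ is an arbitrary choice, and the systematic numerics appear in § 10), we can compare the empirical average of $T(a,b)$ over the totatives $b$ of $a$ with the main term $\lambda \log a + C_P - 1$, using gcd, gcd_steps, lambda_dixon and porter_constant as defined above.
# A small illustration of Porter's estimate (not part of the later data generation).
a = 10007 # An arbitrary choice of a.
steps = [gcd_steps(a, b, 0)[1] for b in range(1, a + 1) if gcd(a, b) == 1]
empirical_mean = sum(steps)/len(steps)
porter_main_term = lambda_dixon*np.log(a) + porter_constant - 1
print(f'empirical mean = {empirical_mean:.4f}, Porter main term = {porter_main_term:.4f}')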
Jump to: Table of Contents | ↑ Average-case analysis | ↓ § 7. Two-dimensional analysis | ↓↓ Numerics for error term, variance, and distribution
Comparable estimates for the second moment of
For a given
then, by the central limit theorem,
as
(Last inequality an exercise.) Similarly for the probability that
Conjecture 6.2. For all
We might be able to replace the
Regarding the variance, it seems that
Jump to: Table of Contents | ↑ Error term, variance, and distribution | ↓ Average-case analysis
As already mentioned, averaging the number of divisions in the Euclidean algorithm over both inputs allows us to prove much more than we can in the one-dimensional analysis. There are various equivalent ways to formulate results: let
then define random variables
by letting each be equal to
Jump to: Table of Contents | ↑ § 7. Two-dimensional analysis | ↓ Variance and distribution | ↓↓ Numerics
Thus, for instance,
is the average number of divisions performed by Euclid's algorithm, over all pairs
Theorem 7.1 [Norton, 1990]. For all
with
As one might expect, if Conjecture 6.2 is true, the
By Norton's result and the next proposition, the following estimates all hold and are in fact equivalent: for
with
Proposition 7.2. (a) For
(b) The following statements are equivalent. (i) There exist constants
(ii) There exist constants
Proof. (a) For the second equation in (7.6), note that

$$ \begin{align*} \# U_1 \mathbb{E}[Y_1] & = \sum_{(a,b) \in U_1} T(a,b) \\ & = \sum_{\substack{(a,b) \in U_1 \\ a > b}} T(a,b) + \sum_{\substack{(a,b) \in U_1 \\ b > a}} T(a,b) + \sum_{\substack{(a,b) \in U_1 \\ a = b}} T(a,b) \\ & = \sum_{\substack{(a,b) \in U_1 \\ a > b}} T(a,b) + \sum_{\substack{(a,b) \in U_1 \\ a > b}} T(b,a) + T(1,1) \\ & = \sum_{\substack{(a,b) \in U_1 \\ a > b}} T(a,b) + \sum_{\substack{(a,b) \in U_1 \\ a > b}} \left[T(a,b) + 1\right] + 1 \\ & = 2\sum_{\substack{(a,b) \in U_1 \\ a > b}} T(a,b) + \sum_{\substack{(a,b) \in U_1 \\ a > b}} 1 + 1 \\ & = 2\#\Omega_1 \mathbb{E}[X_1] + \#\Omega_1 + 1. \end{align*} $$

Since $\#U_1 = 2\#\Omega_1 + N$ (if this isn't clear, replace $T(a,b)$ by $1$ in the above equations), this gives

$$ \begin{align*} \#U_1\mathbb{E}[Y_1] = \#U_1 \mathbb{E}[X_1] - N\mathbb{E}[X_1] + {\textstyle \frac{1}{2}} \left(\#U_1 - N\right) + 1, \end{align*} $$

which upon dividing by $\#U_1$ becomes

$$ \begin{align*} \mathbb{E}[Y_1] = \mathbb{E}[X_1] + {\textstyle \frac{1}{2}} - \frac{N}{\# U_1}\left(\mathbb{E}[X_1] + {\textstyle \frac{1}{2} + \frac{1}{N}}\right). \end{align*} $$

All of this is valid for $N \ge 1$. Proposition 4.5(c) implies that $\mathbb{E}[X_1] \ll \log N$ for $N \ge 2$. Also, a classical result of Mertens is that for $N \ge 2$,

$$ \begin{align*} \sum_{\substack{1 \le a,b \le N \\ \gcd(a,b) = 1}} 1 = \frac{N^2}{\zeta(2)} + O\left(N \log N\right). \tag{7.7} \end{align*} $$

We'll use this in the proof of part (b), but at this point we only need that $\#U_1 \gg N^2$. Combining gives the second equation in (7.6). The verification of the first equation in (7.6) is almost identical.

(b) We show that (i) implies (ii). Since $\gcd(c,d) = 1$ if and only if $\gcd(gc,gd) = g$, and since $T(gc,gd) = T(c,d)$ (Proposition 4.1(c)),

$$ \begin{align*} N^2 \mathbb{E}[Y] & = \sum_{1 \le a,b \le N} T(a,b) \\ & = \sum_{g = 1}^{N} \sum_{\substack{1 \le a,b \le N \\ \gcd(a,b) = g}} T(a,b) \\ & = \sum_{g = 1}^{N} \sum_{\substack{1 \le c,d \le N/g \\ \gcd(c,d) = 1}} T(gc,gd) \\ & = \sum_{g = 1}^{N} \sum_{\substack{1 \le c,d \le N/g \\ \gcd(c,d) = 1}} T(c,d) \\ & = \sum_{1 \le g \le N/2} \sum_{\substack{1 \le c,d \le N/g \\ \gcd(c,d) = 1}} T(c,d) + \sum_{N/2 < g \le N} T(1,1) \\ & = \sum_{1 \le g \le N/2} \sum_{\substack{1 \le c,d \le N/g \\ \gcd(c,d) = 1}} T(c,d) + O(N). \tag{$\dagger$} \end{align*} $$

We now assume (i), which is tantamount to assuming that, for $M \ge 2$ and $\epsilon > 0$,

$$ \begin{align*} \sum_{\substack{1 \le c,d \le M \\ \gcd(c,d) = 1}} T(c,d) = \sum_{\substack{1 \le c,d \le M \\ \gcd(c,d) = 1}} \left(\lambda \log M + \nu_1 + O_{\epsilon}\left(M^{\epsilon - 1/6}\right)\right), \end{align*} $$

which, in view of (7.7), is equivalent to

$$ \begin{align*} \sum_{\substack{1 \le c,d \le M \\ \gcd(c,d) = 1}} T(c,d) = \frac{M^2}{\zeta(2)}\left(\lambda \log M + \nu_1\right) + O_{\epsilon}\left(M^{2 + \epsilon - 1/6}\right). \end{align*} $$

Putting this into $(\dagger)$, we obtain

$$ \begin{align*} \mathbb{E}[Y] & = \sum_{1 \le g \le N/2} \left[\left(\frac{(1/g)^2}{\zeta(2)}\right)\left(\lambda \log(N/g) + \nu_1\right) + O_{\epsilon}\left(N^{\epsilon - 1/6}/g^{2 + \epsilon - 1/6}\right)\right] + O\left(N^{-1}\right)\\ & = \lambda \log N \left(\frac{1}{\zeta(2)} \sum_{1 \le g \le N/2} \frac{1}{g^2}\right) + \lambda\left(\frac{1}{\zeta(2)} \sum_{1 \le g \le N/2} \frac{-\log g}{g^2}\right) + \nu_1 \left(\frac{1}{\zeta(2)} \sum_{1 \le g \le N/2} \frac{1}{g^2}\right) + O_{\epsilon}\left(N^{\epsilon - 1/6} \right), \tag{$\ddagger$} \end{align*} $$

where we have used the fact that

$$\sum_{g = 1}^{\infty} \frac{1}{g^{2 + \epsilon - 1/6}} \ll 1$$

in the $O$-term. Finally,

$$ \begin{align*} \sum_{1 \le g \le N/2} \frac{1}{g^2} = \sum_{g = 1}^{\infty} \frac{1}{g^2} + O\left(N^{-1}\right) = \zeta(2) + O\left(N^{-1}\right) \quad \text{and} \quad \sum_{1 \le g \le N/2} \frac{-\log g}{g^2} = \sum_{g = 1}^{\infty} \frac{-\log g}{g^2} + O\left(N^{-1}\log N\right) = \zeta'(2) + O\left(N^{-1}\log N\right), \end{align*} $$

and making these substitutions in $(\ddagger)$ yields

$$ \begin{align*} \mathbb{E}[Y] = \lambda\log N + \lambda \frac{\zeta'(2)}{\zeta(2)} + \nu_1 + O_{\epsilon}\left(N^{\epsilon - 1/6} \right). \end{align*} $$

Hence (i) implies (ii). The converse can be obtained similarly: we begin by noting that ($\mu$ denoting the Möbius function)

$$ \begin{align*} \sum_{\substack{1 \le a,b \le N \\ \gcd(a,b) = 1}} T(a,b) = \sum_{1 \le a,b \le N} T(a,b) \sum_{\substack{\delta \mid a \\ \delta \mid b}} \mu(\delta) = \sum_{1 \le \delta \le N} \mu(\delta) \sum_{1 \le c,d \le N/\delta} T(\delta c,\delta d) = \sum_{1 \le \delta \le N} \mu(\delta) \sum_{1 \le c,d \le N/\delta} T(c,d). \end{align*} $$

We leave the remainder of the proof as an exercise for the reader.
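As a small numerical illustration of Mertens' estimate (7.7) (nothing later depends on this; $N = 2000$ is an arbitrary, modest choice), we can count the coprime pairs directly and compare with $N^2/\zeta(2)$.
# Count coprime pairs 1 <= a,b <= N and compare with N^2/zeta(2) (illustration of (7.7) only).
N = 2000 # An arbitrary, modest choice of N.
coprime_pairs = sum(1 for a in range(1, N + 1) for b in range(1, N + 1) if gcd(a, b) == 1)
print(f'{coprime_pairs} coprime pairs; N^2/zeta(2) = {N**2/zeta(2):.1f}')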
Naturally, the next questions are related to variance, and, ultimately, distribution.
Jump to: Table of Contents | ↑ Average-case analysis | ↓ § 8. Code for analysing Euclid's algorithm | ↓↓ Numerics for variance and distribution
Dixon [2] showed that
Theorem 7.3. [Hensley, 1994.] There exist constants
The same is true with
A unifying framework for analysing a general class of Euclidean (and other) algorithms has been developed, principally, by Vallée. The methods rely on properties of transfer operators suitably adapted from dynamical systems theory. The following is a special case of a very deep result of Baladi and Vallée [1].
Theorem 7.4. [Baladi & Vallée, 2005.] There exist constants
and
(for some positive constant
The same is true of
As we mentioned following Theorem 7.1 of Norton, if Conjecture 6.2 is true, the
In the variance (7.10), Hensley's constant
(see (5.11)). Our numerical data leads us to "guesstimate" the numerical value of
Naturally,
Jump to: Table of Contents | ↑ Variance and distribution | ↓ § 9. Generating and exporting/importing the raw data
# We'll re-do the gcd step-counter code here just to have everything we need for the raw data in one block.
# Enter any two integers a and b, and (typically) i = 0.
# Return (g,i), where g = gcd(a,b) and i is the number of steps (u,v) -> (v,u%v) in the calculation of g.
def gcd_steps(a,b,i): # Input i = 0
if b == 0:
return abs(a), i # Output gcd(a,b) and i = T(a,b)
i += 1
return gcd_steps(b,a%b,i)
# Input a dictionary and output a new dictionary, sorted by keys (if the keys are sortable).
def dictionary_sort(dictionary):
L = list(dictionary.keys())
L.sort()
sorted_dictionary = {}
for k in L:
sorted_dictionary[k] = dictionary[k]
return sorted_dictionary
# ONE-DIMENSIONAL CASE
# Input gcdlist and list1; output two dictionaries, both of whose keys are the elements a of list1,
# and whose corresponding values are themselves dictionaries, the keys of which are positive integers s,
# s being the number of divisions performed in the Euclidean algorithm in computing gcd(a,b) for some b in [1,a].
# The corresponding values are frequencies (the number of b for which T(a,b) = s).
# In the first dictionary, we restrict to b for which gcd(a,b) is in gcdlist (primarily interested in gcdlist = [1]).
# In the second output dictionary, we count all b in [1,a].
# Thus, the second output dictionary items take the form a : {1 : x, 2 : y, ...}, where x is the number of b in [1,a] such that
# T(a,b) = 1, y is the number of b in [1,a] such that T(a,b) = 2, and so on.
# First output dictionary similar but b is restricted by gcdlist.
def heilbronn(gcdlist, list1):
list1.sort()
output_dictionary_restricted = {}
output_dictionary_all = {}
for a in list1:
output_dictionary_restricted[a] = {}
output_dictionary_all[a] = {}
for b in range(1,a+1):
g, s = gcd_steps(a,b,0)
if (g in gcdlist):
if s in output_dictionary_restricted[a].keys():
output_dictionary_restricted[a][s] += 1
else:
output_dictionary_restricted[a][s] = 1
if s in output_dictionary_all[a].keys():
output_dictionary_all[a][s] += 1
else:
output_dictionary_all[a][s] = 1
return output_dictionary_restricted, output_dictionary_all
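# For example (a small check): heilbronn([1], [10])[0] is {10: {1: 1, 2: 2, 3: 1}}, since among the
# totatives b = 1, 3, 7, 9 of 10 we have T(10,1) = 1, T(10,3) = T(10,9) = 2, and T(10,7) = 3.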
# TWO-DIMENSIONAL CASE
# Input three lists and three dictionaries (gcdlist, list1, list2, gcd_dictionary, steps_dictionary, steps_dictionary_all).
# Usually, list1 and the three dictionaries will be empty (unless we're building upon data we've already produced).
# Output will be three "meta" dictionaries, each containing dictionaries whose keys are the integers in list2.
# Say the output dictionaries are A, B, C, and N is a key.
# In A, the value corresponding to N will be a dictionary itself, where each item is of the form
# g : #{0 < b < a < N : gcd(a,b) = g}.
# In B, the value corresponding to N will be a dictionary itself, where each item is of the form
# s : #{0 < b < a < N and gcd(a,b) is in gcdlist: T(a,b) = s}.
# In C, the value corresponding to N will be a dictionary itself, where each item is of the form
# s : #{0 < b < a < N : T(a,b) = s}.
# If we've already spent a long time to create A, B, and C, we will not want to re-do the computation when we want to
# extend them. If we want to extend A, B, and C, we should input A, B, and C as the three dictionary inputs.
# list1 should be the keys of A, B, and C, although all that really matters is that list1[-1] == list2[0].
# list2[1:] should be the list of keys we want to add to A, B, and C.
def euclid_alg_frequencies(gcdlist, list1, list2, gcd_dictionary, steps_dictionary, steps_dictionary_all):
list2.sort()
gcd_frequencies = {}
steps_frequencies = {}
steps_frequencies_all = {}
updated_gcd_dictionary = dictionary_sort(gcd_dictionary)
updated_steps_dictionary = dictionary_sort(steps_dictionary)
updated_steps_dictionary_all = dictionary_sort(steps_dictionary_all)
if list1 == []:
b_0 = list2[0]
else:
list1.sort()
b_0 = list1[0]
for k in updated_gcd_dictionary[list1[-1]].keys():
gcd_frequencies[k] = updated_gcd_dictionary[list1[-1]][k]
for k in updated_steps_dictionary[list1[-1]].keys():
steps_frequencies[k] = updated_steps_dictionary[list1[-1]][k]
for k in updated_steps_dictionary_all[list1[-1]].keys():
steps_frequencies_all[k] = updated_steps_dictionary_all[list1[-1]][k]
for k in range(len(list2)-1):
for b in range(list2[k], list2[k + 1]):
for a in range(b + 1, list2[k + 1]):
g, s = gcd_steps(a,b,0)
if g in gcd_frequencies.keys():
gcd_frequencies[g] += 1
else:
gcd_frequencies[g] = 1
if s in steps_frequencies_all.keys():
steps_frequencies_all[s] += 1
else:
steps_frequencies_all[s] = 1
if (g in gcdlist or gcdlist == []):
if s in steps_frequencies.keys():
steps_frequencies[s] += 1
else:
steps_frequencies[s] = 1
for b in range(b_0,list2[k]):
for a in range(list2[k], list2[k + 1]):
g, s = gcd_steps(a,b,0)
if g in gcd_frequencies.keys():
gcd_frequencies[g] += 1
else:
gcd_frequencies[g] = 1
if s in steps_frequencies_all.keys():
steps_frequencies_all[s] += 1
else:
steps_frequencies_all[s] = 1
if (g in gcdlist or gcdlist == []):
if s in steps_frequencies.keys():
steps_frequencies[s] += 1
else:
steps_frequencies[s] = 1
updated_gcd_dictionary[list2[k+1]] = dictionary_sort(gcd_frequencies)
updated_steps_dictionary[list2[k+1]] = dictionary_sort(steps_frequencies)
updated_steps_dictionary_all[list2[k+1]] = dictionary_sort(steps_frequencies_all)
return updated_gcd_dictionary, updated_steps_dictionary, updated_steps_dictionary_all
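# Example usage (cf. § 9 below): starting from scratch with checkpoints 1, 1001, 2001, ..., one would call
# A, B, C = euclid_alg_frequencies([1], [], list(range(1, 10002, 1000)), {}, {}, {}).
# To extend A, B, C later, pass them back in as the three dictionary arguments, with list1 equal to the
# old checkpoint list and list2 a sorted list beginning at list1[-1].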
# Now some code that will take a dictionary as input.
# Assume the keys are numbers and each key's value is the number of times (frequency of) the key occurs in some data.
# The output is a dictionary, whose first item is itself a dictionary, whose keys are the same as the input dictionary,
# and for which each key's value is the _proportion_ of occurrences (_relative_ frequency) of the key among the data.
# The other items in the output dictionary are mean, variance, median, mode, etc., of the original data.
def dictionary_statistics(dictionary):
frequencies = dictionary_sort(dictionary)
relative_frequencies = {}
number_of_objects_counted = 0
mean = 0
median = 0
mode = []
second_moment = 0
variance = 0
standard_deviation = 0
M = max(frequencies.values())
for s in frequencies.keys():
number_of_objects_counted += frequencies[s]
mean += s*frequencies[s]
second_moment += (s**2)*frequencies[s]
if frequencies[s] == M:
mode.append(s)
mean = mean/number_of_objects_counted
second_moment = second_moment/number_of_objects_counted
variance = second_moment - mean**2
standard_deviation = np.sqrt(variance)
# A little subroutine for computing the median...
temp_counter = 0
if number_of_objects_counted%2 == 1:
for s in frequencies.keys():
if temp_counter < number_of_objects_counted/2:
temp_counter += frequencies[s]
if temp_counter > number_of_objects_counted/2:
median = s
if number_of_objects_counted%2 == 0:
for s in frequencies.keys():
if temp_counter < number_of_objects_counted/2:
temp_counter += frequencies[s]
if temp_counter >= number_of_objects_counted/2:
median = s
temp_counter = 0
for s in frequencies.keys():
if temp_counter < 1 + (number_of_objects_counted/2):
temp_counter += frequencies[s]
if temp_counter >= 1 + (number_of_objects_counted/2):
median = (median + s)/2
# Finally, let's get the relative frequencies.
for s in frequencies.keys():
relative_frequencies[s] = frequencies[s]/number_of_objects_counted
output_dictionary = {}
output_dictionary["dist"] = relative_frequencies
output_dictionary["mean"] = mean
output_dictionary["2ndmom"] = second_moment
output_dictionary["var"] = variance
output_dictionary["sdv"] = standard_deviation
output_dictionary["med"] = median
output_dictionary["mode"] = mode
return output_dictionary
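# For example, dictionary_statistics({1 : 2, 2 : 3, 3 : 5}) describes the data 1,1,2,2,2,3,3,3,3,3:
# mean 2.3, second moment 5.9, variance 0.61, median 2.5, and mode [3].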
# It will be convenient to just input a dictionary of dictionaries for which each key's value is the frequency of that key,
# and output a dictionary with the same keys, but the values replaced by the _relative_ frequency of its key.
def basic_stats(meta_dictionary):
output_dictionary = {}
for k in meta_dictionary.keys():
output_dictionary[k] = dictionary_statistics(meta_dictionary[k])
return output_dictionary
def dists(meta_dictionary):
output_dictionary = {}
for k in meta_dictionary.keys():
output_dictionary[k] = dictionary_statistics(meta_dictionary[k])["dist"]
return output_dictionary
# We may want to display the data in a readable format, etc.
# The next bit of code is convenient.
# We assume the input is a dictionary of dictionaries whose values are all either floats or integers.
def tabulate(meta_dictionary):
type_indicator = 1 # 1 means all values are integers; 0 means some are floats.
for v in meta_dictionary.values():
for w in v.values():
if type(w) == float:
type_indicator *= 0
if type_indicator == 1:
return pd.DataFrame.from_dict(meta_dictionary).fillna(0).apply(np.int64) # Fill empty cells with 0. Not really necessary, but anyway. np.int64 is probably overkill as well.
if type_indicator == 0:
return pd.DataFrame.from_dict(meta_dictionary).fillna(0).apply(np.float64) # Fill empty cells with 0. Not really necessary. np.float64 is probably overkill.
Jump to: Table of Contents | ↑ § 8. Code for analysing Euclid's algorithm | ↓ § 10. Numerical investigation & data visualisation
# Here's where we generate the raw data (one-dimensional case).
# We'll comment this code out once it's done because we don't want to re-do it every time we run our notebook.
#a_list = list(range(1,10001))
#start = timer()
#H1, H = heilbronn([1],a_list)
#end = timer()
#print(end - start)
# Output: 201.3096869001165
201.3096869001165
# Here's where we generate the raw data (two-dimensional case).
# It will take a while if we want to look at pairs up to tens of thousands.
# We'll comment this code out once it's done because we don't want to re-do it every time we run our notebook.
#checkpoints = list(range(1,101001,1000))
#start = timer()
#A, B, C = euclid_alg_frequencies([1], [], checkpoints, {}, {}, {})
#end = timer()
#print(end - start)
# Output: 17615.60527349997
17615.60527349997
# Convert dictionaries to data frames for a nice display of the data (if desired).
# This code block uses pandas to create the data frames (in our "tabulate function"), then export to csv.
# We'll comment this code out once it's done because we don't want to re-generate our raw data every time we run our notebook.
#Adf = tabulate(A)
#Bdf = tabulate(B)
#Cdf = tabulate(C)
#H1df = tabulate(H1)
#Hdf = tabulate(H)
# Save the data as a csv file for later use.
#Adf.to_csv(r'data\gcd_pairs_100001df.csv')
#Bdf.to_csv(r'data\euclid_steps_coprime_100001df.csv')
#Cdf.to_csv(r'data\euclid_steps_100001df.csv')
#H1df.to_csv(r'data\euclid_steps_1d-coprime-10001df.csv')
#Hdf.to_csv(r'data\euclid_steps_1d-all-10001df.csv')
# Get the data back into data frames.
Adf = pd.read_csv(r'data\gcd_pairs_100001df.csv', index_col=0)
Bdf = pd.read_csv(r'data\euclid_steps_coprime_100001df.csv', index_col=0)
Cdf = pd.read_csv(r'data\euclid_steps_100001df.csv', index_col=0)
H1df = pd.read_csv(r'data\euclid_steps_1d-coprime-10001df.csv', index_col=0)
Hdf = pd.read_csv(r'data\euclid_steps_1d-all-10001df.csv', index_col=0)
# Get original dictionaries back from the data frames.
A = Adf.to_dict(orient='dict')
B = Bdf.to_dict(orient='dict')
C = Cdf.to_dict(orient='dict')
H1 = H1df.to_dict(orient='dict')
H = Hdf.to_dict(orient='dict')
# Without any modification, these dictionaries will now have strings for keys.
# That is, the keys will be '1001' and so on, instead of 1001.
# This is because the keys are taken from the column headers of the data frame,
# which were read in from the CSV header and are therefore strings.
# I'll use the hack below to cast the strings back to integers.
fixA = {}
fixB = {}
fixC = {}
fixH1 = {}
fixH = {}
for k in A.keys():
fixA[int(k)] = A[k]
for k in B.keys():
fixB[int(k)] = B[k]
for k in C.keys():
fixC[int(k)] = C[k]
for k in H1.keys():
fixH1[int(k)] = H1[k]
for k in H.keys():
fixH[int(k)] = H[k]
A = fixA
B = fixB
C = fixC
H1 = fixH1
H = fixH
# We also have the following difference between the original dictionaries and the above
# (after exporting data frame, importing data frame, sending to dict, and casting keys):
# we will have items with zero-values.
# We can 'fix' this with the hack below, although it's not a big deal.
fixA = {}
fixB = {}
fixC = {}
fixH1 = {}
fixH = {}
for k in A.keys():
fixA[k] = {}
for s in A[k].keys():
if A[k][s] == 0:
pass
else:
fixA[k][s] = A[k][s]
for k in B.keys():
fixB[k] = {}
for s in B[k].keys():
if B[k][s] == 0:
pass
else:
fixB[k][s] = B[k][s]
for k in C.keys():
fixC[k] = {}
for s in C[k].keys():
if C[k][s] == 0:
pass
else:
fixC[k][s] = C[k][s]
for k in H1.keys():
fixH1[k] = {}
for s in H1[k].keys():
if H1[k][s] == 0:
pass
else:
fixH1[k][s] = H1[k][s]
for k in H.keys():
fixH[k] = {}
for s in H[k].keys():
if H[k][s] == 0:
pass
else:
fixH[k][s] = H[k][s]
A = fixA
B = fixB
C = fixC
H1 = fixH1
H = fixH
# From the raw data, generate relative frequencies etc.
Astats = basic_stats(A)
Adist = dists(A)
Bstats = basic_stats(B)
Bdist = dists(B)
Cstats = basic_stats(C)
Cdist = dists(C)
H1stats = basic_stats(H1)
H1dist = dists(H1)
Hstats = basic_stats(H)
Hdist = dists(H)
# Column N row s of table C equals #{0 < b < a < N : T(a,b) = s}
# Table B is similar but we restrict to coprime (a,b).
#tabulate(C)
# Column N row s of table Cdist equals #{0 < b < a < N : T(a,b) = s}/(N choose 2)
# Table Bdist is similar but we restrict to coprime (a,b).
# tabulate(Cdist)
Jump to: Table of Contents | ↑ § 9. Generating and exporting/importing the raw data | ↓ One-dimensional analysis: distribution
We will now visualise our data in various ways. Before each code block, we will explain the goal and give a sample frame from the animation we intend to produce. Some of the animations may take a few minutes to complete.
Jump to: Table of Contents | ↑ § 10. Numerical investigation & data visualisation | ↓ One-dimensional analysis: mean and error term | ↑↑ Theory
We now generate an animation consisting of a sequence of frames like the one below.
This frame corresponds to
We also see the value of
We also show the variance
where "error" is "small". Such a result, even if we could precisely formulate and prove it, would be somewhat otiose without a good estimate for
# We might decide to change our data before proceeding, but we won't want to overwrite our existing dictionaries.
# We'll use new variables (e.g. J, J1 instead of H, H1).
# But we don't want to have to go through the below code and change H1 to J1 etc.
# Hence we'll just do this, and work from hereon with Zstats etc.
# If we want to work with different data, all we have to do is change the right-hand sides of the next two assignments.
Z = H1
Zstats = H1stats
Zdist = H1dist
fig, ax = plt.subplots()
fig.suptitle('Number of divisions in Euclidean algorithm for gcd(a,b) \nfor totatives b of a')
# We want to animate a sequence of plots, one plot for each a in the list frame_list.
# We might want the frame_list to start at a = MiNa, end at a = MaXa, and go up by increments of skip.
# We could also enter any frame_list we wish, with a's not necessarily going up by regular increments.
MiNa = 100 # Set a-value for first frame.
MaXa = 10000 # Set a-value for last frame.
skip = 100 # Set increment.
# Now we define the frame_list list.
# We just want to make sure we don't ask for a frame corresponding to an a we don't have data on.
frame_list = []
for a in range(MiNa,MaXa+skip,skip):
if a in list(Zdist.keys()):
frame_list.append(a)
# Alternative frame_list: the a-values with the "worst" error term with respect to Porter's theorem.
# frame_list = []
# for a in err_champs.keys():
# if a >= MiNa:
# frame_list.append(a)
# We want a common horizontal axis for each frame in our sequence of plots.
# It needs to cover every possible value for the number of divisions T(a,b),
# for each a in our frame_list.
hor_axis_set = set()
for a in frame_list:
hor_axis_set = hor_axis_set.union(list(Zdist[a].keys()))
hor_axis_list = list(hor_axis_set)
hor_axis_list.sort()
# We can now define lower and upper bounds for the horizontal axis.
# Likewise, we want a common vertical axis range for each plot in our sequence
# (otherwise it will appear to jump around).
x_min, x_max = min(hor_axis_list), max(hor_axis_list)
y_max = max([max(list(Zdist[a].values())) for a in frame_list])
# We can now define a function that gives the plot we want for a given a in our frame_list.
def dist_Z_seq(a):
ax.clear()
# Bounds for the plot, and horizontal axis tick marks.
xleft, xright, ybottom, ytop = x_min - 0.5, x_max + 0.5, 0, np.ceil(10*y_max)/10
ax.set(xlim=(xleft, xright), ylim=(ybottom, ytop))
ax.set_xticks(hor_axis_list)
# A grid is helpful, but we want it underneath everything else.
ax.grid(True,zorder=0,alpha=0.7)
# Plots of the actual data
ax.plot(list(Zdist[a].keys()), Zdist[a].values(), 'bx', zorder=4.5, label=r'$\mathrm{Prob}(Z = s)$')
ax.bar(list(Zdist[a].keys()), Zdist[a].values(), zorder=2.5)
# Plot of normal curve with mean and variance the same as the mean and variance of the data
mu = Zstats[a]['mean']
var = Zstats[a]['var']
sigma = np.sqrt(var)
normal_points = np.linspace(list(Zdist[a].keys())[0], list(Zdist[a].keys())[-1])
#ax.plot(normal_points, stats.norm.pdf(normal_points, mu, sigma), '-', color='#ADD8E6', label=r'$\mathrm{Norm}(\mu,\sigma^2)$', zorder=3.5)
# Plot of normal curve with estimate for mean given by Porter's theorem.
# As we don't have an estimate for the variance, we'll just use the actual variance in the data
mu_p = (lambda_dixon*np.log(a) + porter_constant - 1)
ax.plot(normal_points, stats.norm.pdf(normal_points, mu_p, sigma), ':', color='y', label=r'$\mathrm{Norm}(\mu_*,\sigma^2)$', zorder=3.5)
# Label the axes
ax.set_xlabel(r'$s = \#$steps of the form $(u,v) \mapsto (v,u$ mod $v)$')
ax.set_ylabel('probability')
# Add text on top of the plot
ax.text(0.05,0.93,fr'$a = $ {a},', transform=ax.transAxes)
ax.text(0.23,0.93,fr'$\mu_* = \lambda\log a + C_P - 1 = {mu_p:.3f}$', transform=ax.transAxes)
ax.text(0.455,0.85,r'$\mathbb{E}[Z] = $' + fr'${mu:.3f}$', transform=ax.transAxes)
#ax.text(0.7,0.78,fr'$\mu = $ {mu:.3f}, $\sigma^2 = $ {var:.3f}', transform=ax.transAxes)
ax.text(0.75,0.73,fr'$\mu_* = $ {mu_p:.3f}', transform=ax.transAxes)
ax.text(0.75,0.65,fr'$\sigma^2 = $ {var:.3f}', transform=ax.transAxes) # Show the variance, to match the sigma^2 label.
# The legend...
ax.legend(loc=1, ncol=1, framealpha=0.5)
# Generate the animation
dist_Z_seq_anim = animation.FuncAnimation(fig, dist_Z_seq, frames=frame_list, interval=500, blit=False, repeat=False)
# This is supposed to remedy the blurry axis ticks/labels.
plt.rcParams['savefig.facecolor'] = 'white'
# Just a sample frame. See next code block for animation.
dist_Z_seq(1000)
plt.show()
# Save a video of the animation.
HTML(dist_Z_seq_anim.to_html5_video())
# Alternatively...
# rc('animation', html='html5')
# dist_Z_seq_anim
# Save the animation as an animated GIF
f = r"images\euclid_steps_distribution_1d_10001.gif"
dist_Z_seq_anim.save(f, dpi=100, writer='imagemagick', extra_args=['-loop','1'], fps=2)
# extra_args=['-loop','1'] for no looping, '0' for looping.
f = r"images\euclid_steps_distribution_1d_10001_loop.gif"
dist_Z_seq_anim.save(f, dpi=100, writer='imagemagick', extra_args=['-loop','0'], fps=2)
Jump to: Table of Contents | ↑ One-dimensional analysis: distribution | ↓ One-dimensional analysis: variance | ↑↑ Theory
We now generate an animation consisting of a sequence of frames like the one below.
This frame corresponds to
for all
Thus, if we consider the sum of the sign of the error term, i.e.
we expect this to behave like the distance from the origin in a one-dimensional random walk, after
The frame corresponding to
We also produce a second animation in relation to the error term in Porter's theorem. Recall Conjecture 6.2, effectively asserting that
This is illustrated in a sequence of plots corresponding to
The horizontal axis shows
In the data for
# We analyse the error term in the theorem of Porter.
# For a given a, compute sum [T(a,b) - (lambda*log(a) + C_P - 1)] over totatives b of a.
# This should be <<_{epsilon} phi(a)*a^{epsilon - 1/6} according to the theorem, and
# <<_{epsilon} a^{epsilon + 1/2} conjecturally.
def porter_err(a):
return sum([(s - (lambda_dixon*np.log(a) + porter_constant - 1))*Z[a][s] for s in Z[a].keys()])
# For each key in our dictionary, make an item in a new dictionary whose value is err(a).
# Thus, a typical entry in PORTER_ERR is of the form a : err(a)
PORTER_ERR = {}
for a in Z.keys():
PORTER_ERR[a] = porter_err(a)
# Let's take a look at the sign of the error term err(a).
# If err(a) behaves like a normal random variable centred at 0, it should be positive more or less half of the time.
# Indeed, this seems to be the case for the data we have.
# Note that this will only work/make sense if we have err(a) for every a from 1 to some upper bound,
# i.e. if Z.keys() (and hence PORTER_ERR.keys()) contain all a from a = 1 to a = list(Z.keys())[-1].
POS = [0] # POS[N] will be the number of a in {1,...,N} for which err(a) is positive.
NEG = [0] # NEG[N] will be the number of a in {1,...,N} for which err(a) is negative (or zero... which won't happen).
for a in PORTER_ERR.keys():
if PORTER_ERR[a] > 0:
POS.append(POS[a-1] + 1)
NEG.append(NEG[a-1])
else:
POS.append(POS[a-1])
NEG.append(NEG[a-1] + 1)
prop_pos = [0]
prop_neg = [0]
for a in PORTER_ERR.keys():
prop_pos.append(POS[a]/a)
prop_neg.append(NEG[a]/a)
# Let's sum sgn(err(a)).
# The value of the sum should look like the distance from the origin of a random walk
# that goes left/right with probability 1/2.
sum_sign = [POS[a] - NEG[a] for a in PORTER_ERR.keys()]
fig, ax = plt.subplots()
fig.suptitle(r'Sum of sign of error term in estimate for $\mathbb{E}[Z]$')
def err_sgn_seq(N):
ax.clear()
ax.plot(range(1,N + 1),sum_sign[1:N + 1], label=r'$\sum_{a = 1}^{N}\mathrm{sgn}(\mathbb{E}[Z] - (\lambda\log a + C_P - 1))$')
ax.set_xlabel(fr'$N$')
# Note that err(a) > 0 if and only if err(a)/phi(a) > 0...
#ax.set_ylabel(r'$\sum_{1 \leq a \leq N}\mathrm{sgn}(\mathbb{E}[Z] - (\lambda\log a + C_P - 1))$')
ax.text(0.8, 0.9,fr'$N = {N}$', transform=ax.transAxes)
ax.text(0.8, 0.83,fr'{100*prop_pos[N+1]:.2f}$\% +$', transform=ax.transAxes)
ax.text(0.8, 0.76,fr'{100*prop_neg[N+1]:.2f}$\% -$', transform=ax.transAxes)
ax.legend(loc=3,framealpha=0.5)
err_sgn_seq_anim = animation.FuncAnimation(fig, err_sgn_seq, frames=list(range(1,2001)), interval=25, blit=False, repeat=False)
# This is supposed to remedy the blurry axis ticks/labels.
plt.rcParams['savefig.facecolor'] = 'white'
# Just a sample frame.
err_sgn_seq(1000)
plt.show()
# Save a video of the animation.
HTML(err_sgn_seq_anim.to_html5_video())
# Alternatively...
# rc('animation', html='html5')
# err_sgn_seq_anim
# Let's make a dictionary whose keys are what we will call "error champions".
# We call a an error champion if err(a)/sqrt(a) > err(a')/sqrt(a') for all 1 <= a' < a,
# or if err(a)/sqrt(a) < err(a')/sqrt(a') for all 1 <= a' < a.
# In other words, the maximum (minimum) relative error err(a)/sqrt(a) "jumps" at the error champions.
err_champs = {}
c_pos, c_neg, c = 0, 0, 0
for a in PORTER_ERR.keys():
if PORTER_ERR[a]/np.sqrt(a) > c_pos:
c_pos = PORTER_ERR[a]/np.sqrt(a)
c = max(c_pos,-c_neg)
err_champs[a] = (PORTER_ERR[a], c_pos, c_neg, c)
if PORTER_ERR[a]/np.sqrt(a) < c_neg:
c_neg = PORTER_ERR[a]/np.sqrt(a)
c = max(c_pos,-c_neg)
err_champs[a] = (PORTER_ERR[a], c_pos, c_neg, c)
fig, ax = plt.subplots()
fig.suptitle('Error term in Porter\'s estimate')
# We want to animate a sequence of plots, one plot for each a in the list frame_list.
# We might want the frame_list to start at a = MiNa, end at a = MaXa, and go up by increments of skip.
# We could also enter any frame_list we wish, with a's not necessarily going up by regular increments.
MiNa = 1 # Set a-value for first frame.
MaXa = 2000 # Set a-value for last frame.
skip = 1 # Set increment.
# Now we define the frame_list list.
# We just want to make sure we don't ask for a frame corresponding to an a we don't have data on.
frame_list = []
for a in range(MiNa,MaXa+skip,skip):
if a in list(PORTER_ERR.keys()):
frame_list.append(a)
# We can now define lower and upper bounds for the horizontal axis.
# Likewise, we want a common vertical axis range for each plot in our sequence
# (otherwise it will appear to jump around).
x_min, x_max = min(frame_list), max(frame_list)
y_min, y_max = min([PORTER_ERR[a] for a in frame_list]), max([PORTER_ERR[a] for a in frame_list])
# We can now define a function that gives the plot we want for a given a in our frame_list.
def err_seq(N):
ax.clear()
# Uncomment the next few lines if static axes/bacground are desired.
# # Bounds for the plot, and horizontal axis tick marks.
# xleft, xright, ybottom, ytop = x_min - 0.5, x_max + 0.5, y_min - 10, y_max + 10
# ax.set(xlim=(xleft, xright), ylim=(ybottom, ytop))
# ax.set_xticks(frame_list)
# # A grid is helpful, but we want it underneath everything else.
# ax.grid(True,zorder=0,alpha=0.7)
# The plots
# The error term
ax.plot(list(PORTER_ERR.keys())[1:N+1], list(PORTER_ERR.values())[1:N+1], label=r'$\sum_{b \in \mathbb{Z}_a^{\times}} [T(a,b) - (\lambda \log a + C_P - 1)]$')
# y = +/- sqrt(x)
ax.plot(list(PORTER_ERR.keys())[1:N+1], [a/(np.sqrt(a)) for a in range(1,N+1)], 'r:', label=r'$\pm\sqrt{a}$')
ax.plot(list(PORTER_ERR.keys())[1:N+1], [-a/(np.sqrt(a)) for a in range(1,N+1)], 'r:')
# y = +/- c sqrt(x), where c is such that the error term lies inside the parabola.
# c comes from the "error champion" dictionary, defined in the previous code block.
c = max([err_champs[a][3] for a in err_champs.keys() if a < N + 1])
ax.plot(list(PORTER_ERR.keys())[1:N+1], [c*a/(np.sqrt(a)) for a in range(1,N+1)], 'y:', label=fr'$\pm${c:.3f}' + r'$\sqrt{a}$')
ax.plot(list(PORTER_ERR.keys())[1:N+1], [-c*a/(np.sqrt(a)) for a in range(1,N+1)], 'y:')
# Labels, text, and legend
ax.set_xlabel(fr'$a$')
ax.set_ylabel('error')
ax.text(0.05, 0.93,fr'$a \leq $ {N}', transform=ax.transAxes)
# We'll label the "error champion" points. Leave the label for 50 frames.
for a in err_champs.keys():
if N - 50 < a < N + 1:
ax.annotate(f'a = {a}', (a,PORTER_ERR[a]), textcoords="offset points", xytext=(0,0), ha='center')
else:
pass
ax.legend(loc=3, framealpha=0.5)
err_seq_anim = animation.FuncAnimation(fig, err_seq, frames=frame_list, interval=50, blit=False, repeat=False)
# This is supposed to remedy the blurry axis ticks/labels.
plt.rcParams['savefig.facecolor'] = 'white'
# Just a sample frame.
err_seq(650)
plt.show()
# Save a video of the animation.
HTML(err_seq_anim.to_html5_video())
#Alternatively...
# rc('animation', html='html5')
# err_seq_anim
Jump to: Table of Contents | ↑ One-dimensional analysis: mean and error term | ↓ Two-dimensional analysis: error terms & subdominant constant in the variance | ↑↑ Theory
We now generate an animation consisting of a sequence of frames like the one below.
This frame corresponds to
Although Porter's theorem (Theorem 6.1) gives a very precise estimate for
We consider the second moment
of
In the plot above, the horizontal axis shows
The curve certainly seems to have the right rate of growth, but
As we noted,
In conclusion, we suspect that
# Let's investigate the variance of Z as a grows. Since we have good estimates for the mean of Z,
# estimating the variance is tantamount to estimating the second moment, so let's start with that.
Z = H1
Zstats = H1stats
Zdist = H1dist
# Let us assume that the second moment of Z is well-approximated by a function of the form P(log a),
# where P is a degree-two polynomial. We'll curve-fit E[Z^2] to such a function.
def sec_mom_shape(x, c2, c1, c0):
return c2*(np.log(x))**2 + c1*np.log(x) + c0
# Now for the animated sequence of plots.
# The animation will arise from a sequence of plots corresponding to N in a "frame_list".
# We might want the frame_list to start at N = MiNN, end at N = MaXN, and go up by increments of skip.
# We could also enter any frame_list we wish, with N's not necessarily going up by regular increments.
MiNN = 100 # Set N-value for first frame.
MaXN = 10000 # Set N-value for last frame.
skip = 100 # Set increment.
frame_list = range(MiNN,MaXN + skip, skip)
fig, ax = plt.subplots()
fig.suptitle(r'Second moment of $Z$')
def sec_mom_seq(N):
ax.clear()
# A grid is helpful, but we want it to lie underneath everything else.
ax.grid(True,zorder=0,alpha=0.7)
# Plot of the actual second moment of Z.
# Range for a.
LBa, UBa = 1, N
# We'll use this for the horizontal axis in both plots below...
hor_axis = [a for a in Zstats.keys() if LBa <= a <= UBa]
# ...and this for the data...
y_data = [Zstats[a]['2ndmom'] for a in hor_axis]
# Plot of E[Z^2]
ax.plot(hor_axis, y_data, 'bx', label=r'$\frac{1}{\phi(a)}\sum_{b \in \mathbb{Z}_a^{\times}} T(a,b)^2$, ' + fr'${LBa} \leq a \leq {UBa}$')
# Now curve-fit the second moment with our sec_mom_shape model, and plot the curve of best fit.
popt, pcov = curve_fit(sec_mom_shape, hor_axis, y_data)
(c2, c1, c0) = tuple(popt)
ax.plot(hor_axis, sec_mom_shape(hor_axis, *popt), 'y:', label = fr'$c_2(\log a)^2 + c_1\log a + c_0$' + '\n' + fr'$(c_2,c_1,c_0) = ({c2:.4f}, {c1:.4f}, {c0:.4f})$')
# Add labels, text, legend, etc.
ax.set_xlabel(r'$a$')
ax.set_ylabel(r'$\mathbb{E}[Z^2]$')
ax.legend(loc=4,framealpha=0.5)
sec_mom_seq_anim = animation.FuncAnimation(fig, sec_mom_seq, frames=frame_list, interval=500, blit=False, repeat=False)
# This is supposed to remedy the blurry axis ticks/labels.
plt.rcParams['savefig.facecolor'] = 'white'
# Just a sample frame.
sec_mom_seq(1000)
plt.show()
# Save a video of the animation.
HTML(sec_mom_seq_anim.to_html5_video())
#Alternatively...
# rc('animation', html='html5')
# sec_mom_seq_anim
# Define a function that we think approximates Var(Z) fairly well.
# We determined guess_constant via a line of best fit not shown in this notebook.
Z = H1
Zstats = H1stats
Zdist = H1dist
guess_constant = -0.354293955
def guess_var(a):
return eta_hensley*np.log(a) + guess_constant
# Define a function that we think grows at the same rate as the error term
# Var(Z) - guess_var(a)
def err_var_shape(a):
if a == 1:
return 1
return (np.log(a))**2/np.sqrt(a)
# Create a dictionary whose keys are those a for which we have data (i.e. the keys of Zstats),
# and for which the value corresponding to key a is Var(Z) - guess_var(a), i.e. the error in
# our guess.
VAR_GUESS_ERR = {}
for a in Zstats.keys():
VAR_GUESS_ERR[a] = Zstats[a]['var'] - guess_var(a)
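# The "line of best fit not shown in this notebook" behind guess_constant could, for instance,
# be recovered along the following lines (just a sketch, using Zstats and eta_hensley as
# defined above): fix the slope at eta_hensley and least-squares-fit only the intercept c0
# in the model Var(Z) ~ eta_hensley*log(a) + c0. The fitted value need not agree with
# guess_constant to all decimal places; it depends on the range of a used in the fit.
def var_model_fixed_slope(a, c0):
    return eta_hensley*np.log(a) + c0

fit_a = [a for a in Zstats.keys() if a > 1]
fit_var = [Zstats[a]['var'] for a in fit_a]
popt_c0, pcov_c0 = curve_fit(var_model_fixed_slope, fit_a, fit_var)
print(f'fitted intercept: {popt_c0[0]:.6f} (cf. guess_constant = {guess_constant})')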
# We now plot the error term in the prediction that
# Var(Z) = eta*log(a) + guess_constant + O_{epsilon}(a^{epsilon - 1/2}).
# The animation will arise from a sequence of plots corresponding to N in a "frame_list".
# We might want the frame_list to start at N = MiNN, end at N = MaXN, and go up by increments of skip.
# We could also enter any frame_list we wish, with N's not necessarily going up by regular increments.
MiNN = 200 # Set N-value for first frame.
MaXN = 10000 # Set N-value for last frame.
skip = 100 # Set increment.
frame_list = range(MiNN,MaXN + skip, skip)
fig, ax = plt.subplots()
fig.suptitle(r'Error in estimate for variance of $Z$')
def var_err(N):
ax.clear()
# A grid is helpful, but we want it to lie underneath everything else.
ax.grid(True,zorder=0,alpha=0.7)
# Range for a.
LBa, UBa = 100, N
# We'll use this for the horizontal axis in the plots below...
hor_axis = [a for a in VAR_GUESS_ERR.keys() if LBa <= a <= UBa]
# ...and this for the data...
y_data = [VAR_GUESS_ERR[a] for a in hor_axis]
# Plot of Var(Z) - (eta*log(a) + guess_constant)
ax.plot(hor_axis, y_data, 'b-', label=r'$\mathrm{Var}(Z) - (\eta\log a -$' + f'{-guess_constant:.3f}' + r'$)$, ' + fr'${LBa} \leq a \leq {UBa}$')
# Plot bounding curves
cp = max([VAR_GUESS_ERR[a]/err_var_shape(a) for a in hor_axis])
cn = min([VAR_GUESS_ERR[a]/err_var_shape(a) for a in hor_axis])
ax.plot(hor_axis, [cp*err_var_shape(a) for a in hor_axis], 'y-', zorder=4.5, label=r'$c_+(\log a)^2/\sqrt{a}$, ' + fr'$c_+ = {cp:.3f}$')
ax.plot(hor_axis, [cn*err_var_shape(a) for a in hor_axis], 'g-', zorder=4.5, label=r'$c_-(\log a)^2/\sqrt{a}$, ' + fr'$c_- = -{-cn:.3f}$')
# Add labels, text, legend, etc.
ax.set_xlabel(r'$a$')
ax.set_ylabel(r'error')
ax.legend(loc=0,framealpha=0.8, ncol=1)
var_err_anim = animation.FuncAnimation(fig, var_err, frames=frame_list, interval=500, blit=False, repeat=False)
# This is supposed to remedy the blurry axis ticks/labels.
plt.rcParams['savefig.facecolor'] = 'white'
# Just a sample frame.
var_err(1000)
plt.show()
# Save a video of the animation.
HTML(var_err_anim.to_html5_video())
#Alternatively...
# rc('animation', html='html5')
# var_err_anim
Jump to: Table of Contents | ↑ One-dimensional analysis: variance | ↓ Two-dimensional analysis: distribution | ↑↑ Theory: mean, variance
We now generate an animation from a sequence of frames, one of which is shown below. Each frame contains four plots: the two plots on the left correspond to the random variable $X$ (all pairs $(a,b)$), and the two plots on the right to $X_1$ (coprime pairs only).
Norton's theorem (Theorem 7.1) tells us that
The bottom two plots are analogous, but for the variances of $X$ and $X_1$.
Thus, in the bottom two plots, we curve-fit the set of points corresponding to $\mathrm{Var}(X) - \eta\log N$ and $\mathrm{Var}(X_1) - \eta\log N$, using the model summarised below.
In our animation, we start off with five points in each of the four plots, corresponding to the five smallest values of $N$ for which we have data; as the animation proceeds, points are added on the right, and in the final frames we zoom in on the right-hand end of the data.
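In the notation of the plot labels below, the top two plots show $\mathbb{E}[X] - \lambda\log N$ and $\mathbb{E}[X_1] - \lambda\log N$, fitted to curves of the form
$$\frac{C}{\sqrt{N}} + B,$$
while the bottom two show $\mathrm{Var}(X) - \eta\log N$ and $\mathrm{Var}(X_1) - \eta\log N$, fitted to curves of the same shape with constants $D$ and $D_1$ in place of $B$ and $B_1$. The theory predicts the limiting values $B = \nu - \tfrac{1}{2}$ and $B_1 = \nu_1 - \tfrac{1}{2}$ (nu_norton and nu_norton_coprime in the code), whereas the limiting values of $D$ and $D_1$ (the subdominant constants in the variance) are open questions: the fits below are our attempt to estimate them numerically.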
# Let's analyse our data to try to come up with a prediction for the
# subdominant constant in the asymptotic variance of our random variables.
# We might decide to change our data before proceeding, but we won't want to overwrite our existing dictionaries.
# We'll use new variables (e.g. D, E, F instead of A, B, C).
# But we don't want to have to go through the below code and change A, B, C to D, E, F etc.
# Hence we'll just do this, and work from hereon with X1stats etc.
# If we want to work with different data, all we have to do is change the right-hand sides of the next four assignments.
X1stats = Bstats
X1dist = Bdist
Xstats = Cstats
Xdist = Cdist
# We'll animate four plot sequences, combining four plots in each frame:
# one for the mean of X, one for the variance of X, one for the mean of X_1,
# and one for the variance of X_1. The plots show the respective error terms in the
# estimates for the means and variances (from the theorems of Norton and
# Hensley/Baladi-Vallee), except that we ignore the subdominant constants.
# Thus, one plot will be of E[X] - lambda*log N, another of Var(X) - eta*log N,
# and ditto for X_1.
# The animation will arise from a sequence of plots corresponding to N in a "frame_list".
# We might want the frame_list to start at N = MiNN, end at N = MaXN, and go up by increments of skip.
# We could also enter any frame_list we wish, with N's not necessarily going up by regular increments.
MiNN = 5001 # Set N-value for first frame.
MaXN = 100001 # Set N-value for last frame.
skip = 1000 # Set increment.
# Now we define the frame_list list.
# We just want to make sure we don't ask for a frame corresponding to an a we don't have data on.
frame_list = []
for N in range(MiNN,MaXN+skip,skip):
if N in list(Xdist.keys()): # We're assuming the keys of Xdist are the same as those of X1dist...
frame_list.append(N)
for N in range(MiNN,MaXN+skip,skip): # The reason for this will become apparent.
if N in list(Xdist.keys()):
frame_list.append(N - MaXN - 1)
# Let's pull the means and variances from our "stats" dictionaries into new dictionaries with the same keys.
# Let's make simple lists with all of the data we need at the outset, so we don't have to
# repeat computations every time we call our function that defines a plot in our animation sequence.
ERR_MEAN = {} # Dictionary for E[X] - lambda*log N
ERR_MEAN1 = {} # Dictionary for E[X_1] - lambda*log N
ERR_VAR = {} # Dictionary for Var(X) - eta*log N
ERR_VAR1 = {} # Dictionary for Var(X_1) - eta*log N
MEAN_DELTA = {} # Dictionary for E[X_1] - E[X] (which should be close to nu_1 - nu = 0.4803...)
VAR_DELTA = {} # Dictionary for Var(X_1) - Var(X) (which should be close to kappa_1 - kappa = 0.3340...)
for N in Xdist.keys():
ERR_MEAN[N] = Xstats[N]["mean"] - lambda_dixon*np.log(N - 1)
ERR_MEAN1[N] = X1stats[N]["mean"] - lambda_dixon*np.log(N - 1)
ERR_VAR[N] = Xstats[N]["var"] - eta_hensley*np.log(N - 1)
ERR_VAR1[N] = X1stats[N]["var"] - eta_hensley*np.log(N - 1)
MEAN_DELTA[N] = X1stats[N]["mean"] - Xstats[N]["mean"]
VAR_DELTA[N] = X1stats[N]["var"] - Xstats[N]["var"]
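# A quick sanity check (just a sketch, using names already defined above): for the largest
# N in our data, compare E[X_1] - E[X] with the predicted limiting value
# nu_norton_coprime - nu_norton from the Constants section.
N_max = max(MEAN_DELTA.keys())
print(f'N = {N_max}: E[X_1] - E[X] = {MEAN_DELTA[N_max]:.4f} '
      f'(limiting value: {nu_norton_coprime - nu_norton:.4f})')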
# We're going to curve-fit the errors (over pairs with 1 <= b < a <= N) to a function of the
# form a/sqrt(N) + b. We know there is a subdominant constant b, and we believe, based on our
# heuristic and on the empirical data from the one-dimensional analysis, that we have
# square-root cancellation. Let's define a function for this model here.
def err_shape(x, a, b):
return a/(np.sqrt(x)) + b
# Now set up the figure: four plots in a 2x2 grid.
fig, ((ax, ax1), (axv, axv1)) = plt.subplots(2, 2, figsize=(15,10), sharey=False)
plt.subplots_adjust(wspace=0.25, hspace=0.25)
fig.suptitle('Error term in estimate for mean and variance')
# Now let us define the four plots we want to see given an N in frame_list.
def err_term_sequence(N):
ax.clear() # For E[X] - mu*log N
ax1.clear() # For E[X_1] - mu*log N
axv.clear() # For Var(X) - eta*log N
axv1.clear() # For Var(X_1) - eta*log N
# A grid is helpful, but we want it to lie underneath everything else.
ax.grid(True,zorder=0,alpha=0.7)
ax1.grid(True,zorder=0,alpha=0.7)
axv.grid(True,zorder=0,alpha=0.7)
axv1.grid(True,zorder=0,alpha=0.7)
# Plots of the actual error terms
# Range for n. First we plot points from 1000 [or...] to N as N increases from MiNN = 5001 [or...] to MaXN = 100001 [or...].
# Then, we plot points from 96001 + N [or...] to 100001 [or...] as N increases from -95001 to -1 [or...].
if N > 0:
LBn, UBn = MiNN - 4*skip - 1, N
if N < 0:
LBn, UBn = 100001 + N - 4*skip, 100001
# We'll use this for the horizontal axis in the next eight plots.
HOR_AXIS = [n for n in ERR_MEAN.keys() if n in range(LBn,UBn + 1)]
ax.plot(HOR_AXIS, [ERR_MEAN[n] for n in HOR_AXIS], 'b.', label=r'$\mathbb{E}[X] - \lambda\log N$')
ax1.plot(HOR_AXIS, [ERR_MEAN1[n] for n in HOR_AXIS], 'r.', label=r'$\mathbb{E}[X_1] - \lambda\log N$')
axv.plot(HOR_AXIS, [ERR_VAR[n] for n in HOR_AXIS], 'g.', label=r'$\mathrm{Var}[X] - \eta\log N$')
axv1.plot(HOR_AXIS, [ERR_VAR1[n] for n in HOR_AXIS], 'y.', label=r'$\mathrm{Var}[X_1] - \eta\log N$')
# Now curve-fit each error with our err_shape model.
popt, pcov = curve_fit(err_shape, HOR_AXIS, [ERR_MEAN[n] for n in HOR_AXIS])
(a, b) = tuple(popt)
popt1, pcov1 = curve_fit(err_shape, HOR_AXIS, [ERR_MEAN1[n] for n in HOR_AXIS])
(a1, b1) = tuple(popt1)
poptv, pcovv = curve_fit(err_shape, HOR_AXIS, [ERR_VAR[n] for n in HOR_AXIS])
(av, bv) = tuple(poptv)
poptv1, pcovv1 = curve_fit(err_shape, HOR_AXIS, [ERR_VAR1[n] for n in HOR_AXIS])
(av1, bv1) = tuple(poptv1)
# Plot the curves of best fit.
ax.plot(HOR_AXIS, err_shape(HOR_AXIS, *popt), '-', color='#34d5eb', label=r'$C_1/\sqrt{N} + B, $' + f' $C_1=${a:.4f}..., $B=${b:.4f}...')
ax1.plot(HOR_AXIS, err_shape(HOR_AXIS, *popt1), '-', color='#eb7734', label=r'$C_2/\sqrt{N} + B_1, $' + f' $ C_2=${a1:.4f}..., $B_1=${b1:.4f}...')
axv.plot(HOR_AXIS, err_shape(HOR_AXIS, *poptv), '-', color='#A4FD83', label=r'$C_3/\sqrt{N} + D, $' + f' $C_3=${av:.4f}..., $D=${bv:.4f}...')
axv1.plot(HOR_AXIS, err_shape(HOR_AXIS, *poptv1), '-', color='#FFF769', label=r'$C_4/\sqrt{N} + D_1, $' + f' $C_4=${av1:.4f}..., $D_1=${bv1:.4f}...')
# Add text to the plots.
ax.text(0.5, 0.73,fr'Limiting $B$-value: {nu_norton - 0.5:.4f}...', transform=ax.transAxes)
ax.text(0.5, 0.68,fr'$B_1 - B$ = {b1 - b:.4f}...', transform=ax.transAxes)
ax.text(0.5, 0.63,fr'Limiting $(B_1 - B)$-value: {nu_norton_coprime - nu_norton:.4f}...', transform=ax.transAxes)
ax1.text(0.5, 0.73,fr'Limiting $B_1$-value: {nu_norton_coprime - 0.5:.4f}...', transform=ax1.transAxes)
ax1.text(0.5, 0.68,fr'$B_1 - B$ = {b1 - b:.4f}...', transform=ax1.transAxes)
ax1.text(0.5, 0.63,fr'Limiting $(B_1 - B)$-value: {nu_norton_coprime - nu_norton:.4f}...', transform=ax1.transAxes)
axv.text(0.5, 0.35,f'Limiting $D$-value: open question!', transform=axv.transAxes)
axv.text(0.5, 0.3,fr'$D - D_1$ = {bv - bv1:.4f}...', transform=axv.transAxes)
axv.text(0.5, 0.25,fr'Limiting $(D - D_1)$-value: {delta_kappa:.3f}...', transform=axv.transAxes)
axv1.text(0.5, 0.35,f'Limiting $D_1$-value: open question!', transform=axv1.transAxes)
axv1.text(0.5, 0.3,fr'$D - D_1$ = {bv - bv1:.4f}...', transform=axv1.transAxes)
axv1.text(0.5, 0.25,fr'Limiting $(D - D_1)$-value: {delta_kappa:.3f}...', transform=axv1.transAxes)
# Axis labels, legend, etc.
ax.set_xlabel(fr'$N = {LBn}, {LBn + skip},\ldots,{UBn-1}$')
ax.set_ylabel('error')
ax1.set_xlabel(fr'$N = {LBn}, {LBn + skip},\ldots,{UBn-1}$')
ax1.set_ylabel('error')
axv.set_xlabel(fr'$N = {LBn}, {LBn + skip},\ldots,{UBn-1}$')
axv.set_ylabel('error')
axv1.set_xlabel(fr'$N = {LBn}, {LBn + skip},\ldots,{UBn-1}$')
axv1.set_ylabel('error')
ax.legend(loc=0, ncol=1, framealpha=0.5)
ax1.legend(loc=0, ncol=1, framealpha=0.5)
axv.legend(loc=0, ncol=1, framealpha=0.5)
axv1.legend(loc=0, ncol=1, framealpha=0.5)
# Generate the animation
err_term_sequence_anim = animation.FuncAnimation(fig, err_term_sequence, frames=frame_list, interval=500, blit=False, repeat=False)
# This is supposed to remedy the blurry axis ticks/labels.
plt.rcParams['savefig.facecolor'] = 'white'
# Just a sample frame.
err_term_sequence(100001)
plt.show()
# Save a video of the animation.
HTML(err_term_sequence_anim.to_html5_video())
# Alternatively...
# rc('animation', html='html5')
# err_term_sequence_anim
Jump to: Table of Contents | ↑ Two-dimensional analysis: error terms & subdominant constant in the variance | ↓ References | ↑↑ Theory
Finally, as a culmination of the above theorems and numerical data, we generate an animation showing the distributions of our random variables $X$ (all pairs) and $X_1$ (coprime pairs).
The horizontal axis shows the possible values of $T(a,b)$, the number of division steps performed by Euclid's algorithm, and the vertical axis shows the corresponding probabilities.
Note that in view of (7.7),
The plot also displays the means and variances of the data, alongside the estimates for them given by the theory.
The dotted blue curve is the PDF of the normal distribution whose mean and variance are the empirical mean and variance of the data over all pairs; the dotted red curve is the analogue for coprime pairs.
We also display the mean, median, and mode(s) of the data, both for all pairs and for coprime pairs.
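In the notation of the code below, the two solid theoretical curves are the normal PDFs $\mathrm{Norm}(\mu_{*}, \sigma_{*}^{2})$ and $\mathrm{Norm}(\mu_{1*}, \sigma_{1*}^{2})$, where
$$\mu_{*} = \lambda\log(N-1) + \nu - \tfrac{1}{2}, \qquad \sigma_{*}^{2} = \eta\log(N-1) + \kappa,$$
and analogously for $\mu_{1*}$ and $\sigma_{1*}^{2}$ with $\nu_1$ and $\kappa_1$; here $\kappa$ and $\kappa_1$ are our guesstimates for the subdominant constants in the variance (kappa_var and kappa_var_coprime in the code).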
# We might decide to change our data before proceeding, but we won't want to overwrite our existing dictionaries.
# We'll use new variables (e.g. D, E, F instead of A, B, C).
# But we don't want to have to go through the below code and change A, B, C to D, E, F etc.
# Hence we'll just do this, and work from hereon with X1stats etc.
# If we want to work with different data, all we have to do is change the right-hand sides of the next four assignments.
X1stats = Bstats
X1dist = Bdist
Xstats = Cstats
Xdist = Cdist
fig, ax = plt.subplots()
fig.suptitle('Distribution of number of divisions in Euclidean algorithm \nfor gcd(a,b) with 0 < b < a < N')
# We want to animate a sequence of plots, one plot for each N in the list frame_list.
# We might want the frame_list to start at N = MiNN, end at N = MaXN, and go up by increments of skip.
# We could also enter any frame_list we wish, with N's not necessarily going up by regular increments.
# To include all N in our frame_list, we could just set frame_list = list(X1dist.keys())
MiNN = 1001 # Set N-value for first frame.
MaXN = 100001 # Set N-value for last frame.
skip = 1000 # Set increment.
# Now we define the frame_list list.
# We just want to make sure we don't ask for a frame corresponding to an N we don't have data on.
frame_list = []
for N in range(MiNN,MaXN+skip,skip):
if N in list(X1dist.keys()): # We'll be plotting data from Xdist, Xstats, X1dist, X1stats dictionaries: we're assuming they all have the same keys.
frame_list.append(N)
# We want a common horizontal axis for each frame in our sequence of plots.
# It needs to cover every possible value for the number of divisions T(a,b),
# for each (a,b) for each N in our frame_list.
# In this case, this will just end up being list(X1dist[N].keys()) for the
# largest N in our frame_list, but the below code is a bit more flexible.
hor_axis_set = set()
for N in frame_list:
hor_axis_set = hor_axis_set.union(list(X1dist[N].keys()))
hor_axis_list = list(hor_axis_set)
hor_axis_list.sort()
# We can now define lower and upper bounds for the horizontal axis.
# Likewise, we want a common vertical axis range for each plot in our sequence
# (otherwise it will appear to jump around).
x_min, x_max = min(hor_axis_list), max(hor_axis_list)
y_min, y_max = 0, max([max(list(X1dist[N].values())) for N in frame_list])
# We can now define a function that gives the plot we want for a given N in our frame_list.
def two_dim_dist_seq(N):
ax.clear()
# Bounds for the plot, and horizontal/vertical axis tick marks.
xleft, xright, ybottom, ytop = x_min - 0.5, x_max + 0.5, y_min - 0.01, np.ceil(10*y_max)/10
ax.set(xlim=(xleft, xright), ylim=(ybottom, ytop))
ax.set_xticks(hor_axis_list)
gridnum = 10
ax.set_yticks(np.linspace(0, ytop, num=gridnum+1))
# A grid is helpful, but we want it underneath everything else.
ax.grid(True,zorder=0,alpha=0.7)
# Plots of the actual data
ax.plot(list(Xdist[N].keys()),list(Xdist[N].values()),'b.',zorder=4.5, label='all (a,b)')
ax.plot(list(X1dist[N].keys()),list(X1dist[N].values()),'.', color = 'r', zorder=4.5, label='coprime (a,b)')
# Plots of normal curves with mean and variance the same as the mean and variance of the data
mu = Xstats[N]['mean']
var = Xstats[N]['var']
sigma = np.sqrt(var)
mu1 = X1stats[N]['mean']
var1 = X1stats[N]['var']
sigma1 = np.sqrt(var1)
normal_points = np.linspace(list(Xdist[N].keys())[0], list(Xdist[N].keys())[-1])
ax.plot(normal_points, stats.norm.pdf(normal_points, mu, sigma), ':', color='b', alpha=0.9,zorder=3.5, label=r'$\mathrm{Norm}(\mu,\sigma^2)$')
ax.plot(normal_points, stats.norm.pdf(normal_points, mu1, sigma1), ':', color='r', alpha=0.9, zorder=3.5, label=r'$\mathrm{Norm}(\mu_1,\sigma_1^2)$')
# Plots of normal curves with mean given by the estimate in Norton's theorem, and
# variance given by the estimate in Hensley's theorem (see Baladi/Vallee theorem as well),
# but the subdominant constants in the variance given by our "guesstimate".
mu_p = lambda_dixon*np.log(N - 1) + nu_norton - 0.5
var_p = eta_hensley*np.log(N - 1) + kappa_var
sigma_p = np.sqrt(var_p)
mu1_p = lambda_dixon*np.log(N - 1) + nu_norton_coprime - 0.5
var1_p = eta_hensley*np.log(N - 1) + kappa_var_coprime
sigma1_p = np.sqrt(var1_p)
ax.plot(normal_points, stats.norm.pdf(normal_points, mu_p, sigma_p), '-',color='#34d5eb', alpha=0.5,zorder=2.5, label=r'$\mathrm{Norm}(\mu_{*},\sigma_{*}^2)$')
ax.plot(normal_points, stats.norm.pdf(normal_points, mu1_p, sigma1_p), '-',color='#eb7734', alpha=0.5,zorder=2.5, label=r'$\mathrm{Norm}(\mu_{1*},\sigma_{1*}^2)$')
# Label the axes
ax.set_xlabel(r'$\#$steps of the form $(u,v) \mapsto (v,u$ mod $v)$')
ax.set_ylabel('probability')
# Add text and legend on top of the plot
# We'll include the medians and modes...
median = Xstats[N]['med']
median1 = X1stats[N]['med']
mode = Xstats[N]['mode']
mode1 = X1stats[N]['mode']
# A bit of trial and error (view the plots) to get the positioning.
# Show N
ax.text(xright-9, ytop*((gridnum - 3)/gridnum), fr'$N = {N}$')
# Show the mean and variance of the data (all (a,b)), then the estimates given by the theory.
ax.text(xright-9, ytop*((gridnum - 3.75)/gridnum), fr'$\mu = {mu:.3f}, \sigma^2 = {var:.3f}$')
ax.text(xright-9, ytop*((gridnum - 4.5)/gridnum), fr'$\mu_* = {mu_p:.3f}, \sigma_*^2 = {var_p:.3f}$')
# Show the mean and variance of the data (coprime (a,b)), then the estimates given by the theory.
ax.text(xright-9, ytop*((gridnum - 5.25)/gridnum), fr'$\mu_1 = {mu1:.3f}, \sigma^2_1 = {var1:.3f}$')
subscript = '_{1*}'
ax.text(xright-9, ytop*((gridnum - 6)/gridnum), fr'$\mu{subscript} = {mu1_p:.3f}, \sigma{subscript}^2 = {var1_p:.3f}$')
# Show the median and mode(s) (coprime (a,b))
ax.text(xright-6, ytop*((gridnum - 6.75)/gridnum), fr'median$_1$: {median1}')
ax.text(xright-6, ytop*((gridnum - 7.5)/gridnum), fr'mode(s)$_1$: {mode1}')
# Show the median and mode(s) (all (a,b))
ax.text(xright-6, ytop*((gridnum - 8.25)/gridnum), f'median: {median}')
ax.text(xright-6, ytop*((gridnum - 9)/gridnum), f'mode(s): {mode}')
# Legend
ax.legend(loc=2, ncol=3, framealpha=0.5, mode="expand")
# Generate the animation
two_dim_dist_anim = animation.FuncAnimation(fig, two_dim_dist_seq, frames=frame_list, interval=500, blit=False, repeat=False)
# This is supposed to remedy the blurry axis ticks/labels.
plt.rcParams['savefig.facecolor'] = 'white'
# Just a sample frame
two_dim_dist_seq(1001)
plt.show()
# Save a video of the animation.
HTML(two_dim_dist_anim.to_html5_video())
# Alternatively...
# rc('animation', html='html5')
# two_dim_dist_anim
# Save the animation as an animated GIF
f = r"images\euclid_steps_distribution.gif"
two_dim_dist_anim.save(f, dpi=100, writer='imagemagick', extra_args=['-loop','1'], fps=5)
# extra_args=['-loop','1'] for no looping, '0' for looping.
f = r"images\euclid_steps_distribution_loop.gif"
two_dim_dist_anim.save(f, dpi=100, writer='imagemagick', extra_args=['-loop','0'], fps=5)
Jump to: Table of Contents | ↑ Two-dimensional analysis: distribution
[1] V. BALADI & B. VALLÉE. "Euclidean algorithms are Gaussian." J. Number Theory 110(2):331–386, 2005.
[2] J. D. DIXON. "The number of steps in the Euclidean algorithm." J. Number Theory 2(4):414–422, 1970.
[3] H. HEILBRONN. "On the average length of a class of finite continued fractions." In Abhandlungen aus Zahlentheorie und Analysis, VEB Deutscher Verlag, Berlin 1968.
[4] D. HENSLEY. "The number of steps in the Euclidean algorithm." J. Number Theory 49(2):142–182, 1994.
[5] D. E. KNUTH. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. 3rd Edn. Addison-Wesley, Reading, Mass., 1997.
[6] L. LHOTE. "Efficient computation of a class of continued fraction constants." (Summary by J. CLEMENT.) pp. 71–74 in Algorithms Seminar 2002–2004 (Ed. F. CHYZAK), 2005.
[7] G. H. NORTON. "On the asymptotic analysis of the Euclidean algorithm." Journal of Symbolic Computation 10(1):53–58, 1990.
[8] J. W. PORTER. "On a theorem of Heilbronn." Mathematika 22(1):20–28, 1975.
[9] J. SHALLIT. "Origins of the analysis of the Euclidean algorithm." Historia Mathematica 21(4):401–419, 1994.