Bucket Sorting in O(n) Expected Time
Godfried Toussaint
School of Computer Science
McGill University
Montreal, Quebec, Canada
1. Introduction
Given n numbers X1, X2,..., Xn drawn at random independently from the uniform distribution
in [0,1], it is desired to sort them in O(n) expected time.
Our model of computation allows the floor function to be performed in constant time. The
following algorithm does the job.
2. Algorithm BUCKET-SORT
Begin
Step 1: Find Xmin and Xmax, the points with minimum and maximum value.
Step 2: Divide the interval [Xmin, Xmax] into n-2 “buckets” or intervals of equal length.
Step 3: “Throw” the remaining n-2 points into their respective buckets using the floor func-
tion.
Step 4: For each bucket that contains more than one point sort them with any method that runs
in at most quadratic worst-case time.
Step 5: Scan through the buckets and concatenate the sorted lists in each bucket.
End
3. Analysis
Once Xmin and Xmax are found the algorithm processes the remaining n-2 points which are
themselves uniformly distributed in [Xmin, Xmax]. Since we have n-2 buckets it follows that the
probability that a remaining point falls in the i-th bucket is pi = 1/(n-2). In other words, the number
of points that falls in bucket i is a binomial random variable, denoted by Ni, with parameters (n-2)
and pi, i = 1, 2,..., n-2. If we sort each Ni using a quadratic time algorithm the total time taken by
BUCKET-SORT is given by
-1-
T ( n ) = k 1 N 12 + k 1 N 22 + … + k n – 2 N n2 – 2
n–2
= c ∑ N i2 (1)
i=1
where c is a positive constant.
To find the expected time we need to take the expected value, denoted by E{.}, of (1).
n–2
E { T ( n ) } = c ∑ E { N i2 } (2)
i=1
Thus we need to know the expected value of the square of a random variable. Now, for any
random variable X we have
E { X 2 } = µ 2 + Var ( X ) (3)
This is easy to see from the definition of the variance since
Var ( X ) = E { ( X – µ ) 2 }
= E { X 2 – 2µX + µ 2 }
= E { X 2 } – 2µE { X } + µ 2
= E { X 2 } – µ2
Furthermore, for a binomial random variable Ni with parameters (n-2) and pi we have that:
µ = ( n – 2 ) pi (4)
and Var ( X ) = ( n – 2 ) p i ( 1 – p i ) (5)
-2-
1
Substituting (4) and (5) into (3) and using p i = ------------ yields
n–2
1
E { N i2 } = 2 – ------------ (6)
n–2
Substituting (6) into (2) we have
n–2
1
E { T ( n ) } = c ∑ 2 – ------------
n – 2
i=1
= 2cn – 5c
= O(n) – O(1)
= O(n)
Therefore, for points uniformly distributed in the unit interval, algorithm BUCKET-SORT
runs in linear expected time.
-3-