jast@www Version 4.2a (archive)

HeapSort is one of the most well-known higher-level sorting algorithms and it uses a heap for sorting in a very clever way, at least that's my opinion. When I examined the most important sorting algorithms, I immediately fell in love with HeapSort (and people say uni is useless...). I know, QuickSort has shown to be more efficient in most cases, but I still like HeapSort, and that's why I decided to bring you more information about it and a do-it-yourself guide. Sit down, grab a bag of crisps (if you insist on something else, so be it) and a drink, and build your own HeapSort code.

Like the name says, HeapSort uses a heap for sorting. A heap is a very special tree, satisfying the following requirements:

- It's a binary tree (i.e. every node has at most two children)
- The tree is complete up to at least the next-to-last level and only the rightmost nodes may be missing on the last level
- The value of each node is higher than all of its chilren's

It may sound strange, but that's it. The first two requirements specify a
kind of tree that has some very useful properties. For example, it's
possible to determine a node's parent (`P(x) = |_ x/2 _|`) or its
children (`C(x) = x*2 (left) or x*2 + 1 (right)`) in constant time
(the formulae involve only constants--what did you expect?). Furthermore,
a heap can be saved in a single array without losing the tree structure.
As an overall conclusion, it's pretty easy and fast to construct a heap
from an array, then take the first element of it several times,
"repairing" the heap after each removal. The values taken from the top are
then sorted downwards. Great, isn't it? Here's how it works in
practice:

Everything starts with the array that's to be sorted (called A[1..n] here). This array is also where the heap will be built. The following diagram shows how the tree form of a heap in the array would look like:

__ A[1]_ / \ A[2] A[3] / \ / \ A[4] A[5] A[6] A[7] / \ A[8] A[9]

After looking at this, all the requirements and gains should be well understandable.

All the properties of the heap lead to the conclusion that requirement #3
is already fulfilled starting from element `|_ n/2 + 1 _|` for any
array you can think of. Now, to build a heap from it, you go backwards
from the middle (i.e. the elements before the part that already *is* part
of the heap) and move the new element to the right place. If its value is
lower than both of its children, it's swapped with the higher of the
children (and then the same needs to be done again with the node's new
children and so on). After this, another element fulfills all
requirements. When you've worked your way to the first element in the
array, you've got a fully featured heap and you can start sorting.

There it goes! Again, you work your way from the end (the very end this time, not the middle) to the start and swap the last element of the heap with the first one. Then, you make the heap smaller (the swap you just performed moved a sorted value to the end of the heap, you don't want to move it again) and get the new first element of the heap into its correct position, just like you did when you built the heap. This goes on and on and on and suddenly, your heap is only one element. Congratulations, you're through!

This may all sound a little complicated to you. I know that an example says more than a thousand articles, so here is one. You can enter a string below and a coloured (the red bit is the heap) snapshot of every step performed during the sorting will show up, together with some statistics about the number of swaps and compares needed. This interactive sorting tool is a little Perl script I wrote to understand the algorithm in the first place and I hope you can learn from it, too.

start: a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8 step 1: generating heap... ( 1a, 3c) a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8 ( 1a, 4c) a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8 ( 1a, 4c) a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8 ( 1a, 4c) a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8 ( 1a, 4c) a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8 ( 1a, 4c) a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8 ( 1a, 4c) a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8 ( 1a, 4c) a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8 ( 1a, 4c) a b c d e f g E X A M P L E 5 7 3 2 4 7 1 2 3 8 ( 2a, 8c) a b g d e f c E X A M P L E 5 7 3 2 4 7 1 2 3 8 ( 2a, 8c) a e g d b f c E X A M P L E 5 7 3 2 4 7 1 2 3 8 ( 3a, 12c) g e f d b a c E X A M P L E 5 7 3 2 4 7 1 2 3 8 step 2: sorting... ( 7a, 14c) f e c d b a E E X A M P L 8 5 7 3 2 4 7 1 2 3 g ( 8a, 18c) e d c X b a E E 4 A M P L 8 5 7 3 2 3 7 1 2 f g ( 7a, 14c) d b c X M a E E 4 A 2 P L 8 5 7 3 2 3 7 1 e f g ( 7a, 14c) c b a X M P E E 4 A 2 1 L 8 5 7 3 2 3 7 d e f g ( 7a, 17c) b X a E M P E 7 4 A 2 1 L 8 5 7 3 2 3 c d e f g ( 7a, 14c) a X P E M L E 7 4 A 2 1 3 8 5 7 3 2 b c d e f g ( 7a, 14c) X M P E A L E 7 4 2 2 1 3 8 5 7 3 a b c d e f g ( 6a, 13c) P M L E A 3 E 7 4 2 2 1 3 8 5 7 X a b c d e f g ( 6a, 13c) M E L 7 A 3 E 7 4 2 2 1 3 8 5 P X a b c d e f g ( 7a, 13c) L E E 7 A 3 8 7 4 2 2 1 3 5 M P X a b c d e f g ( 6a, 13c) E A E 7 5 3 8 7 4 2 2 1 3 L M P X a b c d e f g ( 6a, 10c) E A 8 7 5 3 3 7 4 2 2 1 E L M P X a b c d e f g ( 7a, 14c) A 7 8 7 5 3 3 1 4 2 2 E E L M P X a b c d e f g ( 6a, 10c) 8 7 3 7 5 2 3 1 4 2 A E E L M P X a b c d e f g ( 7a, 14c) 7 7 3 4 5 2 3 1 2 8 A E E L M P X a b c d e f g ( 6a, 10c) 7 5 3 4 2 2 3 1 7 8 A E E L M P X a b c d e f g ( 6a, 10c) 5 4 3 1 2 2 3 7 7 8 A E E L M P X a b c d e f g ( 5a, 9c) 4 3 3 1 2 2 5 7 7 8 A E E L M P X a b c d e f g ( 5a, 9c) 3 2 3 1 2 4 5 7 7 8 A E E L M P X a b c d e f g ( 5a, 6c) 3 2 2 1 3 4 5 7 7 8 A E E L M P X a b c d e f g ( 5a, 6c) 2 1 2 3 3 4 5 7 7 8 A E E L M P X a b c d e f g ( 4a, 4c) 2 1 2 3 3 4 5 7 7 8 A E E L M P X a b c d e f g ( 4a, 2c) 1 2 2 3 3 4 5 7 7 8 A E E L M P X a b c d e f g 157a (=> 52s), 361c total. explanation: a = assigns, s = swaps, c = compares, heap

And because I'm a nice guy all over, you even get the source code of the
script, together with a few other scripts that visualise other sorting
algorithms, and everything's distributed under the terms of the BSD
Licence (meaning you can almost freely copy and modify the code or
compiled versions).

» Download

This article was last updated on 2004-12-21.

Comments with a different background were submitted by the author himself.

AC: