Heapify and Heap Sort
Recall that Heap Sort basically operates by building a heap with n values then destroying the heap.
A complete binary tree with n nodes means that at most there are log n nodes from the root (top) to a leaf (a node at the bottom of the tree)
Insertion may require the percolate up process. The number of times a node needs to percolate up can be no more than the number of nodes from the root to the leaf. Therefore, it is pretty easy to see that this percolate up process is O(log n) for a tree with n nodes. This means that to add a value to a Heap is a process that is O(log n). In order for us to build a heap we need to insert into it n times. Therefore, the worst case runtime for this process is O(n log n)
The deletion process may require a percolate down process. Like the percolate up process this also is O(log n). Thus, to remove a value from the heap is O(log n). We need to remove n values so this process is O(n log n).
To heap sort we build heap O(n log n) then destroy the heap O(n log n). Therefore the whole sorting algorithm has a runtime of O(n log n)
Our heap can be represented with an array. The simplest implementation of heap sort would be to create a heap object:
A simple but not good implementation of heap sort
- Python
- C++
def heap_sort(mylist):
the_heap=Heap() # we assume we have created some Heap Object
size = len(mylist)
for i in range(0,size):
the_heap.insert(mylist[i])
for i in range(0,size):
mylist[i]=the_heap.front()
the_heap.delete()
//simple but not correct way to heap sort an array
void heapSort(int data[],int size){
Heap theheap;
for(int i=0;i<size;i++){
theheap.insert(data[i]);
}
for(int i=0;i<size;i++){
data[i]=theheap.front();
theheap.delete();
}
}
but this is not actually a good implementation. The above would create a heap as it goes meaning that we need to actually double the data storage as the heap itself would be an array that is as big as data. Thus, the function would have the same drawback as merge sort. Furthermore building a heap by multiple insertions is actually slower than making a heap in place. Thus, instead of building an actual heap with a heap object, we will implement heap sort by turning the array into a heap in place then removing from it in place.
One way we can do this by borrowing the idea from insertion sort... in that algorithm we start off by hiving off the first element and saying that it is a sorted array. We then "insert" successive elements into this sorted array so that it stays sorted.
Remember that a heap uses every element of the array without any sort of gaps. If you have a heap with n elements, you need n element array to represent the heap. If you took a value out, then the array size decreases by 1. After doing a removal, you effectively end up with an open spot at the back of the array. Thus, we sort by filling those open spots with the value being removed, forming a sorted list as the values being removed will come out based on their priority ordering.
Thus heap sort works like this:
- we build a heap in place (turn the array of unsorted numbers into a heap)
- we remove a value from the heap and place that value at the end of the array.
One key thing we have to change though is the priority. We actually need to ensure that items that should go towards the end of the array is considered highest priority for the heap. Thus if we were sorting an array in ascending order (small to big) we should give larger values a higher priority.
Heapify
Since our heap is actually implemented with an array, it would be good to have a way to actually create a heap in place starting with an array that isn't a heap and ending with an array that is heap. While it is possible to simply "insert" values into the heap repeatedly, the faster way to perform this task is an algorithm called Heapify.
In the Heapify Algorithm, works like this:
Given a node within the heap where both its left and right children are proper heaps (maintains proper heap order) , do the following:
- If the node has higher priority than both children, we are done, the entire heap is a proper heap
- Otherwise
- swap current node with higher priority child
- heapify() that subtree
Effectively what is happening is that we already know that both the left and right children are proper heaps. If the current node is higher priority than both children then the entire heap must be proper
However if its not then we do a swap, this means that the current node's value has now gone down into one of the subtrees. This could cause that subtree to no longer be a heap because the root of that subtree now has a value of lower priority than it use to have. so we heapify the subtree.
Building a maxheap in place
This example will start with an array that is not heap. It will perform heapify() on it to form a max heap (bigger values have higher priority). In each of the diagrams below, the argument to heapify() is the index corresponding to the node
Firstly, all leaf nodes are valid heaps. Since they have no subtree, we don't need to deal with those nodes. They are highlighted in green in this next picture. Our algorithm therefore starts at the first non-leaf node from the bottom. This node is at index (n-2)/2 where n is the total number of values in our heap.
The heapify function takes the index of the root of the heapify routine (ie we know that nodes children are heaps, and we are looking at it from that node down.
heapify(3)
First node to consider is the node with 5 (index 3). the left subtree is lower priority and the right subtree doesn't exist so we do nothing.
heapify(2)
Second node to consider is 3 (index 2). both 4 and 7 are bigger than 3 and thus have higher priority. However, 7 is higher priority than 4, so we swap these two values. 3 is now in a leaf node and thus we can stop.
heapify(1)
Next consider the node with 6. 6 is higher priority than 5 but not higher priority than 8, so we swap them. 6 is now in a leaf node and thus we can stop.
heapify(0)
Lastly we consider the node 1. both 8 and 7 have higher priority than 1, but 8 is higher priority than 7 and thus we swap 8 and 1 first.
At this point we need to heapify the subtree that now has 1 as it's root (index 1). We swap it with the higher priority child (6 in this case). Notice that because we won't need to do anything to the right subtree with 7 as root, as that subtree was not modified.
After the final swap we have a proper max heap
Once the heap is built, we continuously remove the highest priority value from the heap and place it at the back of the array.