Merge Sort
The merge sort works on the idea of merging two already sorted lists.
If there existed two already sorted list, merging the two together into a single sorted list can be accomplished in O(n) time where n is the combined length of the two lists.
Thus, if we are able to take our original list, split it into two pieces and then somehow get them into a sorted state, we can then use the merge algorithm to put these two pieces back together.
The algorithm of exactly how we can do this is described below.
Algorithm:
Merge algorithm
Understanding the merge algorithm is an important step of understanding how merge sort works.
The algorithm to merge already sorted list works like this
- create an empty merged list
- have a way to "point" at the first element of each of the two list
- compare the values being pointed at and pick the smaller of the two
- copy the smaller to the end of the merged list, and advance the "pointer" of the list that the value came from
- continue until one of the list is completely copied then copy over remainder of the rest of the list
Merge Sort Algorithm
The merge algorithm is an O(n) algorithm as we pick n items between two lists. However, it depends on having two lists that are already sorted...So we must first get a list into that state.
- Start with the entire list
- Split the list into two halfs
- mergesort each half - this will result in two sorted lists
- merge them together
Now... the above is clearly a recursive algorithm. The base case for this are lists of size 0 or size 1. If you have an list that is 0 elements it doesn't exist so we don't have to sort it. If a list is 1 element then that list is sorted.
Animation
http://cathyatseneca.github.io/DSAnim/web/merge.html
Implementation
The implementation of merge sort is split into 3 pieces, each piece is detailed below.
mergeSort()
This is the mergeSort() function that the program calls. The recursive merge sort has a complicated prototype, we hide the complexity by writing a function that will make the initial call. We also need a second list to store the merged list. The sorting algorithm creates a temporary list once and passes that into a recursive function to do all the work. The idea here is that we don't want to allocate memory on each merge... instead we can just use one list that stays around for the entire process.
- Python
- C++
def merge_sort(mylist):
empty_list = [0] * len(mylist) # create an empty list for merging. doing it once
# is more efficient than repeatedly creating it when merging
recursive_merge_sort(mylist, 0, len(mylist) - 1, empty_list) # call recursive mergesort
void mergeSort(int arr[],int size){
int* tmp=new int[size];
mergeSort(arr,tmp,0,size-1);
delete [] tmp;
}
recursive merge sort
The recursive mergesort splits the list into two pieces, mergesorts each piece then merge them together. While we speak about splitting up the list, the reality is that the split is more conceptual than actual. We will use indexes to indicate where one list begins and one list ends. Thus this function is not only given the list but also the indexes to indicate where it begins and ends. The function is also passed an empty list that we pass to merge function. This is done for efficiency.
- Python
- C++
def recursive_merge_sort(mylist, first_index, last_index, empty_list):
# recursive merge sort. the base case occurs when first_index >= last_index
# that situation will only occur if the portion of the list is size 0 or 1.
# in either case, do nothing and exit function.
if first_index < last_index:
mid_index = (first_index + last_index) // 2
recursive_merge_sort(mylist, first_index, mid_index, empty_list)
recursive_merge_sort(mylist, mid_index + 1, last_index, empty_list)
merge(mylist, first_index, mid_index + 1, last_index, empty_list)
void mergeSort(int arr[],int tmp[],int start,int end){
//if the array is more than one element big
if(start<end){
int mid=(start+end)/2;
mergeSort(arr,tmp,start,mid);
mergeSort(arr,tmp,mid+1,end);
merge(arr,tmp,start,mid+1,end);
}
}
The merge function
The merge function takes two sorted list and merge them together. However, because we are merging together two sorted pieces of a list, rather than being given individual lists, the function prototype must take in indexes to indicating starting/ending points of these lists within the original list
- Python
- C++
# this function will merge two sorted pieces of an array into a single sorted segment.
# We will refer to the two smaller sorted pieces of lists as a and b.
# These are not true lists but rather pieces of mylist
# list a starts at a_first_index and ends at b_first_index - 1 (inclusive)
# list b starts at b_first_index and ends at b_last_index (inclusive)
# this function will merge them together using empty_list as temporary storage
# once it is merged together, it is copied back into mylist, creating a single
# sorted segment that starts at a_first_index to b_last_index (inclusive)
def merge(mylist, a_first_index, b_first_index, b_last_index, empty_list):
a_ptr = a_first_index # used to track value from a
b_ptr = b_first_index
empty_list_index = a_ptr
while (a_ptr < b_first_index) and (b_ptr <= b_last_index):
if mylist[a_ptr] <= mylist[b_ptr]:
empty_list[empty_list_index] = mylist[a_ptr]
empty_list_index += 1
a_ptr += 1
else:
empty_list[empty_list_index] = mylist[b_ptr]
empty_list_index += 1
b_ptr += 1
while a_ptr < b_first_index:
empty_list[empty_list_index] = mylist[a_ptr]
empty_list_index += 1
a_ptr += 1
while b_ptr <= b_last_index:
empty_list[empty_list_index] = mylist[b_ptr]
empty_list_index += 1
b_ptr += 1
for i in range(a_first_index, b_last_index + 1):
mylist[i] = empty_list[i]
/*function merges the two halfs of a sorted array together.
The arrays are defined from arr[startA]to arr[startB-1] and arr[startB]
to arr[endB]*/
void merge(int arr[],int tmp[],int startA,int startB,int endB){
int aptr=startA;
int bptr=startB;
int idx=startA;
while(aptr < startB && bptr < endB+1){
if(arr[aptr] < arr[bptr]){
tmp[idx]=arr[aptr];
idx++;
aptr++;
}
else{
tmp[idx++]=arr[bptr];
bptr++;
}
}
while(aptr<startB){
tmp[idx]=arr[aptr];
idx++;
aptr++;
}
while(bptr < endB+1){
tmp[idx++]=arr[bptr];
bptr++;
}
//copy back into arr
for(int i=startA;i<=endB;i++){
arr[i]=tmp[i];
}
}