List Implementation
There are two typical ways to implement a list. The first is to use an array like data structure, the second is to use a linked list data structure (more on this below). The python builtin list structure is array based, while the C++ STL list is linked list based.
There are advantages and disadvantages to each implementations. Depending on the use case, there may be advantages with one implementation vs another.
Array based Implementation
Advantages
- items are stored in memory consecutively and you can have direct access to any particular item through the use of its index in constant time.
- When sorted, the list can be searched using binary search, providing a speedy search through large data sets.
Drawbacks
- space is often wasted as arrays are typically created bigger than needed.
- Insertion into anywhere other than the very end of the array is an expensive operation as it requires the shifting of all values from the point of insertion to the end of the array.
- Removal of any value anywhere other than the very last item is also expensive as it also requires a shift in all items from the point of removal to the very end of the list.
- growing the array can be amortized (average out the linear cost of single grow() spread out over all list operations) to , but when it does occur, that one single call to grow at that moment is linear, this may need to be considered if list is used in an application that has latency restrictions.
Linked list Implementation
Advantages
A linked list is an implementation that stores data in nodes. These nodes are linked together one after another.
- Linked lists are very easy to grow and shrink as nodes only exist if there is data stored in them. When you grow the linked list, old nodes do not need to be duplicated as part of the grow process.
- Nodes are only created if there is data to store. No need to preallocate extra space
- Data is not stored in consecutive memory locations so a large block of contiguous memory is not required even for storing large amounts of data.
- Both insertion and removal of any node in the list (assuming that the position of the insertion/removal is known) can be very efficient and runs in constant, O(1) time as it would only require a change of a few pointers. Exactly how long it takes depends on the type of linked list and the exact operations being performed. The key however is that when values are added or removed from a linked list, other values in the list are not moved around.
Drawbacks
- Each piece of data requires the storage of at least one extra pointer. When an array is full, it uses less memory than a linked list of the same size. Furthermore, if the data being stored in each node doesn't require much memory, then the pointer cost relative to the data stored can be significant. For example, an integer takes just as much room to store as a pointer. If you have a singly linked list of integers, the storage of each piece of data is double that of storing it into an array. Thus any array that is more than 50% full is storing more integers with same amount of storage as a singly linked list with the same amount of data.
- A linked list cannot be searched using binary search as direct access to nodes are not available.
- Data is not necessarily stored consecutively, this will mean that hardware advantages such as caching won't apply. If your program requires you to do something with all the data in the list, this can have significant impact on performance.
Memory Requirements
To implement a list using arrays, we allocate more space than is necessary. The array is used until it is full. Once an array is full, either all new insertions fail until an item is removed or the array must be reallocated. The reallocation typically involves creating a larger array, copying over the old data, and making this the array. In order to amortize the cost of the grow operation over many list operations, implement the grow() by doubling the size of the array.
Growing an array can be an expensive operation which may require the duplication of all n elements of the array. Growing by some constant number of elements ensures poor performance. Always double.
To implement a list using a linked list like structure, each value is put into its own data node. Each node in the list is then linked with the next node by storing the address of the next node in the list. The memory usage in this case is at least one extra pointer for each piece of data. However, nodes are created on an as needed basis. There are never have unused nodes. Thus, the amount of memory used to store data in a linked list structure is typically lower than an array until the array is nearly full.