The purpose of this work is to develop the counting sort algorithm and implement it in code; to investigate the efficiency and algorithmic complexity of counting sort and compare it with an exchange sorting algorithm (insertion sort); and to determine the advantages and disadvantages of counting sort by comparing it with other exchange algorithms.
Counting sort is not a comparison-based sorting algorithm and has linear complexity, O(n + k). To achieve O(n) running time, counting sort assumes that each element is an integer in the range 0 to k, for some integer k with k = O(n). Counting sort first determines, for each input element x, the number of elements less than or equal to x, then uses this information to place x directly at its position in the output array. For example, if 15 elements are less than or equal to x, then x belongs at position 15 of the output array.
Counting-Sort (A, n)
1 k = MAX(A)    // assumes each element of A is an integer in the range 0 to k
2 Declare B[1 : n] and C[0 : k]
3 for i = 0 to k
4     C[i] = 0
5 for j = 1 to n
6     C[A[j]] = C[A[j]] + 1
7 for i = 1 to k
8     C[i] = C[i] + C[i - 1]
9 for j = n downto 1
10     B[C[A[j]]] = A[j]
11     C[A[j]] = C[A[j]] - 1
12 return B
The for loop in lines 3-4 takes O(k) time, the for loop in lines 5-6 takes O(n) time, the for loop in lines 7-8 takes O(k) time, and the for loop in lines 9-11 takes O(n) time. Thus, the total execution time is O(n + k), where k is the range of the elements and n is the total number of elements. In practice, it makes sense to use counting sort when k = O(n), in which case the running time is O(n).
In the above pseudocode, an auxiliary count array C of size k + 1 was used, where k is the range of the elements, together with an output array B of size n. Counting the output array, the counting sort algorithm has a space complexity of O(n + k); the auxiliary (non-output) space is O(k).
A valuable advantage of counting sort is its stability: elements with the same value appear in the output array in the same order as in the input array, i.e. ties between equal elements are broken by the rule that the element appearing first in the input array also appears first in the output array.
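The stability property can be demonstrated on records that share integer keys. The following is a minimal Python sketch (the record values and helper name are illustrative, not from the report): records with equal keys keep their original relative order because the final pass scans the input right to left.

```python
# Stability demo: records share integer keys; a stable counting sort must
# keep equal-keyed records in their original relative order.
def counting_sort_by_key(records, key_of, k):
    """Stable counting sort of records whose key_of(r) lies in 0..k."""
    counts = [0] * (k + 1)
    for r in records:                  # histogram of keys
        counts[key_of(r)] += 1
    for i in range(1, k + 1):          # prefix sums: counts[i] = number of keys <= i
        counts[i] += counts[i - 1]
    out = [None] * len(records)
    for r in reversed(records):        # right-to-left pass preserves input order
        counts[key_of(r)] -= 1
        out[counts[key_of(r)]] = r
    return out

records = [(3, "a"), (1, "b"), (3, "c"), (1, "d")]
print(counting_sort_by_key(records, key_of=lambda r: r[0], k=3))
# (1, "b") still precedes (1, "d"), and (3, "a") still precedes (3, "c")
```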
Let us consider a graphical model of the counting sort algorithm. The Counting-Sort (A, n) procedure takes as input the array A = [7, 3, 7, 0, 3, 2, 1, 7, 1, 1, 0] and n = 11. Array C is initialized with zeros (Fig. a); the elements lie in the range 0 to 7, so k = 7. The number of occurrences of each value in the input array A is counted and written into array C (Fig. b). Array C after lines 7-8 is shown in Fig. c.
The output array B after execution of lines 9-11 is shown after the first iteration (Fig. d), the third iteration (Fig. e), and the eleventh iteration (Fig. f). Finally, the sorted output array B is shown in Fig. g.
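The pseudocode above can be translated directly into Python (0-based indexing instead of the pseudocode's 1-based arrays) and run on the example array from the figures; this is a sketch assuming non-negative integer input, as in the walkthrough.

```python
def counting_sort(A):
    """Counting sort following the pseudocode above (0-based indexing)."""
    if not A:
        return []
    k = max(A)                      # assumes non-negative integers in 0..k
    C = [0] * (k + 1)
    for x in A:                     # lines 5-6: histogram of values
        C[x] += 1
    for i in range(1, k + 1):       # lines 7-8: C[i] = number of elements <= i
        C[i] += C[i - 1]
    B = [0] * len(A)
    for x in reversed(A):           # lines 9-11: right-to-left for stability
        C[x] -= 1
        B[C[x]] = x
    return B

A = [7, 3, 7, 0, 3, 2, 1, 7, 1, 1, 0]
print(counting_sort(A))  # [0, 0, 1, 1, 1, 2, 3, 3, 7, 7, 7]
```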
Insertion sort is a sorting algorithm that, on each iteration, places the next unsorted element into its proper position within the already-sorted prefix. The algorithm treats the first element as sorted, then selects the next unsorted element of the array (the key). Sorted elements greater than the key are shifted one position to the right, and the key is inserted into the gap that opens up. The remaining elements are processed in the same way.
Insertion-Sort (A, n)
1 for i = 2 to n
2     key = A[i]
3     j = i - 1
4     while j > 0 and A[j] > key
5         A[j + 1] = A[j]
6         j = j - 1
7     A[j + 1] = key
Summing the cost of each line multiplied by the number of times it executes gives ( T(n) = c_1 n + c_2 (n - 1) + c_3 (n - 1) + c_4 \sum_{i=2}^{n} t_i + c_5 \sum_{i=2}^{n} (t_i - 1) + c_6 \sum_{i=2}^{n} (t_i - 1) + c_7 (n - 1) ), where ( c_1, \ldots, c_7 ) are the costs of executing lines 1-7 once and ( t_i ) is the number of times the while-loop test in line 4 executes for a given value of i.
To calculate the worst-case time complexity, when the input array is sorted in reverse order, each key must be compared with every element of the sorted prefix, so ( t_i = i ). Substituting ( \sum_{i=2}^{n} i = \frac{n(n+1)}{2} - 1 ) and ( \sum_{i=2}^{n} (i - 1) = \frac{n(n-1)}{2} ) and collecting terms yields ( T(n) = an^2 + bn + c ) for constants a, b, c determined by the line costs.
The running time is thus a quadratic function of n: ( T(n) = O(n^2) ).
Insertion sort sorts the array in place and does not use additional space beyond O(1).
The insertion sort algorithm is a stable algorithm.
The Insertion-Sort (A, n) procedure takes as input the array A=[7, 0, 5, 1, 8] and the number of elements in the array is n=5. The blue square holds the value of the key from A[i]. The blue arrows represent the movement of the key, the orange arrows represent the element A[j].
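The Insertion-Sort pseudocode above translates to Python as follows (0-based indexing); as a sketch, it is run on the same example array A = [7, 0, 5, 1, 8].

```python
def insertion_sort(A):
    """In-place insertion sort following the pseudocode above."""
    for i in range(1, len(A)):
        key = A[i]                  # next unsorted element
        j = i - 1
        while j >= 0 and A[j] > key:
            A[j + 1] = A[j]         # shift larger sorted elements right
            j -= 1
        A[j + 1] = key              # insert the key into the gap
    return A

print(insertion_sort([7, 0, 5, 1, 8]))  # [0, 1, 5, 7, 8]
```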
- Efficiency:
  - counting sort has linear time complexity: its running time grows linearly with the number of elements n and the range of values k. It works very efficiently when k = O(n), where k is the range of values. Consider an example with k = O(n) (Fig. C.1) and with k = O(n^2) (Fig. C.2): the execution time of counting sort increased significantly, by approximately 1000 times, when the range of the input array grew to k = O(n^2);
Figure C.1 - k=O(n)
Figure C.2 - k=O(n^2)
  - insertion sort has quadratic time complexity, which means its performance degrades significantly on large arrays (Fig. C.3).
Figure C.3 - Execution time of the algorithms
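The timing comparisons in Figures C.1-C.3 can be reproduced in outline with a small benchmark. This is a sketch, not the report's original measurement code: the array size n and the value range are assumptions, and absolute times will differ by machine.

```python
import random
import time

def counting_sort(A):
    """Counting sort; assumes non-negative integers."""
    if not A:
        return []
    k = max(A)
    C = [0] * (k + 1)
    for x in A:
        C[x] += 1
    for i in range(1, k + 1):
        C[i] += C[i - 1]
    B = [0] * len(A)
    for x in reversed(A):
        C[x] -= 1
        B[C[x]] = x
    return B

def insertion_sort(A):
    """In-place insertion sort."""
    for i in range(1, len(A)):
        key, j = A[i], i - 1
        while j >= 0 and A[j] > key:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = key
    return A

n = 2000                                        # assumed size, not from the report
data = [random.randrange(n) for _ in range(n)]  # k = O(n) regime

t0 = time.perf_counter()
counting_sort(list(data))
t1 = time.perf_counter()
insertion_sort(list(data))
t2 = time.perf_counter()
print(f"counting sort:  {t1 - t0:.6f} s")
print(f"insertion sort: {t2 - t1:.6f} s")
```

Varying n and the range passed to `random.randrange` reproduces the k = O(n) versus k = O(n^2) regimes discussed above.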
- Space complexity:
  - counting sort requires additional memory for storing temporary data during sorting, namely a count array of size proportional to k; therefore, as the range of permissible values k increases, a significant amount of memory is occupied (Fig. C.4). The graph shows the dependence of the memory used, in MB, on k;
Fig. C.4 - Memory usage
  - insertion sort does not require additional memory for sorting.
- Worst and best cases for counting sort:
  - it is not advisable to use counting sort for arrays whose range of permissible values k is much larger than the array itself. Consider the situation where n = 1000, min = 1, max = 1000000 (Fig. C.5);
Fig. C.5 - Counting sort worst case
  - suppose we have an array with n = 1000, min = 1, max = 10000, which is the best case for counting sort among the examples considered (Fig. C.6).
Fig. C.6 - Counting sort best case
- Worst and best cases for insertion sort:
  - for insertion sort the worst case is when the input array is sorted in reverse order relative to the sorting direction (Fig. C.7); insertion sort took almost 8 seconds, versus 0.006456 seconds for counting sort;
Fig. C.7 - Insertion sort worst case
  - the best case is when the input array is already sorted (Fig. C.8). The execution time of insertion sort was 0.000531 seconds, while counting sort took 0.010585 seconds.
Fig. C.8 - Insertion sort best case
- Stability:
- counting sort, in its classic implementation, is stable
- insertion sort is also stable
- Acceptable data:
- counting sort is used to sort integers; however, with small modifications (for example, offsetting every value by the minimum) it can also sort negative numbers;
- insertion sort can be used to sort any valid data types, as it depends on how the comparisons are defined for those types.
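The negative-number modification mentioned above can be sketched in Python by shifting every value by min(A) so all indices into the count array become non-negative; the function name and example values are illustrative.

```python
def counting_sort_any_ints(A):
    """Counting sort extended to negative integers by offsetting by min(A)."""
    if not A:
        return []
    lo, hi = min(A), max(A)
    C = [0] * (hi - lo + 1)         # k = hi - lo, the value range
    for x in A:
        C[x - lo] += 1              # shift every value into 0..k
    out = []
    for i, count in enumerate(C):   # emit each value count[i] times
        out.extend([i + lo] * count)
    return out

print(counting_sort_any_ints([3, -2, 0, -5, 3]))  # [-5, -2, 0, 3, 3]
```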
- Sorting Algorithm Comparison
| Algorithm | Best Case Time Complexity | Worst Case Time Complexity | Average Case Time Complexity | Space Complexity | Stability |
|---|---|---|---|---|---|
| Counting Sort | ( O(n + k) ) | ( O(n + k) ) | ( O(n + k) ) | ( O(n + k) ) | + |
| Insertion Sort | ( O(n) ) | ( O(n^2) ) | ( O(n^2) ) | ( O(1) ) | + |
Advantages:
- Counting sort has linear time complexity, O(n + k), where n is the number of elements in the input array and k is the range of input values. In cases where k = O(n), counting sort can perform significantly faster than comparison-based sorting algorithms, such as insertion sort, quicksort, or merge sort.
- Counting sort is a stable sorting algorithm.
- Since no elements are compared, counting sort is not subject to the lower bound on comparison sorts and can be very competitive in such cases.
- Because counting sort is well suited to sorting integers from a small, well-defined range, it can be used as a subroutine in other sorting algorithms, such as radix sort and bucket sort.
Disadvantages:
- Space complexity: counting sort requires additional memory for the count array (and an output array), which can be a significant disadvantage when dealing with large ranges of input data, making it a "not in-place" sorting algorithm. The space complexity is O(n + k), and for large values of k this becomes very impractical.
- Limited applicability: counting sort is not suitable for sorting data with a wide or unbounded range of values; its use in such cases is impractical.