1. INTRODUCTION
1.1 A New Sorting Algorithm
One of the most common and fundamental problem in computer science, is ordering a list of items, which is well known as ‘Sorting’. Considering information processing performed by millions of computers along the whole world, ‘Sorting’ takes a very important role.
As it gained its importance to real world problem, it also gets the attention of computer scientists. As a matter of fact, various types of sorting algorithm have been developed. Some well known sorting algorithm are Bubble sort, Heap sort, Insertion sort, Merge sort, Quick sort, Selection sort, Shell sort etc [1-4]. An implementer uses a certain algorithm depending on the characteristics of distribution if the data elements or on some other certain context.
My algorithm, however, can provide optimize performance, in a context, where it is known that, the data elements are distributed in an order that, there exists a considerable low amount of sorted subsets. Computer applications, that contain data with the characteristic like this and need to be sorted, can be highly benefited using ‘Squeeze Sort’ algorithm.
1.2 The Concept
The concept beneath the algorithm is, say, given a global data array as d [0: n-1] with n number of unsorted data and another global data array as s [0: n-1], which initially doesn’t contain any data elements. On each call of the algorithm with one argument‘t’ (where t<=n), it basically does two things.
First, it forms a sorted sequence p [0…k-1] (where k<=t<=n) from d [0: t-1]. Thus, the unsorted sequence d [0: t-1] is compressed as a new unsorted sequence as d [0: t-k-1]. As the name ‘Squeeze Sort’ implies that the amount of data elements getting Squeezed.
Second, the sorted sequence p [0: k-1] is then merged to an existing sorted sequence s [0: j-1] (where j<=n), and thus a new sorted sequence will be formed as s [0: j+k-1].
The algorithm will be recursively called until the unsorted sequence become (conceptually) empty. Finally there is a sorted sequence s [0: n-1], which is actually the desired sorted data.
2. UNDERSTANDING THE DETAIL
We have two global array named data [0: n-1] and sorted [0: n-1], where n indicates the number of elements to be sorted. We have another 2 global data named sorted_so_far and unsorted_so_far which indicate the number of elements sorted so far and the number of elements unsorted so far respectively.
Initially, data [0: n-1] contain n unsorted elements, where as sorted [0: n-1] is conceptually empty. My algorithm will finally keep the desired sorted data into sorted [0: n-1].
The algorithm Squeeze_Sort has an argument named ‘size’, which indicates the number of elements to be sorted.
Here is the algorithm:
n, sorted_so_far=0,unsorted_so_far=n;
data[n];sorted[n];
merge (start, end) {
h=0;j=start;i=unsorted_so_far;
mid=sorted_so_far-1;
while(h<=mid) {
if(sorted[h]>=sorted[j])
data[i++]=sorted[h++];
else {
data[i++]=sorted[j++];
if(j==n)j=mid+1;
if(j==end) break;
}
}
if(h>mid)
for(k=j;;) {
data[i++]=sorted[k++];
if(k==n)k=mid+1;
if(k==end)break;
}
else for( k=h; k<=mid;)
data[i++]=sorted[k++];
sorted_so_far=0;
for(k=unsorted_so_far
2.1 Compressing the Unsorted Data
The first portion of the algorithm checks consecutive maximum and minimum with respect to the first element of data [0: size-1] (i, e, data [0]) and stores the result in a sorted sequential manner. This results compression of data as data [0:unsorted_so_far-1] with unsorted data and two array item sorted [start: n-1] and sorted [sorted_so_far: end-1] together, consecutively form a sorted sequence.
2.2 Extending the Number of Sorted Elements
The second portion of the algorithm merges the newly found sorted sequence (sorted [start: n-1] and sorted [sorted_so_far: end-1] together consecutively) with the existing sorted array item sorted [0: sorted_so_far], thus increasing the number of sorted elements.
2.3 Recursion Until All the Elements Are Sorted
The algorithm Squeeze_Sort will be recursively called until the number of unsorted elements not becomes greater then zero.
3. SIMULATION:
The data blocks below shows the simulation process of the sorting, as described in this section.
Initial Data:
data [0:unsorted_so_far]: 2, 40, 9, 100, 7, 22, 1, 8, 11, 102, 54, 53, 5, 71, 79, 66, 3
sorted:
First Execution of Squeeze Sort:
sorted [start: end]: 1, [2], 40, 100, 102
sorted [0:sorted_so_far]: 1, 2, 40, 100, 102
data [0:unsorted_so_far]: 9, 7, 22, 8, 11, 54, 53, 5, 71, 79, 66, 3
Second Execution of Squeeze Sort:
sorted [start: end]: 3, 5, 7, [9], 22, 54, 71, 79
sorted [0:sorted_so_far]: 1, 2, 3, 5, 7, 9, 22, 40, 54, 71, 79, 100, 102
data [0:unsorted_so_far]: 8,11,53,66
Finally:
sorted [start: end]: [8], 11, 53, 66
sorted [0:sorted_so_far]: 1, 2, 3, 5, 7, 9, 11, 22, 40, 53, 54, 66, 71, 79, 100, 102
data [0:unsorted_so_far] :
To simulate the execution process of Squeeze Sort simply, let’s go through the example.
We initialize the data [0: n-1] array with some random data.
data [0:16]: {2, 40, 9, 100, 7, 22, 1, 8, 11, 102, 54, 53, 5, 71, 79, 66, 3}
sorted [0:16] :{< empty>}.
When the execution of Squeeze Sort begins, based on the first element in data [0:16], a sorted sequence is generated and stored with respect to the sorted sequence as sorted [start: n-1] and sorted [sorted_so_far: end-1] consecutively. The term start indicates the starting index, where as the term end indicates the ending tag, of newly found sorted array. The remaining data elements those can’t be involved is stored in data [0:unsorted_so_far] (i.e., the unsorted data getting compressed!). As the name implies the term sorted_so_far indicates the number of elements those have been sorted so far and stored in sorted[0: sorted_so_far-1],where as the term unsorted_so_far indicates the number of elements those have not been sorted so far and stored in data[0: unsorted_so_far-1].
During the first execution we get a new sorted sequence as sorted [start: end] = {1, [2], 40, 100, 102} which was generated based on the first element of the previous data [0:unsorted_of_far-1]. This sequence then merged to the existing sorted array sorted [0:sorted_so_far] (which is empty now). As well as, we get the remaining unsorted data in data [0:unsorted_so_far] = {9, 7, 22, 8, 11, 54, 53, 5, 71, 79, 66, 3}.
Since there is some data remaining as unsorted, a recursion of Squeeze Sort will be happen.
During the second execution of Squeeze Sort, we get the further compressed data as data [0:unsorted_so_far] = {8, 11, 53, 66} and a more sorted sequence as sorted [0:sorted_so_far] = {1, 2, 3, 5, 7, 9, 22, 40, 54, 71, 79, 100, 102}.
The third execution of the Squeeze Sort gives us the desired result as sorted [0:16] = {1, 2, 3, 5, 7, 9, 11, 22, 40, 53, 54, 66, 71, 79, 100, 102}.
4. COMPLEXITY
4.1 Time Complexity:
As usual, the time complexity of the Squeeze Sort depends on the characteristics of the distribution of the data element. However, the actual time complexity of this algorithm is proportional to the number of times its recursion occurred. Without considering the recursion, the general time complexity is O (n). So, it is clear to say that, considering recursion, if the number of recursion is k, then the time complexity can be defined as,
T (n) = k.O (n).
As the value of k can be decreased, the more performance can be achieved.
So, the best case time complexity can be achieved (as shown in the Figure 1), if k=1.
That is,
T (n) =1.O (n).
We have to consider the worse case, as well as.
The data set, containing the characteristics, as shown in Figure 2, will produce a worse case time complexity. For example, if the data elements are distributed as the manner below,
data [0: n-1] = {1, 100, 2, 99, 3, 98, 4, 97 …}, then for each time the algorithm executed (considering recursion), the data array will be compressed only 2 elements. Then, the complexity will be,
T (n) = 2 + 4 + …………+ (n-4) + (n-2) + n
=O (n²).

Figure 1: A situation that depicts the best case complexity O (n) of Squeeze Sort

Figure 2: A situation that depicts the worse case complexity O (n²) of Squeeze Sort
Comparing with two well known algorithms, we can get a rough estimate about the time complexity of Squeeze Sort. In the table below, I compared the time complexity of Quick Sort (average case time complexity is O (nlogn)) [1-4] and Bubble Sort (average case time complexity is O (n²)) [2] with Squeeze Sort.
Table: Comparative Results (Time Complexity)
|
Data
Size
(000) |
Quick
Sort |
Squeeze
Sort |
Bubble
Sort |
|
30 |
.054945 |
.21978 |
3.901099 |
|
35 |
.054945 |
.32967 |
5.10989 |
|
70 |
.10989 |
.604396 |
9.450549 |
|
77 |
.10989 |
.659341 |
10.384615 |
|
105 |
.164835 |
.879121 |
13.846154 |
|
112 |
.164835 |
.989011 |
14.835165 |
|
154 |
.21978 |
1.208791 |
20.27475 |
|
200 |
.274725 |
1.273626 |
28.813188 |
4.2 Space Complexity:
Since Squeeze Sort requires an extra array with the same size of the data array, 2n memory space is needed, so the space complexity is
S (n) =O (n).
5. CONCLUSION:
Although sorting problem is old enough, still it contains attention to the researchers. According to the various characteristics of the distribution of the real world data element and different situations those faced by the programmers, more effective and creative idea, considering sorting usually provide broader way.
I hope, researchers in this arena, will find my algorithm ‘Squeeze Sort’ useful, to grow up their better and creative ideas.
REFERENCE:
[1.] Horowitz E. and Sahni S., “Fundamentals of Computer Algorithms”, Galgotia Pubs Ltd., New Delhi, 1995.
[2.] Edward M. Reingold, “Data Structures”: Page 384, Little Brown and Company, Boston, 1999.
[3.] Robert L. Kruse, “Data Structures and Program Design in C”, Prentice-Hall, Inc., 1991.
[4.] Thomas H. Cormen, “Introduction to Algorithms”, MIT Press, 1990.
Print | posted on Thursday, July 06, 2006 12:25 PM