A Source-Code Analysis of the Sorting Algorithms Used by Arrays.sort

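The main sorting entry point in Java is java.util.Arrays.sort(). Roughly speaking, it uses dual-pivot quicksort for primitive types and merge sort for reference types. Below we dig into the Arrays.sort source code to see in more detail which sorting algorithms it actually uses.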

Arrays.sort under the hood

The method first checks whether the array is small (the threshold, QUICKSORT_THRESHOLD, is 286 elements); if it is, quicksort is used.

static void sort(int[] a, int left, int right,
                 int[] work, int workBase, int workLen) {
    // Use Quicksort on small arrays
    if (right - left < QUICKSORT_THRESHOLD) {
        sort(a, left, right, true);
        return;
    }

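It does not jump straight into quicksort, though. Stepping into sort(a, left, right, true) shows that tiny arrays with fewer than 47 elements are sorted with insertion sort instead.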

private static void sort(int[] a, int left, int right, boolean leftmost) {
    int length = right - left + 1;
    // Use insertion sort on tiny arrays
    if (length < INSERTION_SORT_THRESHOLD) {
        if (leftmost) {
            for (int i = left, j = i; i < right; j = ++i) {
                int ai = a[i + 1];
                while (ai < a[j]) {
                    a[j + 1] = a[j];
                    if (j-- == left) {
                        break;
                    }
                }
                a[j + 1] = ai;
            }

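If the array has at least 47 but fewer than 286 elements, dual-pivot quicksort is used.

The basic idea of dual-pivot quicksort is as follows:

Pick five evenly spaced elements from the array and sort those five elements with insertion sort.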

/*
 * Sort five evenly spaced elements around (and including) the
 * center element in the range. These elements will be used for
 * pivot selection as described below. The choice for spacing
 * these elements was empirically determined to work well on
 * a wide variety of inputs.
 */
int e3 = (left + right) >>> 1; // The midpoint
int e2 = e3 - seventh;
int e1 = e2 - seventh;
int e4 = e3 + seventh;
int e5 = e4 + seventh;
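
The excerpt above uses a variable named seventh without showing where it comes from. The following is a minimal sketch of how the five candidate indices could be computed for a given range. It assumes that seventh is an inexpensive approximation of length / 7 built from bit shifts, as recalled from the JDK 7/8 source; the class name PivotCandidates and the method candidateIndices are hypothetical and exist only for illustration.

```java
import java.util.Arrays;

final class PivotCandidates {
    // Returns the indices {e1, e2, e3, e4, e5} of the five candidate
    // elements for the inclusive range a[left..right].
    static int[] candidateIndices(int left, int right) {
        int length = right - left + 1;
        // Assumption: approximation of length / 7 using shifts only.
        int seventh = (length >> 3) + (length >> 6) + 1;

        int e3 = (left + right) >>> 1; // the midpoint
        int e2 = e3 - seventh;
        int e1 = e2 - seventh;
        int e4 = e3 + seventh;
        int e5 = e4 + seventh;
        return new int[] {e1, e2, e3, e4, e5};
    }

    public static void main(String[] args) {
        // For a 100-element range: seventh = 12 + 1 + 1 = 14, midpoint = 49.
        System.out.println(Arrays.toString(candidateIndices(0, 99)));
        // prints [21, 35, 49, 63, 77]
    }
}
```

Under this assumption the five indices stay well inside the range for the array sizes that reach this branch (47 elements or more), so no bounds checks are needed.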

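The second and fourth of the five sorted elements are used as the pivots, so pivot1 <= pivot2.
Two pointers, less and great, are then defined. less scans rightward from the left end of the array until it reaches the first element that is not less than pivot1; great scans leftward from the right end until it reaches the first element that is not greater than pivot2.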

// Pointers
int less  = left;  // The index of the first element of center part
int great = right; // The index before the first element of right part

//......

while (a[++less] < pivot1);
while (a[--great] > pivot2);

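The elements between less and great are then moved: those less than pivot1 go to the left of pivot1, and those greater than pivot2 go to the right of pivot2. What remains between the two pivots are the elements x with pivot1 <= x <= pivot2.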

/*
 * Partitioning:
 *
 *   left part           center part                   right part
 * +--------------------------------------------------------------+
 * |  < pivot1  |  pivot1 <= && <= pivot2  |    ?    |  > pivot2  |
 * +--------------------------------------------------------------+
 *               ^                          ^       ^
 *               |                          |       |
 *              less                        k     great
 *
 * Invariants:
 *
 *              all in (left, less)   < pivot1
 *    pivot1 <= all in [less, k)     <= pivot2
 *              all in (great, right) > pivot2
 *
 * Pointer k is the first index of ?-part.
 */
outer:
for (int k = less - 1; ++k <= great; ) {
    int ak = a[k];
    if (ak < pivot1) { // Move a[k] to left part
        a[k] = a[less];
        a[less] = ak;
        ++less;
    } else if (ak > pivot2) { // Move a[k] to right part
        while (a[great] > pivot2) {
            if (great-- == k) {
                break outer;
            }
        }
        if (a[great] < pivot1) { // a[great] <= pivot2
            a[k] = a[less];
            a[less] = a[great];
            ++less;
        } else { // pivot1 <= a[great] <= pivot2
            a[k] = a[great];
        }
        a[great] = ak;
        --great;
    }
}
// Swap pivots into their final positions
a[left]  = a[less  - 1]; a[less  - 1] = pivot1;
a[right] = a[great + 1]; a[great + 1] = pivot2;
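
Because the JDK excerpt is hard to run in isolation, here is a simplified, self-contained sketch of the same three-way partitioning idea. It is not the JDK implementation: as a simplifying assumption it takes the two end elements as pivots instead of choosing them from five sampled elements, it has no insertion-sort cutoff, and it has no special handling for equal pivots. The class name DualPivotSketch is hypothetical.

```java
import java.util.Arrays;

final class DualPivotSketch {
    // Sorts the inclusive range a[left..right] with a basic dual-pivot quicksort.
    static void sort(int[] a, int left, int right) {
        if (left >= right) {
            return;
        }
        // Simplification: use the end elements as pivots, ordered so pivot1 <= pivot2.
        if (a[left] > a[right]) {
            swap(a, left, right);
        }
        int pivot1 = a[left];
        int pivot2 = a[right];

        int less = left + 1;   // a[left+1 .. less-1]   <  pivot1
        int great = right - 1; // a[great+1 .. right-1] >  pivot2
        int k = less;          // a[less .. k-1] lies in [pivot1, pivot2]
        while (k <= great) {
            if (a[k] < pivot1) {
                swap(a, k++, less++);
            } else if (a[k] > pivot2) {
                swap(a, k, great--); // the element swapped into k is examined next
            } else {
                k++;
            }
        }
        // Swap the pivots into their final positions.
        swap(a, left, --less);
        swap(a, right, ++great);

        // Recurse on the three parts, excluding the two pivots.
        sort(a, left, less - 1);
        sort(a, less + 1, great - 1);
        sort(a, great + 1, right);
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1, 5, 7, 2, 8, 0, 5};
        sort(data, 0, data.length - 1);
        System.out.println(Arrays.toString(data)); // prints [0, 1, 2, 3, 5, 5, 5, 7, 8, 9]
    }
}
```

Picking the end elements as pivots keeps the sketch short but makes it degrade on already sorted input, which is one reason the real code samples five spread-out elements instead.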

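The left, center, and right parts around the pivots are then sorted recursively with quicksort (the center part gets special handling). As before, any part shorter than 47 elements is sorted with insertion sort instead.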

// Sort left and right parts recursively, excluding known pivots
sort(a, left, less - 2, leftmost);
sort(a, great + 2, right, false);

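Now back to the first branch, which checks whether the array has fewer than 286 elements. If it does, quicksort is used; otherwise the array is first scanned for existing sorted runs and, if it is structured enough, sorted with merge sort.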

// Check if the array is nearly sorted
for (int k = left; k < right; run[count] = k) {
    if (a[k] < a[k + 1]) { // ascending
        while (++k <= right && a[k - 1] <= a[k]);
    } else if (a[k] > a[k + 1]) { // descending
        while (++k <= right && a[k - 1] >= a[k]);
        for (int lo = run[count] - 1, hi = k; ++lo < --hi; ) {
            int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
        }
    } else { // equal
        for (int m = MAX_RUN_LENGTH; ++k <= right && a[k - 1] == a[k]; ) {
            if (--m == 0) {
                sort(a, left, right, true);
                return;
            }
        }
    }

    /*
     * Too many runs: the array is considered unstructured, so use Quicksort instead of merge sort.
     */
    if (++count == MAX_RUN_COUNT) {
        sort(a, left, right, true);
        return;
    }
}

Finally, the runs are merged.
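
To make the run-detection step more concrete, here is a rough sketch that counts the monotone runs in an array and reverses descending runs in place, so that every run ends up ascending and ready to be merged. It is a simplified illustration rather than the JDK loop: it ignores the special case for long stretches of equal elements, and the class name RunCounter and the method countRuns are hypothetical.

```java
final class RunCounter {
    // Counts maximal monotone runs in a; strictly descending runs are
    // reversed in place so that each run is ascending afterwards.
    static int countRuns(int[] a) {
        int n = a.length;
        int count = 0;
        int k = 0;
        while (k < n) {
            int start = k++;
            if (k < n && a[k] < a[start]) {
                // Strictly descending run: extend it, then reverse it.
                while (k < n && a[k] < a[k - 1]) {
                    k++;
                }
                for (int lo = start, hi = k - 1; lo < hi; lo++, hi--) {
                    int t = a[lo];
                    a[lo] = a[hi];
                    a[hi] = t;
                }
            } else {
                // Ascending (non-decreasing) run: just extend it.
                while (k < n && a[k] >= a[k - 1]) {
                    k++;
                }
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 9, 7, 5, 6, 8};
        // Runs: [1, 2, 3, 9], [7, 5] (reversed to [5, 7]), [6, 8]
        System.out.println(countRuns(data)); // prints 3
    }
}
```

A caller could compare the returned count against a limit (MAX_RUN_COUNT in the excerpt above) and fall back to quicksort when there are too many runs, since merging many short runs gains little over sorting from scratch.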

Note that the Arrays.sort logic described above applies to primitive types. When sorting Object arrays, merge sort is used rather than quicksort.

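1. If the array has fewer than MIN_MERGE (32) elements, the binarySort method is called to perform a binary insertion sort. Binary insertion sort uses binary search to find the position of each element being inserted, which takes fewer comparisons than a linear scan.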

2. If the array has MIN_MERGE or more elements, it is split into multiple sorted runs, which are then merged.
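
The binary insertion sort mentioned in step 1 can be sketched as follows. This is an illustrative version and not the JDK's binarySort method; the class and method names are hypothetical, and it sorts a whole array of Comparable elements rather than a sub-range.

```java
import java.util.Arrays;

final class BinaryInsertionSketch {
    // Sorts a in place; the insertion point for each element is found
    // by binary search over the already sorted prefix a[0..i-1].
    static <T extends Comparable<? super T>> void binaryInsertionSort(T[] a) {
        for (int i = 1; i < a.length; i++) {
            T pivot = a[i];
            int lo = 0;
            int hi = i;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (pivot.compareTo(a[mid]) < 0) {
                    hi = mid;
                } else {
                    lo = mid + 1; // equal elements stay to the left: keeps the sort stable
                }
            }
            // Shift a[lo..i-1] one slot to the right and drop the pivot in.
            System.arraycopy(a, lo, a, lo + 1, i - lo);
            a[lo] = pivot;
        }
    }

    public static void main(String[] args) {
        Integer[] data = {4, 1, 3, 1, 2};
        binaryInsertionSort(data);
        System.out.println(Arrays.toString(data)); // prints [1, 1, 2, 3, 4]
    }
}
```

The binary search only reduces the number of comparisons; the number of element moves is the same as in a plain insertion sort, which is why the technique pays off mainly when comparisons are expensive.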

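Why merge sort rather than quicksort? The answer lies in stability:

Quicksort is not stable, whereas merge sort is.

For primitive types stability is meaningless, but for objects it matters: equality may be decided by a key attribute only, so it is best to keep equal objects in the same relative order as before the sort, as seen through their non-key attributes. A second reason is that merge sort generally performs fewer comparisons than quicksort but more moves (of object references), and for objects a comparison usually costs more than a move.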
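
A small demonstration of why stability matters for objects: the strings below are sorted by length only, so strings of equal length compare as equal, and a stable sort keeps them in their original relative order. The class name StabilityDemo and the helper sortByLength are hypothetical; Arrays.sort with a Comparator is the standard API, which is documented as stable.

```java
import java.util.Arrays;
import java.util.Comparator;

final class StabilityDemo {
    // Returns a copy of words sorted by length only; ties keep input order
    // because Arrays.sort on object arrays is stable.
    static String[] sortByLength(String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Comparator.comparingInt(String::length));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "fig", "kiwi", "plum", "yam"};
        System.out.println(Arrays.toString(sortByLength(words)));
        // prints [fig, yam, pear, kiwi, plum]
    }
}
```

In the output, "fig" still precedes "yam" and "pear" still precedes "kiwi" and "plum", exactly as in the input. An unstable sort would be free to reorder those ties.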

Summary

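Arrays with fewer than 47 elements use insertion sort; arrays with at least 47 but fewer than 286 elements use quicksort; arrays with 286 or more elements use merge sort, unless the array contains too many runs, in which case quicksort is used again (quicksort copes well with unstructured data, whereas merge sort benefits from existing runs).
For small arrays insertion sort is more efficient, so switching from quicksort to insertion sort whenever the recursion reaches a range shorter than 47 elements noticeably improves performance.
Dual-pivot quicksort uses two pivots and splits the array into three parts on each pass, which reduces the amount of recursion without noticeably increasing the number of comparisons.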
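
The size-based selection summarized above can be condensed into a small decision helper. This is only a sketch of the rules as stated in this article for primitive arrays: the class SortDispatchSketch and the method algorithmFor are hypothetical, and the thresholds are treated as bounds on the element count, whereas the JDK code shown earlier compares right - left against QUICKSORT_THRESHOLD.

```java
final class SortDispatchSketch {
    static final int INSERTION_SORT_THRESHOLD = 47;
    static final int QUICKSORT_THRESHOLD = 286;

    // Names the algorithm that the rules above select for a primitive
    // array (or sub-range) with the given number of elements.
    static String algorithmFor(int length) {
        if (length < INSERTION_SORT_THRESHOLD) {
            return "insertion sort";
        }
        if (length < QUICKSORT_THRESHOLD) {
            return "dual-pivot quicksort";
        }
        // Large arrays: merge sort, unless run detection finds too many runs.
        return "merge sort (quicksort if the array has too many runs)";
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 47, 285, 286, 100_000}) {
            System.out.println(n + " -> " + algorithmFor(n));
        }
    }
}
```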