使用 1 MB RAM 对 1 百万个 8 进制数字进行排序

我有一台具有 1 MB RAM 且没有其他本地存储的计算机。我必须使用它来通过 TCP 连接接受一百万个 8 位十进制数字，对它们进行排序，然后通过另一个 TCP 连接将排序后的列表发送出去。

数字列表可能包含重复项，我不能丢弃。该代码将放置在 ROM 中，因此我无需从 1 MB 中减去代码的大小。我已经有了驱动以太网端口和处理 TCP / IP 连接的代码，并且它需要 2 KB 的状态数据，包括一个 1 KB 的缓冲区，代码将通过该缓冲区读取和写入数据。有解决这个问题的方法吗？

问题和答案的来源：

slashdot.org

cleaton.net

答案

到目前为止，这里还没有提到一个相当狡猾的把戏。我们假设您没有其他存储数据的方法，但是严格来说并非如此。

解决问题的一种方法是执行以下可怕的操作，任何情况下任何人都不应尝试以下操作：使用网络流量存储数据。不，我不是说 NAS。

您可以通过以下方式仅用几个字节的 RAM 对数字进行排序：

首先获取 2 个变量： COUNTER和VALUE 。
首先将所有寄存器设置为0 ;
每次您收到整数I ，请递增COUNTER并将VALUE设置为max(VALUE, I) ；
I的 ICMP 回显请求数据包发送到路由器。抹掉I然后重复。
每次收到返回的 ICMP 数据包时，您只需提取整数并在另一个回显请求中再次将其发送出去。这将产生大量的 ICMP 请求，其中包含整数的 ICMP 请求向前和向后扩展。

一旦COUNTER达到1000000 ，您就将所有值存储在不断的 ICMP 请求流中，并且VALUE现在包含最大整数。选择一些threshold T >> 1000000 。将COUNTER设置为零。每次您收到 ICMP 数据包时，请递增COUNTER并在另一个回显请求中将包含的整数I I=VALUE ，在这种情况下，将其传输到已排序整数的目的地。一旦COUNTER=T ，将VALUE 1 ，将COUNTER重置为零并重复。一旦VALUE达到零，您应该已经按照从最大到最小的顺序将所有整数传输到了目的地，并且仅将大约 47 位 RAM 用于两个持久变量（以及临时值所需的任何数量）。

我知道这很可怕，而且我知道可能存在各种各样的实际问题，但我认为这可能会让你们中的一些人大笑或至少使您感到恐惧。

这是一些可以解决该问题的 C ++ 代码。

~~满足内存约束的证明：~~

编辑器：本文或他的博客中都没有提供作者提供的最大内存要求的证据。由于对值进行编码所需的位数取决于先前编码的值，因此这种证明可能并非无关紧要。作者注意到，根据经验，他可能偶然发现的最大编码大小为1011732 ，并任意选择了缓冲区大小1013000

typedef unsigned int u32;

namespace WorkArea
{
    static const u32 circularSize = 253250;
    u32 circular[circularSize] = { 0 };         // consumes 1013000 bytes

    static const u32 stageSize = 8000;
    u32 stage[stageSize];                       // consumes 32000 bytes

    ...

这两个阵列一起占用 1045000 字节的存储空间。剩下的 1048576-1045000-2×1024 = 1528 个字节可用于剩余变量和堆栈空间。

它在我的至强 W3520 上运行约 23 秒。 sort1mb.exe ，则可以使用以下 Python 脚本验证程序是否正常运行。

from subprocess import *
import random

sequence = [random.randint(0, 99999999) for i in xrange(1000000)]

sorter = Popen('sort1mb.exe', stdin=PIPE, stdout=PIPE)
for value in sequence:
    sorter.stdin.write('%08d\n' % value)
sorter.stdin.close()

result = [int(line) for line in sorter.stdout]
print('OK!' if result == sorted(sequence) else 'Error!')

有关该算法的详细说明，请参见以下系列文章：

请查看第一个正确答案或以后的算术编码答案。您可能会在下面找到一些有趣的东西，但并不是 100％的防弹解决方案。

这是一个非常有趣的任务，这是另一个解决方案。我希望有人会觉得结果有用（或至少很有趣）。

阶段 1：初始数据结构，粗略压缩方法，基本结果

让我们做一些简单的数学运算：我们最初有 1M（1048576 字节）的 RAM 用于存储 10 ^ 6 8 位十进制数字。 [0; 99999999]。因此，要存储一个数字，需要 27 位（假设将使用无符号数字）。因此，要存储原始流，需要约 3.5M 的 RAM。有人已经说过这似乎不可行，但是我要说的是，只要输入 “足够好” 就可以解决任务。基本上，该想法是使用 0.29 或更高的压缩因子压缩输入数据，并以适当的方式进行排序。

让我们首先解决压缩问题。已经有一些相关的测试：

http://www.theeggeadventure.com/wikimedia/index.php/Java_Data_Compression

“我进行了一项测试，使用各种压缩形式压缩一百万个连续整数。结果如下：”

None     4000027
Deflate  2006803
Filtered 1391833
BZip2    427067
Lzma     255040

看起来 LZMA（ Lempel-Ziv-Markov 链算法）是继续的好选择。我已经准备了一个简单的 PoC，但仍有一些细节需要强调：

内存有限，因此其想法是对数字进行预排序并使用压缩的存储桶（动态大小）作为临时存储
通过预排序的数据更容易获得更好的压缩系数，因此每个存储区都有一个静态缓冲区（缓冲区中的数字要在 LZMA 之前进行排序）
每个存储桶都具有特定范围，因此可以分别对每个存储桶进行最终分类
可以正确设置存储桶的大小，因此将有足够的内存来解压缩存储的数据并分别对每个存储桶进行最终排序

内存中排序

请注意，附加代码是POC ，不能用作最终解决方案，它只是演示了使用几个较小的缓冲区以某种最佳方式（可能是压缩的）存储预排序数字的想法。不建议将 LZMA 作为最终解决方案。它用作向此 PoC 引入压缩的最快方法。

请参阅下面的 PoC 代码（请注意，这只是一个演示，需要使用LZMA-Java进行编译）：

public class MemorySortDemo {

static final int NUM_COUNT = 1000000;
static final int NUM_MAX   = 100000000;

static final int BUCKETS      = 5;
static final int DICT_SIZE    = 16 * 1024; // LZMA dictionary size
static final int BUCKET_SIZE  = 1024;
static final int BUFFER_SIZE  = 10 * 1024;
static final int BUCKET_RANGE = NUM_MAX / BUCKETS;

static class Producer {
    private Random random = new Random();
    public int produce() { return random.nextInt(NUM_MAX); }
}

static class Bucket {
    public int size, pointer;
    public int[] buffer = new int[BUFFER_SIZE];

    public ByteArrayOutputStream tempOut = new ByteArrayOutputStream();
    public DataOutputStream tempDataOut = new DataOutputStream(tempOut);
    public ByteArrayOutputStream compressedOut = new ByteArrayOutputStream();

    public void submitBuffer() throws IOException {
        Arrays.sort(buffer, 0, pointer);

        for (int j = 0; j < pointer; j++) {
            tempDataOut.writeInt(buffer[j]);
            size++;
        }            
        pointer = 0;
    }

    public void write(int value) throws IOException {
        if (isBufferFull()) {
            submitBuffer();
        }
        buffer[pointer++] = value;
    }

    public boolean isBufferFull() {
        return pointer == BUFFER_SIZE;
    }

    public byte[] compressData() throws IOException {
        tempDataOut.close();
        return compress(tempOut.toByteArray());
    }        

    private byte[] compress(byte[] input) throws IOException {
        final BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(input));
        final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(compressedOut));

        final Encoder encoder = new Encoder();
        encoder.setEndMarkerMode(true);
        encoder.setNumFastBytes(0x20);
        encoder.setDictionarySize(DICT_SIZE);
        encoder.setMatchFinder(Encoder.EMatchFinderTypeBT4);

        ByteArrayOutputStream encoderPrperties = new ByteArrayOutputStream();
        encoder.writeCoderProperties(encoderPrperties);
        encoderPrperties.flush();
        encoderPrperties.close();

        encoder.code(in, out, -1, -1, null);
        out.flush();
        out.close();
        in.close();

        return encoderPrperties.toByteArray();
    }

    public int[] decompress(byte[] properties) throws IOException {
        InputStream in = new ByteArrayInputStream(compressedOut.toByteArray());
        ByteArrayOutputStream data = new ByteArrayOutputStream(10 * 1024);
        BufferedOutputStream out = new BufferedOutputStream(data);

        Decoder decoder = new Decoder();
        decoder.setDecoderProperties(properties);
        decoder.code(in, out, 4 * size);

        out.flush();
        out.close();
        in.close();

        DataInputStream input = new DataInputStream(new ByteArrayInputStream(data.toByteArray()));
        int[] array = new int[size];
        for (int k = 0; k < size; k++) {
            array[k] = input.readInt();
        }

        return array;
    }
}

static class Sorter {
    private Bucket[] bucket = new Bucket[BUCKETS];

    public void doSort(Producer p, Consumer c) throws IOException {

        for (int i = 0; i < bucket.length; i++) {  // allocate buckets
            bucket[i] = new Bucket();
        }

        for(int i=0; i< NUM_COUNT; i++) {         // produce some data
            int value = p.produce();
            int bucketId = value/BUCKET_RANGE;
            bucket[bucketId].write(value);
            c.register(value);
        }

        for (int i = 0; i < bucket.length; i++) { // submit non-empty buffers
            bucket[i].submitBuffer();
        }

        byte[] compressProperties = null;
        for (int i = 0; i < bucket.length; i++) { // compress the data
            compressProperties = bucket[i].compressData();
        }

        printStatistics();

        for (int i = 0; i < bucket.length; i++) { // decode & sort buckets one by one
            int[] array = bucket[i].decompress(compressProperties);
            Arrays.sort(array);

            for(int v : array) {
                c.consume(v);
            }
        }
        c.finalCheck();
    }

    public void printStatistics() {
        int size = 0;
        int sizeCompressed = 0;

        for (int i = 0; i < BUCKETS; i++) {
            int bucketSize = 4*bucket[i].size;
            size += bucketSize;
            sizeCompressed += bucket[i].compressedOut.size();

            System.out.println("  bucket[" + i
                    + "] contains: " + bucket[i].size
                    + " numbers, compressed size: " + bucket[i].compressedOut.size()
                    + String.format(" compression factor: %.2f", ((double)bucket[i].compressedOut.size())/bucketSize));
        }

        System.out.println(String.format("Data size: %.2fM",(double)size/(1014*1024))
                + String.format(" compressed %.2fM",(double)sizeCompressed/(1014*1024))
                + String.format(" compression factor %.2f",(double)sizeCompressed/size));
    }
}

static class Consumer {
    private Set<Integer> values = new HashSet<>();

    int v = -1;
    public void consume(int value) {
        if(v < 0) v = value;

        if(v > value) {
            throw new IllegalArgumentException("Current value is greater than previous: " + v + " > " + value);
        }else{
            v = value;
            values.remove(value);
        }
    }

    public void register(int value) {
        values.add(value);
    }

    public void finalCheck() {
        System.out.println(values.size() > 0 ? "NOT OK: " + values.size() : "OK!");
    }
}

public static void main(String[] args) throws IOException {
    Producer p = new Producer();
    Consumer c = new Consumer();
    Sorter sorter = new Sorter();

    sorter.doSort(p, c);
}
}

使用随机数，它会产生以下结果：

bucket[0] contains: 200357 numbers, compressed size: 353679 compression factor: 0.44
bucket[1] contains: 199465 numbers, compressed size: 352127 compression factor: 0.44
bucket[2] contains: 199682 numbers, compressed size: 352464 compression factor: 0.44
bucket[3] contains: 199949 numbers, compressed size: 352947 compression factor: 0.44
bucket[4] contains: 200547 numbers, compressed size: 353914 compression factor: 0.44
Data size: 3.85M compressed 1.70M compression factor 0.44

对于一个简单的升序（使用一个存储桶），它将产生：

bucket[0] contains: 1000000 numbers, compressed size: 256700 compression factor: 0.06
Data size: 3.85M compressed 0.25M compression factor 0.06

编辑

结论：

不要试图愚弄大自然
使用更简单的压缩方式，减少内存占用
确实需要一些其他线索。通用的防弹解决方案似乎并不可行。

第 2 阶段：增强压缩效果，得出最终结论

如上一节所述，可以使用任何合适的压缩技术。因此，让我们摆脱 LZMA，转而采用更简单，更好（如果可能）的方法。有很多好的解决方案，包括算术编码，基数树等。

无论如何，简单但有用的编码方案将比另一个外部库更具说明性，并提供了一些漂亮的算法。实际的解决方案非常简单：由于存在带有部分排序数据的存储桶，因此可以使用增量代替数字。

编码方案

随机输入测试显示出更好的结果：

bucket[0] contains: 10103 numbers, compressed size: 13683 compression factor: 0.34
bucket[1] contains: 9885 numbers, compressed size: 13479 compression factor: 0.34
...
bucket[98] contains: 10026 numbers, compressed size: 13612 compression factor: 0.34
bucket[99] contains: 10058 numbers, compressed size: 13701 compression factor: 0.34
Data size: 3.85M compressed 1.31M compression factor 0.34

样例代码

public static void encode(int[] buffer, int length, BinaryOut output) {
    short size = (short)(length & 0x7FFF);

    output.write(size);
    output.write(buffer[0]);

    for(int i=1; i< size; i++) {
        int next = buffer[i] - buffer[i-1];
        int bits = getBinarySize(next);

        int len = bits;

        if(bits > 24) {
          output.write(3, 2);
          len = bits - 24;
        }else if(bits > 16) {
          output.write(2, 2);
          len = bits-16;
        }else if(bits > 8) {
          output.write(1, 2);
          len = bits - 8;
        }else{
          output.write(0, 2);
        }

        if (len > 0) {
            if ((len % 2) > 0) {
                len = len / 2;
                output.write(len, 2);
                output.write(false);
            } else {
                len = len / 2 - 1;
                output.write(len, 2);
            }

            output.write(next, bits);
        }
    }
}

public static short decode(BinaryIn input, int[] buffer, int offset) {
    short length = input.readShort();
    int value = input.readInt();
    buffer[offset] = value;

    for (int i = 1; i < length; i++) {
        int flag = input.readInt(2);

        int bits;
        int next = 0;
        switch (flag) {
            case 0:
                bits = 2 * input.readInt(2) + 2;
                next = input.readInt(bits);
                break;
            case 1:
                bits = 8 + 2 * input.readInt(2) +2;
                next = input.readInt(bits);
                break;
            case 2:
                bits = 16 + 2 * input.readInt(2) +2;
                next = input.readInt(bits);
                break;
            case 3:
                bits = 24 + 2 * input.readInt(2) +2;
                next = input.readInt(bits);
                break;
        }

        buffer[offset + i] = buffer[offset + i - 1] + next;
    }

   return length;
}

请注意，这种方法：

不消耗大量内存
与流一起使用
提供了不错的结果

完整的代码可以在这里找到，BinaryInput 和 BinaryOutput 实现可以在这里找到

定论

暂无最终结论:) 有时候，将自己升级到一个级别并从元级别的角度审查任务确实是个好主意。

花一些时间完成这项任务很有趣。顺便说一句，下面有很多有趣的答案。感谢您的关注和满意。