Compressing the NN evaluation results: part 3

(Here is part 1 and part 2)

Part 2 described a proper entropy coding algorithm. With that algorithm in place, it's now time to design a proper file format for storing the results.

File format

First, it is good to start any file with a magic number, so that loading the wrong type of file can be detected easily. For NN evaluation in particular, silently using the wrong file type can be especially troublesome.

For the neural net cache file, I decided to make the first four bytes the magic number ‘\xfe’, ‘L’, ‘N’, ‘C’. The first byte is a non-printable character so that no random text file will accidentally match.
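As a quick illustration, a magic-number check might look like the sketch below (the constant and function names here are my own, not taken from the actual patch):

    #include <array>
    #include <fstream>

    // Hypothetical magic-number check for the cache file header.
    constexpr std::array<char, 4> kMagic = {'\xfe', 'L', 'N', 'C'};

    bool has_cache_magic(std::ifstream& ifs) {
        std::array<char, 4> buf{};
        ifs.read(buf.data(), buf.size());
        return ifs.good() && buf == kMagic;
    }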

After the magic number, the file contains any number of NNCache entries, each laid out in the following order:

Field name                     Size (bytes)
Hash value                     8
Value                          4
Policy (pass)                  4
Length of compressed policy    1
Compressed policy              (from above)
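To make the layout concrete, writing one entry could look roughly like the sketch below. The field order follows the table above; the struct and function names are mine, the sketch assumes host byte order, and the actual serialization code in the patch may differ:

    #include <cstdint>
    #include <ostream>
    #include <vector>

    // Hypothetical in-memory form of one cache entry.
    struct NNCacheEntry {
        std::uint64_t hash;                           // 8 bytes on disk
        float value;                                  // 4 bytes
        float policy_pass;                            // 4 bytes
        std::vector<std::uint8_t> compressed_policy;  // 1-byte length + payload
    };

    void write_entry(std::ostream& os, const NNCacheEntry& e) {
        os.write(reinterpret_cast<const char*>(&e.hash), sizeof(e.hash));
        os.write(reinterpret_cast<const char*>(&e.value), sizeof(e.value));
        os.write(reinterpret_cast<const char*>(&e.policy_pass), sizeof(e.policy_pass));
        const auto len = static_cast<std::uint8_t>(e.compressed_policy.size());
        os.write(reinterpret_cast<const char*>(&len), sizeof(len));
        os.write(reinterpret_cast<const char*>(e.compressed_policy.data()), len);
    }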

In between every 1000 entries, a ‘guide’ entry is inserted, encoded as sixteen 0xff bytes. This serves as a ‘recovery point’ in case part of the cache file is corrupted. Sixteen consecutive 0xff bytes were chosen because they end up being illegal in the compressed policy, which will eventually be decoded into a stream of X31 symbols. If the decompressing code finds that something is wrong with the decoded results, it searches for the next ‘guide’ entry and resumes decompressing from there.
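A recovery scan along those lines could look like this rough sketch (my own naming; it assumes the stream is positioned somewhere inside the corrupted region):

    #include <istream>

    // Skip forward until the next 'guide' entry: sixteen consecutive 0xff bytes.
    // Returns true if a guide entry was found before EOF; the stream is then
    // positioned just past the marker, ready to resume decoding.
    bool seek_next_guide(std::istream& is) {
        int run = 0;
        char c;
        while (is.get(c)) {
            run = (static_cast<unsigned char>(c) == 0xff) ? run + 1 : 0;
            if (run == 16) {
                return true;
            }
        }
        return false;
    }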

Because of this, the following two cases cannot be saved to disk. In the unlikely event that one of them happens, the entry is simply not stored in the cache file (see the sketch after the list):

  • Hash value ending up being 0xffff_ffff_ffff_ffff
  • Compressed policy ending up being larger than 255 bytes
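In code, the check before appending an entry might boil down to something like this (a sketch, not the patch's actual function):

    #include <cstddef>
    #include <cstdint>

    // Returns true if the entry can be represented in the file format.
    bool is_storable(std::uint64_t hash, std::size_t compressed_policy_size) {
        if (hash == 0xffffffffffffffffULL) {
            return false;  // reserved: would clash with the 0xff guide marker bytes
        }
        if (compressed_policy_size > 255) {
            return false;  // the length field is only one byte
        }
        return true;
    }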

Cache Operation

The cache can be loaded with one of two commands:

  • lz-read_cache : Open cache file read-only, and use entries when available
  • lz-dump_cache : Open cache file read-write. Use entries when available, and if it’s a cache miss, append the evaluated result.

Whenever a cache file is loaded (in either mode), the whole file is parsed once to build a hash table for random access. The table is simply a std::unordered_map whose keys are the hash values and whose values are the file offsets of the entries.
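Building that index might look roughly like the following sketch (my own code, not the patch's; it assumes the entry layout and the all-0xff guide marker described above):

    #include <cstdint>
    #include <fstream>
    #include <unordered_map>

    // Hypothetical index builder: map each entry's hash to its file offset.
    std::unordered_map<std::uint64_t, std::streamoff>
    build_index(std::ifstream& ifs) {
        std::unordered_map<std::uint64_t, std::streamoff> index;
        ifs.seekg(4);  // skip the 4-byte magic number
        std::uint64_t hash = 0;
        while (ifs.read(reinterpret_cast<char*>(&hash), sizeof(hash))) {
            const auto offset = static_cast<std::streamoff>(ifs.tellg()) - 8;
            if (hash == 0xffffffffffffffffULL) {
                // Guide entry: skip the remaining 8 of its 16 0xff bytes.
                ifs.seekg(8, std::ios::cur);
                continue;
            }
            std::uint8_t policy_len = 0;
            ifs.seekg(8, std::ios::cur);  // skip value (4 bytes) + pass policy (4 bytes)
            ifs.read(reinterpret_cast<char*>(&policy_len), 1);
            ifs.seekg(policy_len, std::ios::cur);  // skip the compressed policy
            index.emplace(hash, offset);
        }
        return index;
    }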

Once the table is built, accessing a cache entry is simply a matter of searching the table and then reading the cache file at the stored offset:

    auto iter2 = m_outfile_map.find(hash);
    if (iter2 == m_outfile_map.end()) {
        return false;  // Not found.
    }

    // Found: open the cache file and seek to the entry's offset.
    std::ifstream ifs;
    ifs.open(m_filename, std::ios::binary);
    ifs.seekg(iter2->second);

    try {
        // read() throws an exception if there is a parse error
        CompressedEntry e;
        e.read(ifs, hash);  // create compressed entry from stream
        e.get(result);      // decompress
    } catch (...) {
        return false;
    }

Memory management

One problem is that the table can get quite big if we want to do a lot of precomputation, even if we are only trying to precompute the first 20 common moves. 1GB of precomputed results ends up being around 25 million entries, and putting 25 million entries into a std::unordered_map quickly eats up gigabytes of memory: each node carries the 8-byte key and 8-byte offset plus a next pointer, allocation overhead, and a share of the bucket array, which typically adds up to 50+ bytes per entry, i.e. well over a gigabyte for the index alone. This is definitely going to hurt on low-end machines.

For now, the only way I can make things work is to limit the table size so that we don't exceed the cache memory budget (which is 10% by default). If the table starts getting too large, entries will be randomly dropped, and those entries will no longer be searchable in that session.
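One simple way to enforce such a cap (a sketch of the general idea, not necessarily what the patch actually does) is to evict a random victim whenever a new entry would push the index over budget:

    #include <cstddef>
    #include <cstdint>
    #include <ios>
    #include <random>
    #include <unordered_map>

    // Rough per-entry cost estimate for the index (key + offset + node overhead).
    constexpr std::size_t kApproxBytesPerEntry = 48;

    void insert_with_budget(std::unordered_map<std::uint64_t, std::streamoff>& index,
                            std::uint64_t hash, std::streamoff offset,
                            std::size_t budget_bytes, std::mt19937_64& rng) {
        if (!index.empty() && (index.size() + 1) * kApproxBytesPerEntry > budget_bytes) {
            // Over budget: drop an arbitrary entry by probing a random bucket.
            auto bucket = rng() % index.bucket_count();
            while (index.bucket_size(bucket) == 0) {
                bucket = (bucket + 1) % index.bucket_count();
            }
            index.erase(index.begin(bucket)->first);
        }
        index.emplace(hash, offset);
    }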

This definitely is going to be an ongoing concern. A couple of possible ideas:

  • Come up with a more efficient data structure, fine-tuned to work with already-randomized 64-bit hashes (a rough sketch follows below)
  • Disk-backed index tables – that is, keep the table itself in disk and access as needed

None of these is simple, nor is there a good solution for any of them. I think all I should do at this point is leave the scalability problem as future work.
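For the first idea, one rough direction (my own sketch, not part of the patch) is a flat open-addressing table that stores plain (hash, offset) pairs and uses the low bits of the already-randomized hash directly as the slot index, bringing the cost down to about 16 bytes per entry. All-0xff conveniently works as the 'empty slot' marker, since such hashes are never stored in the file anyway:

    #include <cstddef>
    #include <cstdint>
    #include <ios>
    #include <vector>

    // Flat open-addressing index: 16 bytes per slot, no per-node allocations.
    // Assumes the hashes are already well distributed, the capacity is a power
    // of two, and the table is never filled completely (resizing is elided).
    class FlatIndex {
        struct Slot { std::uint64_t hash; std::streamoff offset; };
        static constexpr std::uint64_t kEmpty = 0xffffffffffffffffULL;
        std::vector<Slot> m_slots;

    public:
        explicit FlatIndex(std::size_t capacity_pow2) {
            m_slots.assign(capacity_pow2, Slot{kEmpty, 0});
        }

        void insert(std::uint64_t hash, std::streamoff offset) {
            auto i = hash & (m_slots.size() - 1);
            while (m_slots[i].hash != kEmpty) {
                i = (i + 1) & (m_slots.size() - 1);  // linear probing
            }
            m_slots[i] = Slot{hash, offset};
        }

        bool find(std::uint64_t hash, std::streamoff& offset) const {
            auto i = hash & (m_slots.size() - 1);
            while (m_slots[i].hash != kEmpty) {
                if (m_slots[i].hash == hash) {
                    offset = m_slots[i].offset;
                    return true;
                }
                i = (i + 1) & (m_slots.size() - 1);
            }
            return false;
        }
    };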

Want to try it out?

Code is here: https://github.com/ihavnoid/leela-zero/pull/new/disk-nncache

Here is a compressed cache file created from the latest 40b net (ffe8ba44 – you can get the net itself from here):

https://drive.google.com/open?id=181DracV9FpQ9kSvXc5v_ziL-6vZ8U-ME (914MB)

To try:

  • Run leela-zero as usual:
    ./leelaz -w ffe8ba44*.gz
  • Once you get the GTP prompt, it’s probably a good idea to increase the amount of memory assigned to the cache before loading the cache file:
    lz-setoption name Percentage of memory for cache value 40
  • Then, load the cache:
    lz-read_cache ffe8ba44.lzc

You will see that the first couple of moves are blazing fast. Enjoy!
