Two (or maybe three, depending on how you count) new features were added:
- 9×9 boards
- 13×13 boards
- Save/restore
The 9×9 and 13×13 nets were trained from scratch by me, using a small cluster of three NVIDIA GTX 1080 GPUs. The 9×9 net took two weeks to stabilize, while the 13×13 net has been running for the last two months and is still improving steadily (so there is a lot of room for improvement), but it's at least good enough to beat me comfortably. 🙂
Save/restore simply encodes the current board state into a single URL string, so that nothing needs to be stored on the servers. A 200-move game currently encodes to roughly 400 bytes, which could still use some optimization. The current implementation gzip-compresses the board state (encoded as JSON) and then adds a bit of encryption for additional obscurity.
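As a minimal sketch of the idea (assuming a JSON-encoded board state; the passphrase, cipher, and URL below are placeholders, not what the site actually uses):
state='{"size":9,"moves":["dd","gg","de"]}'
# gzip the JSON, encrypt it, then base64-encode with URL-safe
# characters so the whole thing can live inside a URL
token=$(printf '%s' "$state" \
    | gzip -c \
    | openssl enc -aes-128-cbc -pbkdf2 -pass pass:secret \
    | base64 | tr '+/' '-_' | tr -d '=\n')
echo "https://example.com/restore?g=$token"   # hypothetical restore URL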
Training the 9×9 / 13×13 nets
Leela-zero uses a distributed framework (http://github.com/gcp/leela-zero-server/) for training the nets across a wide geographical region, but this isn't really needed if you want to do it yourself on a small number of machines. For me, all it took was an NFS drive shared across a couple of machines, a small amount of C++ coding, and a couple of shell scripts.
The first modification was a custom GTP command, 'autotrain'. Typing the command plays 25 self-play games and records the results; this replaces autogtp. The added benefit is that more time is spent playing games rather than starting up the engine, which helps a lot because games are much shorter on smaller boards. The code is here:
https://github.com/ihavnoid/leelaz-ninenine
Once the C++ changes are in place, it's merely a matter of picking up the latest net from the training pipeline and running it again and again:
#!/bin/bash
# Self-play worker: loop forever, always playing with the most
# recently trained net found on the shared NFS drive.
suffix=$1    # unique tag per worker so output files don't collide
gpunum=$2    # which GPU this worker uses
for ((i = 0; i < 50000; i = i + 1)) ; do
    resign_rate=$((i % 10))    # cycle the resign threshold 0-9%
    timestamp=$(date +%y%m%d_%H%M%S)_${suffix}
    # most recently modified net produced by the training pipeline
    latest_weight=$(ls -1c training/tf/*.txt | head -1)
    leelaz_cmd="src/leelaz -q -m 8 -n -d -r $resign_rate -t 5 --batchsize 5 -v 800 --noponder --gtp --gpu $gpunum"
    sleep 5
    echo "leelaz_cmd : $leelaz_cmd"
    echo "latest_weight : $latest_weight"
    echo "timestamp : $timestamp"
    # play 25 self-play games, then quit (printf rather than echo,
    # so the newline between the two GTP commands is real)
    printf 'autotrain traindata_%s 25\nquit\n' "$timestamp" | $leelaz_cmd -w "$latest_weight"
done
The training code picks up the latest 1,500 bundles (each with 25 games, so 37,500 games total), trains for 20k steps with a batch size of 256, saves the net, re-scans for the latest 1,500 bundles, and so forth. I just left it running for the last month or two, playing 10 games every day against the previous day's net. Other than that, everything is stock leela-zero.
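As a rough sketch, the training-side loop looks something like this (not the actual code; train_20k_steps.sh is a hypothetical stand-in for whatever invokes the stock leela-zero TensorFlow training):
#!/bin/bash
# Re-train forever: each pass trains on the newest 1500 bundles,
# saves a net, then re-scans for fresher self-play data.
while true ; do
    # newest 1500 traindata bundles (25 games each, ~37500 games)
    ls -1t traindata_* | head -1500 > latest_bundles.txt
    # hypothetical wrapper: 20k steps at batch size 256, writing
    # the resulting net into training/tf/ for the workers to pick up
    ./train_20k_steps.sh latest_bundles.txt
done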
The 13×13 progress seems to be good:
Day   W-L (vs. previous day's net)
----------------------------------
 30   8-2
 31   5-5
 32   5-5
 33   6-4
 34   5-5
 35   4-6
 36   5-5
 37   7-3
 38   7-3
 39   8-2
 40   7-3
 41   7-3
 42   8-2
 43   7-3
 44   9-1
 45   5-5
I am going to leave it running for another month and see how it goes, unless something goes wrong. 🙂