In part 1, I wrote an ad-hoc algorithm which applied two commonly-used techniques – variable length coding and run length coding. The ad-hoc mode seemed to start going nowhere, so I eventually settled with something more general.

Two steps:

- Encode the values 0 ~ 2047 using a run length coding method into some symbols, and then:
- Do a simple variable length coding (something like Huffman Coding) to encode those symbols

Since we know that the values are mostly zeros, it is pretty straightforward that we have to do a run length encoding of zeros. Other than that, just a smaller number of tricks will be better. I settled with the following symbols:

- V0 ~ V63 : A single value of 0 ~ 63
- Z0 ~ Z15 : 2 ~ 17 consecutive zeros
- X0 ~ X31 : For any symbol Xn, f the previous value was a V-type symbol, add 64(n+1) to that value. If the previous value was a Z-type symbol, append 16(n+1) zeros.

For example,

129 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2

gets encoded into:

V1 X2 Z0 X0 V1 V2

The next step is to encode these values in individual bits. To see how many bits should be used to encode each value, I measured the frequency of each bits, and hand-assigned a variable number of bits for each symbol, ending up being

Symbol | Bit encoding |

V0 | 0100 |

V1 | 000 |

V2 ~ V3 | X1100 |

V4 ~ V7 | XX0010 |

V8 ~ V15 | XXX1010 |

V16 ~ V31 | XXXX0110 |

V32 ~ V63 | XXXXX1110 |

Z0 | 0001 |

Z1 | 1001 |

Z2 ~ Z3 | X0101 |

Z4 ~ Z7 | XX1101 |

Z8 ~ Z15 | XXX0011 |

X0 | 1011 |

X1 | 00111 |

X2 ~ X3 | X10111 |

X4 ~ X7 | XX01111 |

X8 ~ X15 | XXX011111 |

X16~X31 | XXXX111111 |

(The X on the bit encoding is the LSBs of the symbol number – For example, V18 is 00100110. One notable thing is that V1 is more frequent than V0 which is because we encoded the majority of zeros using the Zn type symbols.)

Since we are dealing with little endian (the least significant bits comes first), if we represent the example above as a 64-bit integer, we end up getting:

01100 000 1011 0001 010111 000 = 0x0c162b8

So, how good are we? I got 261.7 bits in average (or 32.7 bytes) which is much better than the ad-hoc method!

Next step is to apply this in leela-zero, and then verify it doesn’t lose strength.