{"id":119,"date":"2019-01-18T14:23:51","date_gmt":"2019-01-18T14:23:51","guid":{"rendered":"http:\/\/cbaduk.net\/w\/?p=119"},"modified":"2019-01-18T14:23:52","modified_gmt":"2019-01-18T14:23:52","slug":"compressing-the-nn-evaluation-results-part-2","status":"publish","type":"post","link":"http:\/\/cbaduk.net\/w\/?p=119","title":{"rendered":"Compressing the NN evaluation results : part 2"},"content":{"rendered":"\n<p>In part 1, I wrote an ad-hoc algorithm that combined two commonly used techniques: variable length coding and run length coding.  The ad-hoc method started going nowhere, so I eventually settled on something more general.<\/p>\n\n\n\n<p>The new method has two steps:<\/p>\n\n\n\n<ul><li>Convert the values (each in the range 0 ~ 2047) into a sequence of symbols using run length coding, and then<\/li><li>Encode those symbols with a simple variable length code (similar to Huffman coding).<\/li><\/ul>\n\n\n\n<p>Since we know that most of the values are zeros, the obvious move is to run-length encode the zeros.  Beyond that, the fewer tricks, the better.  I settled on the following symbols:<\/p>\n\n\n\n<ul><li>V0 ~ V63 : A single value of 0 ~ 63<\/li><li>Z0 ~ Z15 : 2 ~ 17 consecutive zeros (Zn stands for n+2 zeros)<\/li><li>X0 ~ X31 : For a symbol Xn, if the previous symbol was a V-type symbol, add 64(n+1) to its value.  If the previous symbol was a Z-type symbol, append 16(n+1) more zeros.<\/li><\/ul>\n\n\n\n<p>For example, the sequence<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">193 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2<\/pre>\n\n\n\n<p>is encoded as:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">V1 X2 Z0 X0 V1 V2<\/pre>\n\n\n\n<p>Here 193 = 1 + 64 &times; 3 becomes V1 followed by X2, the run of zeros becomes Z0 (2 zeros) followed by X0 (16 more zeros), and the trailing 1 and 2 are simply V1 and V2.<\/p>\n\n\n\n<p>The next step is to encode these symbols as bits.  
To decide how many bits each symbol should take, I measured how often each symbol occurs and hand-assigned a variable-length bit pattern to each one, ending up with the following table:<\/p>\n\n\n\n<table class=\"wp-block-table is-style-stripes\"><tbody><tr><td>Symbol<\/td><td>Bit encoding<\/td><\/tr><tr><td>V0<\/td><td>0100<\/td><\/tr><tr><td>V1<\/td><td>000<\/td><\/tr><tr><td>V2 ~ V3<\/td><td>X1100<\/td><\/tr><tr><td>V4 ~ V7<\/td><td>XX0010<\/td><\/tr><tr><td>V8 ~ V15<\/td><td>XXX1010<\/td><\/tr><tr><td>V16 ~ V31<\/td><td>XXXX0110<\/td><\/tr><tr><td>V32 ~ V63<\/td><td>XXXXX1110<\/td><\/tr><tr><td>Z0<\/td><td>0001<\/td><\/tr><tr><td>Z1<\/td><td>1001<\/td><\/tr><tr><td>Z2 ~ Z3<\/td><td>X0101<\/td><\/tr><tr><td>Z4 ~ Z7<\/td><td>XX1101<\/td><\/tr><tr><td>Z8 ~ Z15<\/td><td>XXX0011<\/td><\/tr><tr><td>X0<\/td><td>1011<\/td><\/tr><tr><td>X1<\/td><td>00111<\/td><\/tr><tr><td>X2 ~ X3<\/td><td>X10111<\/td><\/tr><tr><td>X4 ~ X7<\/td><td>XX01111<\/td><\/tr><tr><td>X8 ~ X15<\/td><td>XXX011111<\/td><\/tr><tr><td>X16 ~ X31<\/td><td>XXXX111111<\/td><\/tr><\/tbody><\/table>\n\n\n\n<p>(The X placeholders in the bit patterns, not to be confused with the Xn symbols, are the least significant bits of the symbol number.  For example, V18 is 00100110: the fixed V16 ~ V31 pattern 0110, preceded by 0010, the low four bits of 18.  Notably, V1 is more frequent than V0 and therefore gets the shortest code, because most of the zeros are already covered by the Zn symbols.)<\/p>\n\n\n\n<p>The bitstream is little endian (the least significant bits come first), so the first symbol occupies the lowest bits.  Representing the example above as a 64-bit integer, written with the most significant bit on the left, gives the following.  Reading from right to left, the groups are V1, X2, Z0, X0, V1 and V2:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">01100 000 1011 0001 010111 000 = 0x0c162b8<\/pre>\n\n\n\n<p>So, how good is this?  
I got 261.7 bits on average (about 32.7 bytes), which is much better than the ad-hoc method!<\/p>\n\n\n\n<p>The next step is to apply this to leela-zero and then verify that it doesn&#8217;t lose strength.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In part 1, I wrote an ad-hoc algorithm which applied two commonly-used techniques &#8211; variable length coding and run length coding. The ad-hoc method seemed to start going nowhere, so I eventually settled on something more general. Two steps: Encode the values 0 ~ 2047 using a run length coding method into some symbols, and &hellip; <a href=\"http:\/\/cbaduk.net\/w\/?p=119\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Compressing the NN evaluation results : part 2&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"http:\/\/cbaduk.net\/w\/index.php?rest_route=\/wp\/v2\/posts\/119"}],"collection":[{"href":"http:\/\/cbaduk.net\/w\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/cbaduk.net\/w\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/cbaduk.net\/w\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/cbaduk.net\/w\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=119"}],"version-history":[{"count":2,"href":"http:\/\/cbaduk.net\/w\/index.php?rest_route=\/wp\/v2\/posts\/119\/revisions"}],"predecessor-version":[{"id":121,"href":"http:\/\/cbaduk.net\/w\/index.php?rest_route=\/wp\/v2\/posts\/119\/revisions\/121"}],"wp:attachment":[{"href":"http:\/\/cbaduk.net\/w\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=119"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/cbaduk.net\/w\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=119"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/cbaduk.net\/w\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=119"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}