{"id":144,"date":"2019-07-23T13:50:24","date_gmt":"2019-07-23T13:50:24","guid":{"rendered":"http:\/\/cbaduk.net\/w\/?p=144"},"modified":"2019-07-23T13:50:25","modified_gmt":"2019-07-23T13:50:25","slug":"handicap-games","status":"publish","type":"post","link":"https:\/\/cbaduk.net\/w\/?p=144","title":{"rendered":"Handicap Games"},"content":{"rendered":"\n<p>This was updated a while ago, but I didn&#8217;t really update the blog due to being just&#8230; lazy. \ud83d\ude42  Yes, cbaduk.net plays handicapped games, and it has been playing handicapped games for the last three months!  There were a lot of internal changes to what I was doing &#8211; sometimes it got worse, sometimes it was playing blatantly stupid moves, and now it seems that I have something.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Basic idea<\/h3>\n\n\n\n<p>The basic idea is to have a third head, as I wrote previously on <a href=\"https:\/\/github.com\/leela-zero\/leela-zero\/issues\/2331\">https:\/\/github.com\/leela-zero\/leela-zero\/issues\/2331<\/a><\/p>\n\n\n\n<ul><li>The problem is that if a player is in a losing situation, it should&nbsp;<em>add uncertainty<\/em>&nbsp;and&nbsp;<em>increase the probability of a larger territory<\/em>&nbsp;even if there is no chance of winning.<\/li><li>Thus, it seems that we need another output plane &#8211; in this case, the board occupancy seemed to be a useful feature. 
So, I added two outputs &#8211; each is used for predicting the end state of the board. To be specific, the third head looks like this:<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>        # endstate head\n        conv_st = self.conv_block(flow, filter_size=1,\n                                   input_channels=self.RESIDUAL_FILTERS,\n                                   output_channels=2,\n                                   name=\"endstate_head\")\n        h_conv_st_flat = tf.reshape(conv_st, [-1, 2 * 19 * 19])\n        W_fc4 = weight_variable(\"w_fc_4\", [2 * 19 * 19, (19 * 19) * 2])\n        b_fc4 = bias_variable(\"b_fc_4\", [(19 * 19) * 2])\n        self.add_weights(W_fc4)\n        self.add_weights(b_fc4)\n        h_fc4 = tf.add(tf.matmul(h_conv_st_flat, W_fc4), b_fc4)\n<\/code><\/pre>\n\n\n\n<ul><li>To have this, we need to play the game until the very end &#8211; so instead of resigning, the game enters an &#8216;acceleration mode&#8217; (10 playouts) once the losing side passes the resignation threshold.<\/li><li>The &#8216;endstate&#8217; plane is used as an auxiliary plane for the value head &#8211; to be specific, the winrate is taken 80% from the value output and 20% from the endstate net, using this formula:<\/li><\/ul>\n\n\n\n<pre class=\"wp-block-code\"><code>   endstate_winrate = tanh( avg_delta * confidence \/ 10.0 )\n   delta = sum (number_of_my_stone - number_of_opponent_stone + komi_bias)\n   confidence = average ( (v - 0.5) * ( v - 0.5) for v in endstate_plane )\n<\/code><\/pre>\n\n\n\n<p>That is, the winrate is calculated as the product of the expected score and the confidence, so a losing engine will prefer playing a chaotic game rather than giving the opponent clear territory.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Well&#8230;<\/h3>\n\n\n\n<p>The idea seemed quite good, except that it needs to gobble up a terribly large amount of compute.  
Leela Zero has already spent millions of dollars&#8217; worth of compute resources to get to where it is now, and all I have is three NVIDIA GTX1080s.  Spending compute resources on all that was just going nowhere.  Progress was slow, and even after months it was playing somewhat sensibly and then&#8230; playing blatantly stupid moves.  It seemed that I couldn&#8217;t really afford to do all that training.<\/p>\n\n\n\n<p>Instead, I tried a different approach&#8230; what if I just collected Leela Zero self-play data and recreated the endstate plane?  To elaborate,<\/p>\n\n\n\n<ul><li>Parse the self-play data and create a list of moves and policy data for each move,<\/li><li>Feed those moves and policy data to Leela Zero (by creating a custom command) and enter acceleration mode immediately after the end of the input<\/li><\/ul>\n\n\n\n<p>This strategy resulted in processing 100K games in roughly 40 hours.  Yes, it isn&#8217;t perfect since the original games were played with a komi of 7.5, but it probably is much better than any amount of compute I can afford.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">So, how does it play?<\/h3>\n\n\n\n<p>Give it a try!  I played a couple of six-stone handicap games (obviously I was playing black) and it seems to make aggressive attacks trying to capture black&#8217;s stones.  In most of the games I ended up with at least one dragon slaughtered and then lost by a dozen points. \ud83d\ude42<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This was updated a while ago, but I didn&#8217;t really update the blog due to being just&#8230; lazy. \ud83d\ude42 Yes, cbaduk.net plays handicapped games, and it has been playing handicapped games for the last three months! 
There were a lot of internal changes to what I was doing &#8211; sometimes it got worse, sometimes it was &hellip; <a href=\"https:\/\/cbaduk.net\/w\/?p=144\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Handicap Games&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/cbaduk.net\/w\/index.php?rest_route=\/wp\/v2\/posts\/144"}],"collection":[{"href":"https:\/\/cbaduk.net\/w\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cbaduk.net\/w\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cbaduk.net\/w\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cbaduk.net\/w\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=144"}],"version-history":[{"count":1,"href":"https:\/\/cbaduk.net\/w\/index.php?rest_route=\/wp\/v2\/posts\/144\/revisions"}],"predecessor-version":[{"id":145,"href":"https:\/\/cbaduk.net\/w\/index.php?rest_route=\/wp\/v2\/posts\/144\/revisions\/145"}],"wp:attachment":[{"href":"https:\/\/cbaduk.net\/w\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=144"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cbaduk.net\/w\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=144"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cbaduk.net\/w\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=144"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}