Perfect Snake

I like the game snake — not so much playing it, but implementing it. The natural consequence is an autopilot. This way I can just watch instead of playing. On the German version of this blog, there are already quite a few implementations with different heuristics, but nothing particularly good at playing Snake.

But now I present an autopilot which can (at least sometimes) play a perfect game of Snake.

A perfect game of Snake

In case this gif does not convince you, this autopilot can run directly in the browser at snake.schawe.me.

So, how does it work?

Neural Networks

If one does not know how to solve something, try to make a neural net come up with a solution. One example of this applied to classic Atari games was this paper, from ten years ago. We will apply this idea of reinforcement learning to snake in this post (but of course others have done this already [8, 9]).

The fundamental idea of reinforcement learning is quite simple. Just reward the model for good decision, such that it may learn to make good decisions. So here we will use the score defined as the length of the snake at the end of the game as the objective which is maximized by good decisions.

Fortunately, there is already a lot of literature how reinforcement learning can be implemented. We will use the actor-critic approach. So we construct a neural network which takes the current state of the game as input and splits into two heads. One head is the Actor with 3 outputs, which correspond to the next action to take: “right”, “left” or “straight ahead”. The other head is the Critic with one output representing an estimate of how long the snake can grow from the current situation.

For training a full game is played by following the advice of the Actor plus a bit of noise to explore new strategies. Then the Critic is trained with all encountered game states to produce estimates for the final score, which should predict the score that was indeed reached. For training the Actor, we take states of the game, make a different decision and ask the Critic how good the resulting situation is. Depending on the estimated quality, we teach the actor to make this decision more or less often. So Actor and Critic help each other at getting better and the common part of the neural net should gain an “Understanding” of the game which both can base their output on. Ingenious!

Technical Trivialities

My Implementation uses the Python libraries Keras and Tensorflow for training and multiJSnake (German post) as the environment. It is a strange decision to implement the environment in Java. The reason is that it already existed and the combination offered the opportunity to write a post on the blog of my employer.

For this post, we will just treat the environment as a black box, which enforces the rules of Snake.

Lokal Information

One of the most important decisions when designing the model is to determine the nature of the input. The simplest option, which is quite suited for testing, is using the local information around the head of the snake: three neurons (0 or 1) indicating whether the field right, left and ahead are occupied by a wall or the snake body (and eight more for the diagonals and two fields, left, right, ahead and behind for a bit more farsightedness). Also we have to indicate where the food is, which we solve with 4 further neurons (0 or 1) representing whether the food is left, right, in or against the direction of the snake’s movement.

Behind the input we build a fully connected layer and behind that we connect directly the two heads.

Layout of the local neural network (Visualisierung: netron)

And after a few thousand training games the snake moves directly towards the food and avoids itself. But it is not yet clever enough to avoid catching itself in loops. Well, even the heuristic of rsnake (German post) was better.

A few games with local information

Global Information

To avoid the snake trapping itself, we should give it global information of the playing field — it is only fair, since humans do see the whole field, too. But even with a \(10 \times 10\) field, there would be at least 100 input neurons, such that fully connected layers would lead to very large models. Instead, convolutional neural networks seem like a very good fit to solve this problem, especially since our input is of two-dimensional nature. To make life for our artificial intelligence a bit easier, we split our playing field in three channels

  1. the Head: only the position of the head has a 1, otherwise 0
  2. the body: the positions of the body have a value corresponding to the number of timesteps they will be occupied
  3. the food: only the position of the food has a 1, otherwise 0

The human view and what we show our neural net

This is not even an advantage for the snake, since a human player also sees with three color channels.

And to make life for our snake even easier, we change the output of the actor from three relative (left, right, ahead) to four absolute (north, east, south, west) directions.

Layout of the convolutional neural networks (Visualisierung: netron)

This model layout deserves to be called deep learning. The other model parameters can be looked up at github.com/surt91/multiJSnake.

And after a few tenthousand training games this model works well enough to routinely play perfect games on a \(10 \times 10\) field. And since I only trained it on a \(10 \times 10\) field, it fails on every other size.

Welcome to my Blog!

Here I publish posts in irregular intervals about things I do or want to be able to look up later.

I suggest new visitors to take a look at the following highlights instead of scrolling chronologically.

Unfortunately, most content is only available on the German version of this blog. But even if you do not speak German, this post about fractals should be nice to look at anyway.

Twitter Profile Background Colors

I collected tweets of \(>8'000'000\) Twitter users for an academic project. But Twitter does not only give you the tweets, but also many more data like the profile background color of users. It would be a shame to let these data go to waste, so I decided to process them into digital art. I wanted to show all the colors in one picture and group similar colors close to each other. This turned out to be much less trivial than I expected, since the space in which the colors live is the three dimensional RGB cube, but my image is only two dimensional. There is no “correct” way to project the colors down.

Here, I decided to put a 2D Hilbert curve through the image and paint the colors in the order they are encountered by a 3D Hilbert curve in the RGB cube. Ignoring the two default colors #F5F8FA and #C0DEED, this produces this image:

Twitter profile background colors

And thanks to the Python packages hilbertcurve and pypng the code needed to generate this image is quite harmless:

from math import ceil, sqrt, log2

from hilbertcurve.hilbertcurve import HilbertCurve
import png


"""
    turn an RGB string like `#C0DEED` into a tuple of integers,
    i.e., coordinates of the RGB cube
"""
def str2rgb(s):
    s = s.strip("#")
    return (int(s[0:2], 16), int(s[2:4], 16), int(s[4:6], 16))


"""
    `color_histogram` is a dict mapping an rgb string like `#F5F8FA`
    to the number of usages of this color
"""
def plot_background_colors(color_histogram, filename="colors.png"):
    defaults = {"F5F8FA", "C0DEED"}

    data = {str2rgb(rgb): d for rgb, d in color_histogram if rgb not in defaults}

    # calculate the size of the resulting image
    # for a 2D Hilbert curve, it mus be square with a width, which is a power of 2
    num_pixels = sum(data.values())
    min_width = ceil(sqrt(num_pixels))
    exponent = ceil(log2(min_width))
    width = 2**exponent

    # output buffer for a `width x width` png, with 4 color values per pixel
    buf = [[0 for _ in range(4 * width)] for _ in range(width)]

    hc2 = HilbertCurve(exponent, 2)
    # there are 256 = 2^8 values in each direction of the RGB cube
    hc3 = HilbertCurve(8, 3)

    sorted_rgbs = sorted(data.keys(), key=lambda x: hc3.distance_from_point(x))

    idx = 0
    for rgb in sorted_rgbs:
        for _ in range(data[rgb]):
            # get the coordinate of the next pixel
            x, y = hc2.point_from_distance(idx)
            # assign the RGBA values to the pixel
            buf[x][4 * y] = rgb[0]
            buf[x][4 * y + 1] = rgb[1]
            buf[x][4 * y + 2] = rgb[2]
            buf[x][4 * y + 3] = 255

            idx += 1

    png.from_array(buf, 'RGBA').save(filename)

The input histogram was in my case just a simple SQL query away:

SELECT profile_background_color, COUNT(profile_background_color) FROM users
    GROUP BY profile_background_color;