The Methods We Used to Solve Othello

1 Jun 2024

Author:

(1) Hiroki Takizawa, Preferred Networks, Inc., Chiyoda-ku, Tokyo, Japan (contact@hiroki-takizawa.name).

Table of Links

Discussion and Conclusions and Acknowledgements

Additional Information and Declarations and References

3 Methods

3.1 Use of Terms “Ply” and “Move”

In chess, there is a tradition where two sequential moves, one from white and the other from black, are called a “move” or “full-move”, while an individual move is called a “ply” or “half-move”. However, in this article, we refrain from using “ply” and always use “move” to denote an individual move.

3.2 Rules of Othello

The rules of Othello are as follows:

Black moves first, after which the players alternate.
If there is an empty square that satisfies the conditions for placing a stone (see below), the player to move must choose one of such empty squares and place a stone there. If there is no empty square that satisfies the condition, the player must pass their turn.
If a player places a stone such that there are opponent’s stones in a straight line (horizontal, vertical, or diagonal) between this new stone and another of the player’s stones, with no empty squares in between, then the opponent’s stones in that line are flipped to become the player’s stones.
Each player can only place a stone on the squares where placing a stone flips one or more of the opponent’s stones.
The game ends when the board is completely filled or when there is no square on which either player can place a stone.
The player with more stones on the board at the end of the game wins.

The difference in the number of stones at the end of the game is called the “score”.

3.3 Modification to Edax

Existing Othello software called Edax [22] was used to solve the position with 36 empty squares. Edax is based on alpha-beta search [13] and employs many techniques to improve search efficiency. While Edax is among the strongest software under typical match rules (e.g., 10 seconds per one move), its algorithm was considered suboptimal for the purpose of solving games over tens of minutes in situations where there is a large difference in scores. Therefore, the following two modifications were made:

• We disabled aspiration search during iterative deepening. In other words, when narrowing down alpha and beta for a solving, we modified it to take wider alpha and beta values during the shallow iterations of iterative deepening. This is because there is a tendency for the computation time to increase when the principal variation is updated. Therefore, during shallow iterations, even if a move results in a fail-high, the search should not be terminated, and instead, the best move should be sought.

• When performing move ordering, if the results from a shallow search are found in the transposition table, we ignore them if they are relatively too shallow. This is because, especially in a solving, the search depth becomes very high. In regions where the search depth is shallow, the accuracy of move ordering based on the transposition table greatly impacts performance.

The source code of modified Edax is available on GitHub and Zenodo (see Data Availability section).

3.4 Obtaining a set of target positions (with 50 empty squares) by optimal alpha-beta search

We developed an algorithm that requires predictive scores for all positions with 50 empty squares and returns a subset such that if all positions belonging to that subset are solved and all solutions match the predictions, the initial position is consequently solved. This is described by Algorithm 1. This algorithm is similar to the alpha-beta search with (α, β) = (−1, 1) fixed.

By inputting the initial position as the first argument and the dictionary of the positions and predictive scores as the second, a subset satisfying the conditions can be obtained. Notably, to keep the number of elements in the subset small, this algorithm internally performs the other alpha-beta search.

As is commonly known, alpha-beta search is most efficient when the best move is searched first. Therefore, an internal alpha-beta search is conducted to find the best move. This inner search can be made more efficient through elementary memoization. As a result, even when implementing the inner search, it’s possible to ensure that the computational time complexity does not increase.

3.5 Obtaining a set of target positions (with 36 empty squares) by optimal alpha-beta search

We implemented Algorithm 5, which requires a position with 50 empty squares and data about position(s) with 36 empty squares and outputs a set of position(s) with 36 empty squares and a corresponding result hypothesis. This algorithm can process known search outcomes for positions with 36 empty squares, and output position(s) with 36 empty squares and corresponding estimated game-theoretic value; if we can confirm that all outputted estimations are correct, then we can prove the game-theoretic value of the input position.

Importantly, it can differentiate between positions that we have obtained game-theoretic value for and those that we have only estimated the value of.

Algorithms 2, 3, and 4 are auxiliary algorithms for Algorithm 5. These three algorithms can be made more efficient through elementary memoization, which is omitted from the pseudo-codes. The source code is available at GitHub (see Data Availability section).

We solved the positions with 36 empty squares, which were obtained from Algorithm 5, using a computer cluster and Edax software. If the predicted value of the results was less than 30, we read with 4 cores; otherwise, we did so with 1

core. In the initial phase of this computation, we did not have confidence that the game-theoretical value of the initial position would be a draw, so we set the alpha-beta window to [−3, +3]. Subsequently, when reading with 4 cores, we changed it to read with a [−1, +1] window. When reading with 1 core, we always kept the window at [−3, +3]. The advantage of reading with 1 core is avoiding efficiency degradation due to parallel searching, and because the number of search positions becomes deterministic, it ensures reproducibility.

The continued benefit of keeping the window at [−3, +3] is that if the game-theoretical value of the initial position turned out to be -2 rather than a draw, there would be no need for re-computation, and there is no need to change the command-line options of Edax, meaning we do not have to save it for each problem. The downside is a possible slight increase in the number of search positions, but for positions with a significant difference, it is believed to have minimal impact compared to when the window is set to [−1, +1].

The game-theoretic values of the output positions can sometimes deviate from the estimations of the above algorithm. In such cases, by adding the results to D and rerunning the Algorithm 5, more positions to be solved can be identified. Once Algorithm 5 no longer outputs any position, it can be said that the proof of the input position with 50 empty squares has been established. This procedure can be described as Algorithm 6.

3.6 Constructing a program that never loses

To satisfy the Weakly solved condition, we implemented a Python script that acts as a perfect (i.e., never loses) player by referring to the results. The script refers to the result table while there are more than 36 empty squares, and after that, it delegates to Edax to play perfectly.

3.7 Materials

To solve positions with 36 empty squares, we used the CPUs of MN-J, a supercomputer owned by Preferred Networks Inc. MN-J refers collectively to multiple supercomputers: MN-2A, MN-2B, and MN-3, all of which appear to users as a single Kubernetes cluster. This supercomputer is equipped with several types of CPUs (Intel Xeon 6254, AMD EPYC 7713, Intel Xeon 8380, and Intel Xeon 8260M), and all CPUs feature main memory with Error Checking and Correction (ECC).

This paper is available on arxiv under CC 4.0 license.