
012 Fine Reading of the Best Research Paper at NIPS 2017 Part Three - How to Solve Imperfect Information Games #

Today, we are going to share the last of the best papers of NIPS 2017, “Safe and Nested Subgame Solving for Imperfect-Information Games” (link). What is this paper about? It discusses how to solve the problem of “imperfect-information games”.

Similar to the previous two papers we shared, this paper is also highly theoretical and not suitable for beginners. Here, we will only provide a high-level summary of the main ideas of the paper. If you are interested in the content, we recommend reading the original paper.

Another noteworthy observation: even in an era dominated by deep learning, none of the three NIPS best papers we have shared this week relies on deep learning. On one hand, this shows that deep learning is not the entirety of artificial intelligence; on the other hand, it reminds us how broad the fields of machine learning and artificial intelligence really are.

Authors’ Information #

The paper has two authors.

The first author is Noam Brown. Brown is a Ph.D. student in the Computer Science Department at Carnegie Mellon University. His main research focus is on using ideas from reinforcement learning and game theory to solve large-scale interactions among multiple agents. Imperfect-information games, the subject of this paper, are one branch of that research area. Brown has published multiple papers in this field, including three AAAI papers, two NIPS papers, one ICML paper, and one IJCAI paper.

Closely related to this paper is work published in the journal Science in 2017, which used game-theoretic techniques to tackle heads-up no-limit poker and surpassed human performance in real competition. That work received significant media coverage. In 2017, Brown also interned at Google DeepMind in London. Before his Ph.D. studies, he worked in the finance sector.

The second author is Brown’s advisor, Tuomas Sandholm. Sandholm is a professor in the Computer Science Department at Carnegie Mellon University. He has long worked in the fields of “mechanism design” and “auction theory” and has published over 450 academic papers with more than 20,000 citations. Beyond his academic achievements, Sandholm is also an interesting character: his homepage lists a wide range of hobbies, including surfing, magic, and flying.

Main Contributions and Core Methods of the Paper #

First, let’s take a look at the main contributions of this paper and understand what problem it primarily addresses.

For a paper with heavy theoretical content, we usually need to keep asking ourselves what its core theme is in order to grasp its main focus.

First, the paper deals with the problem of “imperfect-information games”. What does this mean? To understand imperfect-information games, we first have to talk about “perfect-information games”.

In simple terms, a “perfect-information game” is a game in which both players have complete knowledge of the current state of the game, of its initial state, and of the payoffs of their respective strategies. Many familiar games, such as Go and chess, are perfect-information games, and the systems developed by DeepMind, such as AlphaGo and AlphaGo Zero, are typical AI algorithms designed for this setting.

An “imperfect-information game” does not mean that we know nothing about the opponent; rather, the information we have is incomplete. What does this mean? We may not be able to observe the opponent’s action, or the private information behind it, in the current round, yet we still know who the opponent is, what strategies they might use, and what those strategies pay off. In poker, for example, we see the public cards and the bets but not the opponent’s private cards.
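To make this concrete, here is a minimal sketch (our own toy example, not code from the paper) of the key structural object in imperfect-information games: an information set, i.e., a group of distinct game states that a player cannot tell apart and must therefore treat identically.

```python
# A toy, poker-flavored illustration of an information set: two different true
# states look identical to player 2, so player 2 must use one strategy for both.
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    p1_private: str     # hidden from player 2 (e.g., a private card)
    public_action: str  # observed by both players

# Two distinct true states of the game ...
s1 = State(p1_private="Ace", public_action="bet")
s2 = State(p1_private="King", public_action="bet")

# ... but player 2 only observes the public part, so both states fall into the
# same information set and must receive the same response from player 2.
def p2_observation(state: State) -> str:
    return state.public_action

assert p2_observation(s1) == p2_observation(s2)
```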

Beyond this difference in definition, the two settings also differ in the structure of the overall problem.

In a “perfect-information game”, the optimal strategy at a given moment typically requires only the current node of the decision tree and the subtree below it; it does not need any information about what happened before the current node or about neighboring nodes.

What does this mean? Take AlphaGo as an example. In a perfect-information setting we can, in principle, enumerate all possible board positions and moves and represent the current decision state with a decision tree. Once we reach a particular decision state, we only need to analyze the states below it. The number of such states may be enormous, but methodologically we never need to consult any other information to make the optimal decision.
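As a minimal sketch of this point (a generic minimax search over a toy tree, not AlphaGo’s actual algorithm), note that the value of a node is computed purely from its own subtree, with no reference to ancestors or sibling subtrees:

```python
# Minimax on a tiny perfect-information game tree. Leaves are payoffs to the
# maximizing player; internal nodes are lists of children. The value of a node
# depends only on the subtree rooted at that node.
def minimax(node, maximizing: bool) -> float:
    if isinstance(node, (int, float)):     # leaf: return its payoff
        return node
    child_values = [minimax(child, not maximizing) for child in node]
    return max(child_values) if maximizing else min(child_values)

tree = [[3, 5], [2, [9, -1]]]              # an arbitrary toy tree
print(minimax(tree, maximizing=True))      # -> 3 (the left subtree is best)
```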

The defining characteristic of an “imperfect-information game” is exactly the opposite: each sub-problem, or subgame, requires referencing information outside of itself. In fact, the paper points out that the optimal decision at any point in an imperfect-information game often depends on subgames that were never “reached”.

To illustrate this, the paper uses a “coin toss” game. Due to limited space we will not go through the full setup here, but if you are interested you can read the paper in depth.

In essence, the coin toss game demonstrates the following: when two players play a coin-tossing game in which the rewards and the rules are linked in a particular way, the first player can always change their strategy according to the situation. If the second player makes decisions based only on what they can observe of the first player’s actions, they may completely miss that change of strategy and end up choosing a suboptimal response. The key point is that, by the rules of the game, the second player cannot observe the entire game state; the information they do receive does not fully reflect the first player’s strategy, which leads to misjudgment.
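Here is a toy numeric sketch of that point. The payoffs and decision rules below are illustrative inventions in the spirit of the paper’s coin toss example, not its exact construction: the second player’s best guess inside the “Play” subgame depends on the value of the “Sell” branch, even though that branch is never reached when “Play” happens.

```python
# Player 1 sees the coin and chooses Sell or Play; if Play, player 2 guesses
# the coin. Player 2's best guess depends on which coins would lead player 1
# to Play at all -- information that lives entirely outside the Play subgame.
def p1_plays(coin: str, sell_value: dict) -> bool:
    # Toy rule: player 1 Plays only when Selling is unattractive for this coin.
    return sell_value[coin] < 0

def p2_best_guess(sell_value: dict) -> str:
    # Player 2 guesses a coin that would actually have led player 1 to Play.
    reaching = [c for c in ("Heads", "Tails") if p1_plays(c, sell_value)]
    return reaching[0] if reaching else "Heads"

# Identical Play subgame, different Sell payoffs -> different optimal guess.
print(p2_best_guess({"Heads": 0.5, "Tails": -0.5}))  # -> "Tails"
print(p2_best_guess({"Heads": -0.5, "Tails": 0.5}))  # -> "Heads"
```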

To solve such games, the paper builds on a core idea of “abstracting” the situation: the abstraction is a simplified version of the game that, we hope, still carries sufficient information. We first solve the game defined by this abstraction; the resulting strategy, sometimes called a “blueprint” strategy, is then used to assist our decision-making when solving against the true, full game. The core of the paper lies in how to construct such a blueprint and how to leverage it safely during actual play.
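As a minimal, runnable sketch of what “abstraction” can mean in practice (the bucket values below are illustrative assumptions, not parameters from the paper or from Libratus): in poker-like games an essentially continuous action space, such as bet sizes, is collapsed into a few coarse buckets, and the blueprint strategy is computed over this much smaller game.

```python
# Toy action abstraction: arbitrary bet sizes (as fractions of the pot) are
# mapped to a handful of buckets that the blueprint strategy actually covers.
BLUEPRINT_BET_BUCKETS = [0.5, 1.0, 2.0]   # illustrative bucket choices

def nearest_bucket(real_bet_fraction: float) -> float:
    # Map a real-world action onto the closest action the abstraction knows.
    return min(BLUEPRINT_BET_BUCKETS, key=lambda b: abs(b - real_bet_fraction))

print(nearest_bucket(0.73))  # -> 0.5 (closer to 0.5 than to 1.0)
print(nearest_bucket(1.60))  # -> 2.0
```

The subgame solving described in the paper then refines the blueprint within the part of the real game that is actually reached during play, rather than relying on this coarse strategy alone.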

Experimental Results of the Method #

The experiments are conducted on “heads-up no-limit Texas hold’em” and are connected to “Libratus”, the system reported in the journal Science that significantly outperformed top human players; the techniques proposed in this paper are at the core of that system.

One baseline is “unsafe subgame solving”, which essentially ignores the imperfect-information structure and treats the hidden state as if it were known, solving each subgame under fixed assumptions about the rest of the game. This approach performs well in many games, but it can occasionally perform very poorly; in other words, it lacks “robustness”. The experiments therefore also demonstrate the need for the family of “safe” methods proposed in this paper.
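Continuing the toy coin toss sketch from above (again with made-up payoffs, not the paper’s exact numbers), the following illustrates why an “unsafe” approach can backfire: it best-responds to the assumption that the opponent follows the blueprint outside the subgame, and a deviating opponent can exploit that assumption.

```python
# Toy illustration of unsafe subgame solving: player 2 fixes player 1's
# out-of-subgame behaviour to the blueprint and best-responds to it.
BLUEPRINT_P1 = {"Heads": "Sell", "Tails": "Play"}   # assumed opponent behaviour

def unsafe_p2_guess(blueprint: dict) -> str:
    # Best response assuming player 1 follows the blueprint exactly:
    # guess the (only) coin with which the blueprint ever Plays.
    return next(coin for coin, action in blueprint.items() if action == "Play")

def p1_payoff(coin: str, p1_action: str, p2_guess: str) -> float:
    if p1_action == "Sell":
        return 0.5 if coin == "Heads" else -0.5
    # Play: player 1 wins 1 if player 2 guesses wrong, loses 1 otherwise.
    return -1.0 if p2_guess == coin else 1.0

# If player 1 deviates and Plays with Heads, the unsafe guess is exploited:
print(p1_payoff("Heads", "Play", unsafe_p2_guess(BLUEPRINT_P1)))  # -> 1.0,
# better for player 1 than the 0.5 it would have earned by Selling as assumed.
```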

Summary #

Today we discussed the third of the NIPS 2017 best research papers. Its core idea is to construct a blueprint strategy to guide the solving of imperfect-information games, with poker as the flagship application.

Let’s review the key points: First, we briefly introduced the authors of the paper. Second, we outlined the problem this paper aims to solve and its contributions. Third, we provided a brief overview of the experimental results of the paper.

Finally, a question to leave you with: why does current work on solving imperfect-information games not rely on deep reinforcement learning? What is your intuition about this?