The Second Anniversary of AlphaGo vs. Lee Sedol: The Secrets of the AI Algorithm

The protagonist of this article is AlphaGo, the Go AI developed by the Google DeepMind team. It attracted enormous attention in 2016 with its feat of defeating Lee Sedol, one of the world's top players. Go is an ancient board game in which every move offers a huge number of choices, so the position even a few moves ahead is very hard to predict, which demands strong intuition and abstract thinking from the players. Because of this, people long believed that only humans could play Go well, and most researchers expected it to take decades before an AI could truly think this way. But it has now been two years since AlphaGo played against Lee Sedol (March 9 to March 15, 2016), and this article is written to commemorate that occasion!

What is even more astonishing is that AlphaGo did not stop there. Eight months later, playing on a Go website under the name "Master", it played 60 games against top professional players from around the world and won every single one.

This is, of course, a huge achievement in the field of artificial intelligence, and it set off a new wave of discussion around the world: should we be excited about the speed at which artificial intelligence is developing, or worried?

Today, we will use DeepMind's original research paper, published in Nature, to provide a simple and clear interpretation of its content, explaining what AlphaGo is and how it works. I also hope that after reading this article you will no longer be intimidated by the sensational headlines the media throws around, and will instead be genuinely excited about the development of artificial intelligence.

Of course, you do not need any Go skill to understand the point of this article. In fact, I have only read a little about Go on an online encyclopedia; instead, I mostly use basic chess examples to explain the algorithms. You only need to understand the basic rules of a two-player board game: the players take turns making moves, and in the end one of them wins. Beyond that, you do not need to know anything about physics or advanced mathematics.

This is meant to minimize the barrier to entry, so that readers who are new to machine learning or neural networks can follow along. The exposition also deliberately avoids unnecessary complexity, in the hope that everyone can focus on the content itself.

As we all know, the goal of the AlphaGo project was to build an AI program that could compete with the world's top human players at Go.

To understand the challenge that Go poses, let us first talk about a similar board game: chess. Back in the 1990s, IBM built the Deep Blue computer, which in 1997 defeated the reigning world champion Garry Kasparov at chess. So how did Deep Blue do this?

In fact, Deep Blue used a very "brute force" approach. At each turn, Deep Blue considered all possible legal moves and explored along each one, analyzing how the game might unfold. This forward-looking analysis quickly produced an enormous, ever-branching decision tree. Deep Blue then worked back along the tree toward the current position, observing which moves were most likely to lead to a positive outcome. But what counts as a "positive outcome"? In fact, many excellent chess players carefully designed chess strategies for Deep Blue to help it make better decisions; for example, should it protect the king, or seek an advantage elsewhere on the board? For this purpose they built a specific "evaluation function" to compare the strengths and weaknesses of different board positions (IBM hard-coded the experts' chess knowledge into this evaluation function). In the end, Deep Blue would choose its move based on these carefully calculated scores. On the next turn, the whole process repeated.
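To make this concrete, here is a minimal Python sketch of that kind of exhaustive lookahead: plain minimax search with a hard-coded evaluation function at the leaves. The game interface (`legal_moves`, `play`, `is_over`, `evaluate`) is a hypothetical placeholder, not IBM's actual code, and real engines add many refinements such as alpha-beta pruning.

```python
# Sketch of Deep Blue-style search: exhaustive minimax with a hand-crafted
# evaluation function. The `state` object and its methods are hypothetical.

def minimax(state, depth, maximizing):
    """Best achievable evaluation for the maximizing side, looking
    `depth` plies ahead from `state`."""
    if depth == 0 or state.is_over():
        # The evaluation function encodes expert knowledge, e.g. material
        # balance and king safety (hard-coded by experts in Deep Blue's case).
        return state.evaluate()
    if maximizing:
        return max(minimax(state.play(m), depth - 1, False)
                   for m in state.legal_moves())
    else:
        return min(minimax(state.play(m), depth - 1, True)
                   for m in state.legal_moves())

def choose_move(state, depth=4):
    """Pick the move whose subtree has the highest minimax value."""
    return max(state.legal_moves(),
               key=lambda m: minimax(state.play(m), depth - 1, False))
```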

This means that Deep Blue considered millions of theoretical positions before every move. The most impressive thing about Deep Blue was therefore not its artificial intelligence software but its hardware: IBM claimed that Deep Blue was one of the most powerful computers on the market at the time, able to evaluate 200 million board positions per second.

Let us now return to Go. Go is far more open-ended, so repeating Deep Blue's strategy here will not produce the desired results. Because each move offers so many possible placements, the computer simply cannot cover all the potential continuations. For example, at the start of a chess game there are only 20 possible moves, but in Go the first player has 361 possible points to choose from, and the range of choices stays very wide throughout the game.
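A quick back-of-the-envelope calculation shows how much faster the Go tree explodes. The branching factors (20 for the chess opening, 361 for the first Go move) come from the text above; the lookahead of 5 moves is an arbitrary illustration:

```python
# Rough comparison of game-tree sizes at equal depth.
chess_branching, go_branching, plies = 20, 361, 5

print(f"chess, {plies} plies ahead: {chess_branching ** plies:,} positions")
print(f"go,    {plies} plies ahead: {go_branching ** plies:,} positions")
# chess, 5 plies ahead: 3,200,000 positions
# go,    5 plies ahead: 6,131,066,257,801 positions
```

Five moves ahead, the Go tree is already almost two million times larger, and the gap widens with every additional move.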

This explosive growth is the so-called "enormous search space." Moreover, in Go it is not at all easy to judge how favorable or unfavorable a particular board position is; in the endgame, the two sides may even need to keep playing for a while before it finally becomes clear who has won. But is there some magical way to make computers work well at Go? The answer is yes: deep learning can accomplish this daunting task!

In this study, therefore, DeepMind used neural networks to accomplish the following two tasks. They trained a "policy network" to decide which moves are the most sensible in a given board position (this is like the intuitive strategy a human uses to pick a move). And they trained a "value network" to estimate how favorable a particular board position is for the player (in effect, the probability of winning the game from that position). They first trained these neural networks on records of human games (the most traditional, yet very effective, supervised learning approach). After this training, the AI could to some extent imitate the way humans play; at this point it was like a novice human player. Then, to train the networks further, DeepMind had the AI play millions of games against itself (this is the "reinforcement learning" part). Through this fuller practice, the AI's playing strength improved enormously.
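As an illustration only, here is a toy sketch of these two kinds of networks in PyTorch. This is not AlphaGo's actual architecture (the paper's networks are much deeper convolutional stacks with dozens of input feature planes); the layer sizes and the 4 input planes here are arbitrary assumptions.

```python
import torch
import torch.nn as nn

BOARD, PLANES = 19, 4  # 19x19 board; 4 hypothetical input feature planes

class PolicyNet(nn.Module):
    """Maps a board position to a probability over the 361 points."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(PLANES, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32 * BOARD * BOARD, BOARD * BOARD)

    def forward(self, x):
        h = self.conv(x).flatten(1)
        return torch.softmax(self.head(h), dim=1)  # move probabilities

class ValueNet(nn.Module):
    """Maps a board position to a single score in (-1, 1): how favorable
    the position is for the player to move."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(PLANES, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32 * BOARD * BOARD, 1)

    def forward(self, x):
        h = self.conv(x).flatten(1)
        return torch.tanh(self.head(h))  # estimated winning chances

# The supervised stage would fit the policy to human moves with
# cross-entropy; the reinforcement stage then improves it via self-play.
policy, value = PolicyNet(), ValueNet()
boards = torch.randn(8, PLANES, BOARD, BOARD)     # fake batch of positions
print(policy(boards).shape, value(boards).shape)  # (8, 361), (8, 1)
```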

With these two networks alone, DeepMind's AI could already play at the level of the most advanced existing Go programs. The difference is that those earlier programs used a then-popular game-playing algorithm called "Monte Carlo Tree Search" (MCTS), which we will introduce shortly.

But clearly, we have not yet reached the real core. DeepMind's AI does not rely only on the policy and value networks; it does not use these two networks to replace Monte Carlo Tree Search, but instead uses them to make the MCTS algorithm even more effective. The results were indeed satisfying: the performance of MCTS reached a superhuman level. This improved variant of MCTS is "AlphaGo", which defeated Lee Sedol and became one of the biggest breakthroughs in the history of artificial intelligence.

Let us recall the earlier discussion of Deep Blue. As described above, for each move in chess, Deep Blue built a decision tree containing millions of board positions and moves; the computer had to simulate, observe, and compare every possible move. This is a simple and very straightforward approach, and if an ordinary software engineer had to design a chess program, they would likely choose a similar solution.

But let us think about how humans actually play chess. Suppose you are at a certain point in a game. By the rules, you could make a dozen different moves: move this piece here, move the queen there, and so on. But do you really list every possible move in your head and then choose from that long list? No. You "intuitively" narrow the options down to a few key moves (say you identify 3 sensible ones) and then think about how the board would change if you played each of them. You might spend 15 to 20 seconds on each of these moves, but note that within those seconds you are not precisely working out every subsequent exchange. In fact, humans tend to "play out" a few intuition-guided continuations without much deliberation (a good player, of course, thinks farther and deeper than an average one). You do this because your time is limited, and you cannot accurately predict which counter-strategies your opponent will choose. You can only let your instincts guide you. I will call this part of the thinking process a "rollout"; please keep this term in mind for the rest of the article.

After completing "rollouts" of these few sensible moves, you finally stop agonizing and simply play the move you judge best.

After that, your opponent responds. Their move may be one you already anticipated, which means you are fairly confident about what to do next; in other words, you will not need to spend much time on the subsequent rollouts. Alternatively, your opponent may surprise you with a strong move, forcing you to fight back and think even harder about your next step.

The game continues this way, and as the situation develops, it becomes easier to predict the outcome of each move, so the time you spend thinking shortens accordingly.

I have gone into all this detail to explain, in a relatively simple way, what the MCTS algorithm does: it simulates the above thinking process by repeatedly building a "search tree" of moves and positions. The innovation is that, unlike Deep Blue, the MCTS algorithm does not explore every potential move at every position; instead, it intelligently selects a small set of sensible moves and explores those. During the exploration, it "rolls out" the changes of fortune those moves lead to and compares the moves based on the computed results.
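Below is a bare-bones Python sketch of vanilla MCTS with random rollouts, just to make the select / expand / roll out / back up cycle concrete. The game-state interface (`legal_moves`, `play`, `is_over`, `winner`, plus a `player` attribute for whose turn it is) is hypothetical. AlphaGo's version additionally uses the policy network to bias which children get explored and the value network (blended with rollout results) to score leaf positions.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # move -> Node
        self.visits = 0
        self.wins = 0.0      # credited to the player who moved into this node

    def ucb(self, child, c=1.4):
        # Upper Confidence Bound: trade off a child's observed win rate
        # (exploitation) against how rarely it has been tried (exploration).
        if child.visits == 0:
            return float("inf")  # always try unvisited moves first
        return (child.wins / child.visits
                + c * math.sqrt(math.log(self.visits) / child.visits))

def rollout(state):
    """Play random moves to the end of the game and return the winner."""
    while not state.is_over():
        state = state.play(random.choice(state.legal_moves()))
    return state.winner()

def mcts_choose_move(root_state, n_simulations=1000):
    root = Node(root_state)
    for _ in range(n_simulations):
        node = root
        # 1. Selection: descend the tree, always taking the UCB-best child.
        while node.children and not node.state.is_over():
            parent = node
            _, node = max(parent.children.items(),
                          key=lambda kv: parent.ucb(kv[1]))
        # 2. Expansion: create children for all legal moves, pick one.
        if not node.state.is_over():
            for m in node.state.legal_moves():
                node.children[m] = Node(node.state.play(m), parent=node)
            node = random.choice(list(node.children.values()))
        # 3. Rollout: simulate a random game from here to the end.
        winner = rollout(node.state)
        # 4. Backpropagation: credit the result up the tree, from the
        # perspective of the player who made the move into each node.
        while node is not None:
            node.visits += 1
            if node.parent is not None:
                mover = node.parent.state.player  # who made this move
                node.wins += 1.0 if winner == mover else 0.0
            node = node.parent
    # Play the most-visited move, the conventional MCTS choice.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```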

(If you have followed everything up to this point, you already understand the essentials of this article.)

Now let us go back to the paper itself. Go is a "perfect information game." That is to say, in theory, no matter what stage of the game you are at (even after only one or two moves), you can determine exactly who will win and who will lose (assuming both players play "perfectly" from that point to the end of the game). I do not know who first proposed this basic theory, but as a premise of this research project, it is genuinely important.
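To see what "perfect information" buys you, here is a toy solver for a much simpler perfect-information game (emphatically not Go): two players alternately take 1 or 2 stones from a pile, and whoever takes the last stone wins. From any position, the winner under perfect play is fully determined and can be computed:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def player_to_move_wins(stones):
    """True if the player about to move wins with perfect play."""
    if stones == 0:
        return False  # the previous player took the last stone and won
    # You are winning if ANY move leaves the opponent in a losing position.
    return any(not player_to_move_wins(stones - take)
               for take in (1, 2) if take <= stones)

for n in range(1, 8):
    print(n, "stones:", "first player wins"
          if player_to_move_wins(n) else "second player wins")
# Piles that are multiples of 3 are losses for the player to move.
```

Go is the same in principle; the catch is that its search space is astronomically larger, which is exactly why the approximations described above are needed.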
