I played chess against ChatGPT-4 and lost!
Last December, I played a few chess games against ChatGPT. These always ended the same way: ChatGPT would play an accurate opening until it forgot where its pieces were and started playing illegal moves, with full confidence, of course. The truth is, GPT-3 does not know how to play chess. Playing a game against it reveals its true nature as a stochastic parrot that merely produces a believable-sounding answer from its training set. I wrote in December:
ChatGPT cannot play chess at a human level (yet). It is clearly aware of the game and able to accurately play mainline openings. But the moment the game moves out of theory, ChatGPT can no longer keep up. This shows that the language model doesn’t (yet) have any understanding of chess fundamentals, but merely repeats moves and phrases that commonly occur in a documented chess game.
ChatGPT’s confidence, combined with its “bending” of the rules of chess, became something of a meme on the chess side of the internet, with Reddit posts that hit the front page and YouTube videos receiving millions of views. We laughed at it. Mocked it. Used its deficiencies to justify the superiority of humans over machines.
Then, GPT-4 arrived.
First game against GPT-4
I have gained rating points since December. My current Chess.com Elo rating sits at 1435, which indicates an intermediate player. While GPT-4 is marketed as a significant step up over GPT-3, I did not expect it to play particularly well. So, I started a game. Here’s what happened.
And not only did I lose, I got blown off the board, checkmated in 20 moves!
I tried to use the unusual Polish Opening (1. b4) as an anti-GPT strategy, as there are significantly fewer games played in these positions than in the popular openings. It didn’t seem to matter: GPT-4 handled the position well and took advantage of my mistakes.
What scared me the most was the chatbot’s attacking style: it sacrificed a bishop to open up my king and launch a huge attack. This is a very different approach from traditional chess computers, and more like a decision a human player who likes to attack might make: not the best move by computer evaluation, but difficult for humans to defend against.
I used the Polish Opening for a second game as well.
GPT-4 starts the game with a very common mistake: 2… Nc6 leads to the knight being kicked around the board and forced into the way of Black’s own bishop. I see this move all the time when playing human players, but I was expecting GPT-4 to have seen enough games to play something stronger. Or perhaps it played the move precisely because it’s so common?
While I won an early pawn and eventually the game, it was anything but easy. On move 27, I made a mistake that allowed a forced checkmate in two, but ChatGPT missed it. The miss was very human-like as well, focusing on my attack on the knight rather than on my weak king. The game ended after ChatGPT lost its rook and queen to the same attack that would have worked a few moves earlier. Perhaps it forgot about the white rook on b1?
I wanted to try a game with a popular opening as well, so I started one with d4.
This game led to a rook endgame where ChatGPT had a kingside pawn majority. I was hoping GPT-4 would start to falter in this phase of the game, as the number of moves played would surely mean there were no similar games in its data set. But, to my surprise, the bot played an excellent endgame. After I sneaked my rook to the 8th rank, the chess engine Stockfish evaluated my position as losing. However, GPT-4 did not find a move that would maintain its advantage, and chose to repeat moves and make a draw. Again, this is an unusual but human-like decision: GPT had more pawns than I did, giving it winning chances, but only if you can find the win. A position like this can easily be lost as well if you happen to lose one of those pawns.
THIS IS UNREAL!!!!!!!
I did not expect GPT-4 to be able to play chess, let alone to lose to it! ChatGPT played like a human: it lost a game by making mistakes in the opening and endgame, but won one through a relentless attack. It also knew how to handle a slower, positional game and a tight rook endgame.
GPT-4 did not make a single illegal move. In fact, it corrected me the few times I inputted a move wrong, though not with perfect accuracy either: I had to abandon a few games due to mistakes I made in transcribing moves.
You could also say that GPT-4 was playing the games blindfolded, as it had no way to refresh its memory of the current state of the board. This explains mistakes like the one that won me our second game. If you ranked GPT-4 against blindfolded human players, how high would it score? High enough to earn a FIDE title? I also played all games as white, to make sure the prompt I used (the same as in the December post) had no effect on the results.
I ended my December post with the following sentence:
It’s likely that OpenAI’s bot will be able to beat me in the future, but until then I’ll enjoy the superiority of meat over machine!
I did not expect that day to come only a few months later.
Computers have defeated humans before, but this time is different. It took ChatGPT three months to beat me at chess. How long will it take until it can beat me at programming? Probably not months. But probably not centuries either.
GPT-4 will change the world.