I played chess against ChatGPT
Can the world’s most famous language model defeat a software engineer?
ChatGPT is a large language model designed for conversation, and Twitter users around the world have found its answers scarily human-like. The AI model doesn't only answer generic factual questions; it can also write accurate code in multiple programming languages and debug code written by humans. ChatGPT also understands humor: it can tell jokes as well as explain them. This kind of competence across multiple domains is something AI has traditionally lacked, and it is often seen as a precursor to artificial general intelligence (AGI).
Are the broad abilities of ChatGPT a sign of higher intelligence, or merely an illusion created by digesting unholy amounts of publicly available text? Time to put the bot to the test. Our medium? The oldest game of minds: chess!
I am not a professional chess player; far from it. My Elo rating on Chess.com hovers around 1200, indicating a novice or intermediate player. While chess is traditionally played over the board, the game has a standardised format for recording moves: Portable Game Notation (PGN), which writes each move in algebraic notation. With PGN, I can relay my moves to ChatGPT and ask it to respond in the same notation. I can then play the bot's moves on a physical chess board, as pictured above. Make sense? Let's give it a go!
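For anyone who wants to try the same setup, here is a minimal Python sketch of what relaying a game in PGN involves. The helper names `to_movetext` and `to_pgn` are hypothetical, made up for this illustration and not part of any library. The sketch numbers a flat list of moves in standard algebraic notation (White, Black, White, ...) into PGN movetext and wraps it in the standard "seven tag roster" header, using `?` for unknown values. It takes every move on trust and performs no legality checking.

```python
# Hypothetical helpers for illustration only; not part of any library.

# The "seven tag roster" that a PGN game header is expected to contain.
# "?" (and "????.??.??" for the date) marks a value as unknown.
SEVEN_TAG_ROSTER = ["Event", "Site", "Date", "Round", "White", "Black", "Result"]


def to_movetext(sans, result="*"):
    """Number a flat list of SAN moves (White, Black, White, ...) as PGN movetext.

    `result` is the termination marker: "1-0", "0-1", "1/2-1/2" or "*" (unfinished).
    The moves are not checked for legality.
    """
    parts = []
    for ply, san in enumerate(sans):
        if ply % 2 == 0:  # White's move starts a new numbered move pair
            parts.append(f"{ply // 2 + 1}. {san}")
        else:             # Black's reply follows without a number
            parts.append(san)
    parts.append(result)
    return " ".join(parts)


def to_pgn(sans, white="?", black="?", result="*"):
    """Wrap the movetext in a minimal header built from the seven tag roster."""
    values = {
        "Event": "?",
        "Site": "?",
        "Date": "????.??.??",
        "Round": "?",
        "White": white,
        "Black": black,
        "Result": result,
    }
    header = "\n".join(f'[{tag} "{values[tag]}"]' for tag in SEVEN_TAG_ROSTER)
    # A blank line separates the tag pairs from the movetext.
    return header + "\n\n" + to_movetext(sans, result) + "\n"


if __name__ == "__main__":
    opening = ["d4", "d5", "c4", "e6", "Nc3", "Nf6", "Bg5"]
    print(to_movetext(opening))  # 1. d4 d5 2. c4 e6 3. Nc3 Nf6 4. Bg5 *
    print(to_pgn(opening, black="ChatGPT"))
```

Because nothing here tracks the board, a sketch like this would happily record an illegal move. Rejecting those would need a real move generator, such as the one in the python-chess library.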
As a Queen’s Pawn player, I begin the game with 1. d4. ChatGPT responds with the principled move 1… d5, staking a claim to the centre with its own Queen’s pawn. With 2. c4 e6 3. Nc3 Nf6 4. Bg5, the game continues as a Queen’s Gambit Declined: Modern Variation opening. So far so good.
ChatGPT continues the game with an unusual move: 4… dxc4. Accepting the gambit is not a mistake by any means, but dxc4 is typically played earlier if black’s intention is to take the pawn. In the Chess.com games database, this move shrinks the number of master games we are following from 13,385 to 302.
I continue by attacking ChatGPT's pawn with 5. e3. The AI chooses not to defend it, instead developing its bishop with 5… Be7. This principled move prepares kingside castling and breaks the pin on the knight at f6. 6. Bxc4 wins the pawn back, after which both players develop their remaining knights and castle.
6… Nc6 7. Nf3 O-O 8. O-O
As you can see from the eval bar on the left, Stockfish (the chess engine evaluating the position) prefers my position over ChatGPT's. However, the game is far from over. As both players have castled, the opening has now transitioned into the middlegame. Chess.com's database only knows of two master games in this position, a sign that we are heading into uncharted waters. This is the point where ChatGPT's game plan begins to go downhill.
ChatGPT decides to drive away the white bishop with 8… h6 9. Bh4 g5 10. Bg3. Pushing the bishop back wins time and space, at the cost of weakening the pawns in front of Black's king. ChatGPT continues aggressively, jumping into the centre with its knight (10… Nd5).
This is inaccurate: after 11. e4, the knight must either retreat or trade itself for my knight on c3, and either option simply wastes time. ChatGPT chooses the latter with 11… Nxc3 12. bxc3.
At this point, I started to run into issues, as ChatGPT stopped answering prompts. It wasn't clear to me whether this was because the servers were congested (the site has seen crazy amounts of traffic over the past few days) or because the AI was struggling to generate sensible answers for a chess middlegame. As you can see from the screenshot below, ChatGPT's explanations for its earlier moves no longer make much sense: the bishop on g3 is not under attack, and once the knight has left f6 it can no longer go to e4 or g4.
After a few errors, I tried to continue the game by starting a new conversation. While ChatGPT understood the game, it no longer remembered the location of its pieces and attempted to play illegal moves. This could be a consequence of starting a new conversation, but judging by ChatGPT's earlier explanations it was bound to happen at some point. The AI was responding with moves and words that chess players often say, regardless of whether they fit the current game!
As ChatGPT could not produce a legal move even after several retries, I had no option but to declare that the AI had resigned. Humanity prevails! Here's the final position after 12 moves.
Out of interest and fairness, I also asked ChatGPT to start a game as White. It answered by describing its moves in natural language. However, this didn't work at all: ChatGPT's response to my French Defense makes no sense, as the queen's knight has no way of attacking the pawn on e6.
To conclude, ChatGPT cannot play chess at a human level (yet). It is clearly aware of the game and able to accurately play mainline openings. But the moment the game moves out of theory, ChatGPT can no longer keep up. This shows that the language model doesn’t (yet) have any understanding of chess fundamentals, but merely repeats moves and phrases that commonly occur in a documented chess game. The model is still a stochastic parrot, no matter what people say on Twitter.
Still, a language model with no inbuilt understanding of chess being able to play a reasonable game for eleven moves is no small feat! Playing the game really did feel like playing a human. I didn't need to utter any special phrases like we do with Siri or Alexa; ChatGPT understood what I was trying to do as soon as I asked it to play a game of chess.
For future study, it would be worth trying different prompts at the start of the game. For example, you could ask ChatGPT to pretend to be Magnus Carlsen before starting the game. I also had some success with asking the bot not to explain its moves. Playing the whole game over one conversation would likely help as well. The free preview of ChatGPT is currently too congested to reliably explore the topic further, but this is something I’m planning to do once a paid version is released. It’s likely that OpenAI’s bot will be able to beat me in the future, but until then I’ll enjoy the superiority of meat over machine! Just don’t tell Stockfish or AlphaZero…