Even the most advanced artificial intelligence systems aren’t immune to simple challenges. That was the lesson when Google DeepMind’s flagship model, Gemini 2.5 Pro, struggled to complete the classic Game Boy title Pokémon Blue, a game many children finish with ease.
AI Model Stumbles on Simple Game: Gemini’s “Panic” Under Pressure
The unexpected twist unfolded on a Twitch channel run by independent engineer Joel Zhang, whose live stream showcased Gemini’s attempt at the retro RPG. During play, the model displayed unusual behavior: repetitive thoughts, overreactions, and bouts of decision paralysis. The DeepMind team described these episodes as “Agent Panic.”
According to the report, Gemini frequently reiterated the need to heal its Pokémon or escape dungeons even when no danger was imminent. For viewers, this came across as the AI spiraling into panic, hardly the composure expected of a logic-driven large language model (LLM). Twitch chat participants quickly caught on, pointing out recurring patterns of digital distress.
While AI doesn’t experience fear or anxiety the way humans do, the performance mirrored stress-induced human behavior: the model often made inefficient decisions, especially under in-game pressure. This adds to the growing discussion about how AI decision-making under uncertainty may not be as rational or resilient as once believed.
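For readers curious what a “panic loop” looks like in practice, here is a minimal, purely hypothetical Python sketch. It is not Zhang’s harness or DeepMind’s tooling, just an illustration of how an agent wrapper might flag the kind of repetition viewers noticed, by counting how often the model restates the same intention within a short window:

```python
from collections import deque

def make_panic_detector(window: int = 10, threshold: int = 5):
    """Hypothetical helper: flag a "panic loop" when the agent repeats
    the same stated intention `threshold` times within the last `window`
    turns. Purely illustrative; no real harness is being reproduced."""
    recent = deque(maxlen=window)

    def check(intention: str) -> bool:
        recent.append(intention.strip().lower())
        # A loop is suspected when the newest intention dominates the window.
        return recent.count(recent[-1]) >= threshold

    return check

# Example: five identical "must heal" messages in a row trip the detector.
detect = make_panic_detector()
for _ in range(5):
    looping = detect("I must heal my Pokémon before doing anything else!")
print("panic loop detected:", looping)  # -> panic loop detected: True
```

In a real harness, such a flag could trigger a prompt reset or a forced re-plan; whether Zhang’s tweaks worked anything like this hasn’t been detailed publicly.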
Gemini’s Gameplay Stats Raise Concerns About AI Efficiency
Gemini’s first full attempt at finishing Pokémon Blue took a staggering 813 hours. After tweaking the system, Zhang cut that time in half, to 406.5 hours, still far slower than the average child’s playthrough. Social media users, always quick with humor, coined the term “LLANXIETY” to describe the AI’s jittery logic, blending “LLM” and “anxiety.”
The episode comes on the heels of a recent Apple study asserting that most AI models don’t truly “reason.” Instead, they rely heavily on pattern recognition, and when a situation shifts or becomes slightly more complex, their performance falters, a conclusion Gemini’s Pokémon struggles seem to confirm.
For developers and AI researchers, this raises important questions about the reliability of large language models in unfamiliar or multi-step tasks. If an AI model can’t confidently handle a game with clear rules and linear objectives, can it really be trusted with high-stakes decision-making?
In a time when AI safety, transparency, and adaptability are under global scrutiny, Gemini’s journey through the Kanto region offers more than nostalgia. It serves as a reminder that intelligence alone doesn’t equal competence, especially when the environment demands more than pattern prediction.