You thought GPT-5 was going to crush the competition this year? Spoiler alert: the opposite happened. The final LMArena ranking for December 2025 just dropped, and frankly, it's an earthquake. Google's Gemini 3 Pro sits at the top with 1490 Elo points, while OpenAI's GPT-5.2... doesn't even make the top 10.
Yes, you read that right. The thing is, this year-end ranking reveals much more than raw performance numbers - it exposes the radically different strategies of the AI giants. So, who really won 2025?
The LMArena December 2025 ranking: the painful numbers
The LMArena Text Arena leaderboard is built on blind pairwise voting: thousands of users compare two anonymous responses and pick the better one. The resulting Elo score is brutal but honest - you can't game it with marketing.
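To make that concrete, here's a minimal sketch of how a single pairwise vote nudges two Elo scores. The K-factor and the ratings are illustrative placeholders, not LMArena's actual parameters.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A wins the vote, under the Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float, a_wins: bool, k: float = 16.0) -> tuple[float, float]:
    """Update both ratings after one blind pairwise vote (K is a placeholder)."""
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_wins else 0.0
    return (rating_a + k * (score_a - expected_a),
            rating_b + k * ((1.0 - score_a) - (1.0 - expected_a)))

# Example: a 1490-rated favorite loses one vote to a 1455-rated challenger.
print(update_elo(1490, 1455, a_wins=False))  # the favorite drops a bit, the challenger climbs
```

Thousands of these tiny updates, aggregated over anonymous votes, are what produce the scores in the table below.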
And here's what it looks like in December 2025:
| Rank | Model | Elo Score | Publisher |
|---|---|---|---|
| 1 | Gemini 3 Pro | 1490 | Google |
| 2 | Gemini 3 Flash | 1478 | Google |
| 3 | Grok 4.1 Thinking | 1477 | xAI |
| 4 | Claude Opus 4.5 Thinking | 1469 | Anthropic |
| 5 | Claude Opus 4.5 | 1467 | Anthropic |
| 6 | Grok 4.1 | 1464 | xAI |
| 7 | Gemini 3 Flash Thinking | 1463 | Google |
| 8 | GPT-5.1 High | 1455 | OpenAI |
| 9 | Gemini 2.5 Pro | 1451 | Google |
| 10 | Claude Sonnet 4.5 Thinking | 1450 | Anthropic |
The first thing that jumps out? Google places four models in the top 10. The second? GPT-5.2, the flagship model OpenAI announced with great fanfare, lands in 14th place with only 1428 Elo points. That's 62 points behind Gemini 3 Pro.
To be honest, I really didn't expect this. When OpenAI released GPT-5.2 a few weeks ago, everyone expected it to be the year-end game changer. It wasn't.
Google crushes everyone: why Gemini 3 Pro dominates
So, what's Google's secret? Honestly, it's a combination of several factors that make the difference.
A superpowered multimodal architecture
Gemini 3 Pro isn't just good at text - it blows away benchmarks on all fronts. In video understanding, it reaches 87.6% accuracy on Video-MMMU, where GPT-5.1 tops out at 80.4%. In mathematical and scientific reasoning (the GPQA Diamond benchmark), we're talking about 91.9% accuracy. That's unprecedented.
Vision domination
On the LMArena Vision leaderboard, Gemini 3 Pro also holds first place with 1309 Elo points, compared to 1249 for GPT-5.1 High. The gap widens even more when talking about video comprehension or complex image analysis.
The massive network effect
Google has an advantage no one else has: 650 million users have free access to Gemini. This mass adoption creates a virtuous cycle - more data, more user feedback, more improvements.
The thing is, Google played the patience card. While OpenAI was churning out releases (GPT-5, then 5.1, then 5.2 in three months), Google took its time to refine Gemini 3 Pro. And it's paying off.
OpenAI in crisis: the spectacular fall of GPT-5.2
To be honest, this is the real surprise of this ranking. How can OpenAI, the pioneer of generative AI, end up so far behind?
The unsustainable release pace problem
GPT-5, GPT-5.1, GPT-5.2... three versions in three months. It's exhausting for everyone: developers who have to migrate, users who don't have time to adapt, and even OpenAI which no longer has a "magic moment" to offer.
A Reddit user summarizes the problem well: "The absurd release frequency makes stability impossible. Professionals want a lasting version, not bugs patched every week."
Paralyzing safety filters
I tested GPT-5.2 for two weeks, and frankly, the refusals to respond have become exasperating. The model refuses perfectly legitimate requests out of excessive caution. And guess what? On LMArena, a model that refuses to respond automatically loses the vote. Ouch.
The benchmark race vs. real utility
Improving scores from 98.7% to 99.2% on synthetic metrics doesn't impress anyone in everyday use. 99% of users see no difference. Meanwhile, Claude and Gemini focus on the real user experience.
The WebDev Arena paradox
Interesting fact: GPT-5.2 High ranks 2nd on the WebDev Arena with 1484 Elo points. So the model is good for web development, but average for everything else. It's a risky niche strategy when you're selling a mainstream product.
Anthropic, the real surprise winner of 2025
While everyone was watching the Google vs OpenAI duel, Anthropic quietly placed three Claude models in the top 10 of the general ranking. And that's not all.
Absolute domination of the WebDev Arena
If you're a developer, remember this name: Claude Opus 4.5 Thinking. With 1520 Elo points on the WebDev Arena, it dethrones everyone - including Gemini 3 Pro (1478 points) and GPT-5.2 High (1484 points).
| WebDev Rank | Model | Elo Score |
|---|---|---|
| 1 | Claude Opus 4.5 Thinking | 1520 |
| 2 | GPT-5.2 High | 1484 |
| 3 | Claude Opus 4.5 | 1483 |
| 4 | Gemini 3 Pro | 1478 |
Token efficiency that changes everything
Claude Opus 4.5's "thinking" mode uses 76% fewer tokens than Claude Sonnet 4.5 Thinking for equivalent performance. Basically, you pay less for an equally good result. For developers running the API in production, that's a huge selling point.
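To see what that 76% means on an invoice, here's a back-of-the-envelope sketch. Every figure in it is a made-up placeholder except the 76% reduction quoted above - real pricing will differ.

```python
# All figures are placeholders except the 76% token reduction cited above.
tokens_sonnet_thinking = 2_000_000                          # hypothetical tokens for a workload
tokens_opus_thinking = tokens_sonnet_thinking * (1 - 0.76)  # 76% fewer tokens

price_sonnet = 15.0   # hypothetical $ per million output tokens
price_opus = 25.0     # hypothetical $ per million output tokens (assumed pricier per token)

cost_sonnet = tokens_sonnet_thinking / 1e6 * price_sonnet   # $30.00
cost_opus = tokens_opus_thinking / 1e6 * price_opus         # $12.00

print(f"Sonnet 4.5 Thinking: {tokens_sonnet_thinking:,} tokens -> ${cost_sonnet:.2f}")
print(f"Opus 4.5 Thinking:   {int(tokens_opus_thinking):,} tokens -> ${cost_opus:.2f}")
```

Even if Opus costs more per token (an assumption here, not a quoted price), cutting the token count by three quarters can still leave the overall bill lower.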
Fewer hallucinations
On the AA-Omniscience Index, Claude Opus 4.5 Thinking shows a hallucination rate of 58%, compared to 68% for Grok 4 and 72% for Gemini 3 Pro. When you're building autonomous agents or critical workflows, this reliability makes all the difference.
Which model to choose based on your needs?
Honestly, we've entered an era where the "best single model" no longer exists. Here's my quick guide based on your use case:
For web development and code
Claude Opus 4.5 Thinking - Unbeatable. It's the obvious choice for autonomous agents, code workflows and automation.
For general reasoning and multimodality
Gemini 3 Pro - If you work with video, complex images, or tasks requiring advanced reasoning, it's the king.
For image generation
GPT Image 1.5 - Paradoxically, OpenAI still dominates this domain with 1264 Elo points on the Text-to-Image leaderboard.
For accessible everyday use
Gemini 3 Flash - Second in the general ranking, free for 650 million users. Unbeatable value for money.
My advice
If you ask my opinion after analyzing all this: don't bet on a single model anymore. The era when ChatGPT was the answer to everything is over. In 2026, pros will juggle two or three models depending on the task: Claude for code, Gemini for multimodal work, and maybe GPT Image for visuals. It's less simple than before, but that's how you get the best results.
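Purely as an illustration, here's what that juggling act can look like as a tiny routing table. The model identifiers are placeholders, not official API names - check each provider's documentation before using them.

```python
# Placeholder identifiers - not the providers' real API model names.
MODEL_ROUTES = {
    "code": "claude-opus-4.5-thinking",    # WebDev Arena leader
    "multimodal": "gemini-3-pro",          # video, complex images, advanced reasoning
    "image_generation": "gpt-image-1.5",   # Text-to-Image leader
    "everyday": "gemini-3-flash",          # free, second in the general ranking
}

def pick_model(task_type: str) -> str:
    """Return the model this article recommends for a given task type."""
    return MODEL_ROUTES.get(task_type, MODEL_ROUTES["everyday"])

print(pick_model("code"))         # claude-opus-4.5-thinking
print(pick_model("translation"))  # unknown task -> falls back to gemini-3-flash
```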
Frequently asked questions
Is GPT-5.2 really bad?
No, it's not "bad." It ranks 14th globally, which is still excellent. But for OpenAI's flagship model, it falls short of expectations: it excels in web development (2nd in the WebDev Arena) but underwhelms in general use.
Why is Gemini 3 Pro ranked first?
Google combined an exceptional multimodal architecture, cutting-edge performance in vision and video, and massive free access that generates constant user feedback. Patience paid off.
Is Claude Opus 4.5 worth the price?
If you do development or automation, yes, absolutely. Token efficiency and reduced hallucinations make it the most cost-effective choice for production projects.
Which model for an AI beginner?
Gemini 3 Flash. Free, second in the global ranking, accessible everywhere. It's the best entry point to discover what AI can do in 2025.
Conclusion
The LMArena December 2025 ranking marks a turning point: Google dominates with Gemini 3 Pro, Anthropic establishes itself as the developers' champion with Claude, and OpenAI goes through an identity crisis with GPT-5.2. The era of the "best single model" is over - welcome to the world of segmented AI where each player has their specialty.
For you, what does that mean? That you'll have to test multiple tools and choose based on your real needs. And frankly, that's rather good news. More competition = more innovation = better tools for everyone.
Want to go further? Discover my other analyses to master Claude, Gemini and the other models in the top 10.
About the author: Flavien Hue has been testing and analyzing artificial intelligence tools since 2023. His mission: democratizing AI by offering practical and honest guides, without unnecessary technical jargon.