How often is the AI ranking updated?

The SWEN ranking is updated automatically: Artificial Analysis data (Intelligence Index, speed) every 6 hours; API pricing via OpenRouter daily; LMArena Elo weekly. The page revalidates its cache every 5 minutes via ISR (Incremental Static Regeneration). Last sync: Jun 01, 2026.

What is the difference between Gemini 3, Gemini 3.1 and Gemini 3.5?

Google's Gemini 3 family does not follow sequential numbering. Google released Gemini 3 Flash, Gemini 3.1 Pro, Gemini 3.1 Flash Lite and Gemini 3.5 Flash without publishing an official "Gemini 3.2". Each number denotes a distinct technical generation with significant architectural improvements. Gemini 3.1 Pro costs $2.00/1M tokens with a 1-million-token context window, competing with GPT-4o and Claude 3.7. Gemini 3.1 Flash Lite is the most economical at $0.25/1M tokens. See https://swen.live/ranking to compare all Gemini models with GPT-5, Claude 4 and Llama 4.

Independent AI analysis

AI Ranking 2026

Name: AI Ranking 2026
Creator: SWEN
License: https://creativecommons.org/licenses/by/4.0/

The most complete AI ranking of 2026, with 577 active LLMs compared across 13 benchmarks, including GPQA, MMLU-Pro, AIME, HLE, LiveCodeBench, SciCode, IFBench, AA-LCR, Terminal-Bench and Tau², that cover reasoning, math and coding, plus speed, latency and per-token pricing metrics. Use this ranking to find the best AI models of 2026 by category.

Luis Fernando Roquette · SWEN · methodology described at the bottom of this page · last updated: Jun 01, 2026

Source: Artificial Analysis · View as table →

Use now

General use: Claude Opus 4.8 (Fast) · Anthropic

Coding: GPT-5.5 · OpenAI

Cheapest: Step 3.7 Flash · StepFun

Fastest: Mercury 2 · Inception

Intelligence Index

Ranking by Artificial Analysis composite score (0–100). Top 30 models.

#1 Claude Opus 4.8 (Fast) · Anthropic · 61.4

#2 Claude Opus 4.8 (Adaptive Reasoning, Max Effort) · Anthropic · 61.4

#3 GPT-5.5 · OpenAI · 60.2

#4 Claude Opus 4.7 · Anthropic · 57.3

#5 Gemini 3.1 Pro Preview · Google

Gemini 3.5 Flash · Google

#13 Claude Opus 4.6 (Fast) · Anthropic · 52.9

#14 Claude Opus 4.6 (Adaptive Reasoning, Max Effort) · Anthropic · 52.9

#15 Muse Spark · Meta · 52.2

#16 Claude Opus 4.7 (Fast) · Anthropic · 51.8

#17 Qwen3.6 Max Preview · Alibaba · 51.8

#18 Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) · Anthropic · 51.7

#19 DeepSeek V4 Pro · DeepSeek · 51.5

#20 GLM-5.1 (Non-reasoning) · Zhipu AI

#24 GLM-5 (Reasoning) · Zhipu AI · 49.8

#25 Claude Opus 4.5 (Reasoning) · Anthropic

Score combines GPQA Diamond, MMLU-Pro, AIME, SciCode, HLE, IFBench, Terminal-Bench and AA-LCR.

Source: Artificial Analysis
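The caption lists which benchmarks feed the Intelligence Index but not how they are weighted. As a rough mental model only, the sketch below assumes an equal-weighted mean of per-benchmark scores already on a 0–100 scale. Artificial Analysis's actual weighting and normalization may differ, and the `composite_index` function and the scores below are hypothetical.

```python
from statistics import mean

# Benchmarks named in the caption. Equal weighting is an ASSUMPTION
# for illustration, not Artificial Analysis's published formula.
BENCHMARKS = [
    "GPQA Diamond", "MMLU-Pro", "AIME", "SciCode",
    "HLE", "IFBench", "Terminal-Bench", "AA-LCR",
]


def composite_index(scores: dict[str, float]) -> float:
    """Equal-weighted mean of benchmark scores on a 0-100 scale.

    Raises ValueError if a benchmark is missing or out of range, so a
    partial result is never reported as a full index.
    """
    missing = [b for b in BENCHMARKS if b not in scores]
    if missing:
        raise ValueError(f"missing benchmarks: {missing}")
    for name in BENCHMARKS:
        if not 0.0 <= scores[name] <= 100.0:
            raise ValueError(f"{name} score out of range: {scores[name]}")
    return round(mean(scores[b] for b in BENCHMARKS), 1)


# Hypothetical per-benchmark scores for an imaginary model.
example = {
    "GPQA Diamond": 80.0, "MMLU-Pro": 85.0, "AIME": 90.0, "SciCode": 45.0,
    "HLE": 30.0, "IFBench": 60.0, "Terminal-Bench": 40.0, "AA-LCR": 70.0,
}
print(composite_index(example))
```

With these made-up inputs the mean is 62.5. A strong AIME result can mask a weak HLE result, which is one reason to also look at the individual benchmark charts.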

Intelligence over time

Daily progression of the Intelligence Index for the top 8 models.

11 points · 30-day window

Claude Opus 4.8 (Fast) · Anthropic

Claude Opus 4.8 (Adaptive Reasoning, Max Effort) · Anthropic

GPT-5.5 · OpenAI

Claude Opus 4.7 · Anthropic

Gemini 3.1 Pro Preview · Google

GPT-5.4 · OpenAI

Qwen3.7 Max · Alibaba

Qwen3.7 Max · Alibaba

Source: Artificial Analysis · SWEN daily snapshots

Coding

Math

Knowledge & reasoning

Performance

Output tokens/second

Ranking by generation speed (tokens/s). Top 20 models.


End-to-End Response Time

Time to first answer token (TTFA) in seconds, including the reasoning chain. Lower is better. Top 20 models.

For reasoning models (o1, GPT-5, Claude Thinking, DeepSeek R1, etc.), TTFA includes thinking time and can be 10x higher than time to first token (TTFT).

Source: Artificial Analysis
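To make the TTFT/TTFA distinction concrete, here is a minimal sketch that derives both metrics, plus end-to-end time, from a timeline of streamed tokens. The `StreamEvent` format (a timestamp in seconds since the request was sent, tagged "reasoning" or "answer") is a hypothetical simplification. Real APIs expose reasoning tokens differently or hide them entirely, in which case observed TTFT and TTFA coincide.

```python
from dataclasses import dataclass


@dataclass
class StreamEvent:
    t: float    # seconds since the request was sent (hypothetical format)
    kind: str   # "reasoning" or "answer" (hypothetical labels)


def latency_metrics(events: list[StreamEvent]) -> dict[str, float]:
    """Derive TTFT, TTFA and end-to-end time from timestamped stream events."""
    if not events:
        raise ValueError("no tokens received")
    ordered = sorted(events, key=lambda e: e.t)
    answers = [e for e in ordered if e.kind == "answer"]
    if not answers:
        raise ValueError("stream contained no answer tokens")
    return {
        # TTFT: first token of any kind, reasoning included.
        "ttft": ordered[0].t,
        # TTFA: first token of the visible answer, after the thinking phase.
        "ttfa": answers[0].t,
        # End-to-end: arrival of the very last token.
        "end_to_end": ordered[-1].t,
    }


# Hypothetical reasoning model: the answer starts 5.5 s after the first reasoning token.
timeline = [
    StreamEvent(0.5, "reasoning"),
    StreamEvent(3.0, "reasoning"),
    StreamEvent(6.0, "answer"),
    StreamEvent(6.1, "answer"),
    StreamEvent(6.2, "answer"),
]
m = latency_metrics(timeline)
print(m, m["ttfa"] / m["ttft"])
```

In this made-up timeline TTFA (6.0 s) is 12x TTFT (0.5 s), which is the kind of gap the note above describes for reasoning models.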

Context window

Maximum number of tokens the model can process in one request (context window). Top 15 models.

Grok 4.20 Multi-Agent · xAI

#10 Gemini 3.1 Pro Preview Custom Tools · Google · 1.0M

#11 Gemini 3.1 Flash Lite · Google · 1.0M

#12 Gemini 3.5 Flash · Google · 1.0M

#13 Gemini 2.5 Pro · Google · 1.0M

#14 Gemini 2.5 Pro Preview 06-05 · Google · 1.0M

#15 Lyria 3 Pro Preview · Google · 1.0M

Cost

Input price — cheapest quality models

Top 25 cheapest models, priced in USD per 1M input tokens, as a cost-efficiency benchmark.

Models listed at a price of 0 are free, or open-weights models meant to be self-hosted.

Source: OpenRouter · provider pricing
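Prices in this chart are per 1M input tokens, so the cost of a request scales linearly with token count. A minimal sketch, assuming separate input and output prices per 1M tokens. Output prices are not shown in this chart, and the `request_cost_usd` helper and all figures below are hypothetical.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request, given USD prices per 1M tokens."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Each side is billed at (tokens / 1,000,000) * price per 1M tokens.
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical: 200K input tokens at $2.00/1M plus 10K output tokens at $12.00/1M.
cost = request_cost_usd(200_000, 10_000, 2.00, 12.00)
print(f"${cost:.2f}")
```

Here input contributes $0.40 and output $0.12. A price of 0 yields zero API cost, but self-hosting an open-weights model still carries infrastructure cost.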

Human preference

LMArena Elo

Ranking by human preference in blind side-by-side comparisons.


#1 Claude Opus 4.6 (Fast) · Anthropic · 1497

#2 GPT-5.2 Chat · OpenAI · 1477

#3 Gemini 3 Flash Preview · Google

#6 DeepSeek V3.2 Exp · DeepSeek · 1423

#7 Kimi K2 0711 · MoonshotAI · 1417

#8 Claude Sonnet 4.5 · Anthropic · 1399

Source: LMArena

Advanced capabilities

Video models

Editorial quality — video models

Subjective score (0–10) based on visual quality, physics, duration and cost. SWEN editorial review.


#3 Runway Gen-3 Alpha · Runway · 8.9

#4 Pika 2.1 · Pika Labs · 8.6

#5 Hailuo MiniMax Video-01 · MiniMax · 8.4

#6 Wan 2.1 · Alibaba · 8.2

#7 Luma Dream Machine 1.6 · Luma AI · 8.1

#8 Stable Video Diffusion 3D · Stability AI · 7.8

For LLMs we use objective benchmarks. For video there is no industry-standard index yet, so this is our curated assessment.

Source: SWEN editorial review

Explore more

View as table (filters + comparator) · GitHub Radar: trending open source · Benchmark by model · Tools · Model profiles · Editorial comparisons · Tutorials · Glossary

Frequently asked questions about the AI ranking

What is the most intelligent AI in the world in 2026?

According to the AA Intelligence Index, a composite index aggregating benchmarks such as GPQA Diamond, MMLU-Pro, AIME, HLE and LiveCodeBench, Anthropic's Claude Opus 4.8 (Fast) leads the 2026 ranking with a score of 61.4/100, followed by Claude Opus 4.8 (Adaptive Reasoning, Max Effort), also at 61.4, and GPT-5.5 at 60.2. The Intelligence Index is calculated by Artificial Analysis from independent evaluations and reflects measured capability in reasoning, math, science and coding. It differs from LMArena Elo, which measures human preference in open conversations. For tasks requiring deep reasoning, code or scientific analysis, models at the top of the Intelligence Index typically perform best. For everyday conversations and creative work, Elo is a more representative guide. See the updated ranking for current positions.

What is the difference between Elo and the Intelligence Index?

Elo comes from LMArena (Chatbot Arena), where real users compare responses from two anonymized models and pick the better one. It measures subjective human preference, reflecting naturalness, usefulness and perceived quality in everyday conversations. A model with a high Elo may not be the most accurate on technical tasks, but it is the one people prefer to use. The AA Intelligence Index, calculated by Artificial Analysis, is objective: it aggregates results from standardized benchmarks such as GPQA Diamond (PhD-level questions), MMLU-Pro (broad academic knowledge), AIME (olympiad math), HLE (frontier scientific knowledge) and LiveCodeBench (programming). The higher the score, the more technical capability the model demonstrated in controlled evaluations. Use Elo to choose a general conversational assistant; use the Intelligence Index to select models for technical or scientific pipelines.
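The idea behind an Elo rating can be illustrated with the textbook Elo update: each blind vote moves the winner up and the loser down by an amount that depends on how surprising the result was. This is a sketch of classic Elo with an assumed K-factor of 32. LMArena's published leaderboard may be computed differently (for example, by fitting a statistical model over all votes at once), so these functions are illustrative, not LMArena's code.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo logistic model (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (A, B) ratings after one head-to-head vote.

    score_a is 1.0 if A wins, 0.0 if B wins and 0.5 for a tie.
    k = 32 is an assumed step size, not an LMArena parameter.
    """
    e_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - e_a)
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two equally rated models: the expected score is 0.5, so the winner gains k/2 = 16.
print(elo_update(1400.0, 1400.0, 1.0))
print(round(expected_score(1497, 1477), 3))
```

Under this model, a 20-point gap such as 1497 vs 1477 in the LMArena list implies only about a 53% expected win rate, so small Elo differences between neighbors are weak evidence of a real preference gap.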

Which AI is best for coding in 2026?

For coding, the most relevant benchmarks are LiveCodeBench, which scores code challenges by actually executing the solutions, and the AA Coding Index. In 2026, GPT-5.5 leads the coding ranking (59.1/100), with GPT-5.4 second and Claude Opus 4.8 (Adaptive Reasoning, Max Effort) third. The ideal choice depends on context: for code generation via API, cost per token and context window matter as much as accuracy. For interactive IDE development (Cursor, VS Code), latency is critical. For multi-file projects, context windows above 100K tokens are usually needed. See the full table to compare coding models by score, price and speed.
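The trade-offs described above (score vs. price vs. context window) boil down to filtering, then sorting. A minimal sketch, using hypothetical model records and thresholds rather than live SWEN data. The `Model`, `estimate_tokens` and `pick_coding_models` names are illustrative, and the ~4 characters per token figure is only a rough rule of thumb for English text. Code and other languages can tokenize quite differently.

```python
import math
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    coding_score: float       # 0-100 coding index (hypothetical values below)
    input_price_per_m: float  # USD per 1M input tokens
    context_tokens: int       # maximum context window


def estimate_tokens(text_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token count from a character count (rule-of-thumb ratio)."""
    return math.ceil(text_chars / chars_per_token)


def pick_coding_models(models: list[Model], project_chars: int,
                       max_price_per_m: float) -> list[Model]:
    """Keep models whose context fits the project and whose price fits the budget,
    best coding score first."""
    needed = estimate_tokens(project_chars)
    eligible = [m for m in models
                if m.context_tokens >= needed and m.input_price_per_m <= max_price_per_m]
    return sorted(eligible, key=lambda m: m.coding_score, reverse=True)


# Hypothetical catalogue: names, scores and prices are illustrative only.
catalogue = [
    Model("model-a", 59.0, 5.00, 400_000),    # best score, over budget
    Model("model-b", 55.0, 1.25, 1_000_000),
    Model("model-c", 52.0, 0.80, 200_000),
    Model("model-d", 48.0, 0.25, 128_000),    # cheapest, context too small
]
# A ~600,000-character multi-file project is roughly 150K tokens at 4 chars/token.
for m in pick_coding_models(catalogue, project_chars=600_000, max_price_per_m=2.00):
    print(m.name, m.coding_score)
```

Filtering first and ranking second keeps hard constraints (the project must fit in context, the budget is fixed) separate from the soft preference (a higher score).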

How often is the ranking updated?

The SWEN ranking is updated automatically, on a fixed schedule, from three main sources. Artificial Analysis benchmark data (Intelligence Index, Coding Index, Math Index, inference speed) is synced every 6 hours via automated integration. API pricing (input and output, per 1M tokens) is updated daily via OpenRouter, so provider price changes appear within about a day. LMArena Elo (Chatbot Arena) is synced weekly. The page revalidates its cache every 5 minutes via ISR (Incremental Static Regeneration): when a sync brings in a new model or a changed score, the ranking reflects it within about 5 minutes, without a manual rebuild. The last sync occurred on Jun 01, 2026.
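The schedule above can be expressed as a small freshness check: given each source's last sync time and its interval, flag the sources that are overdue. This is a hypothetical sketch, not SWEN's actual code. The intervals come from the text, but the dictionary keys and function names are assumptions. The worst-case lag also assumes the page keeps receiving traffic, because ISR regenerates a page on request rather than on a timer.

```python
from datetime import datetime, timedelta, timezone

# Sync intervals stated in the text; the keys are hypothetical labels.
SYNC_INTERVALS = {
    "artificial_analysis": timedelta(hours=6),
    "openrouter_pricing": timedelta(days=1),
    "lmarena_elo": timedelta(weeks=1),
}
# ISR revalidation window for the rendered page.
ISR_REVALIDATE = timedelta(minutes=5)


def overdue_sources(last_sync: dict[str, datetime], now: datetime) -> list[str]:
    """Return the sources whose last sync is older than their interval.

    Every source in SYNC_INTERVALS must be present in last_sync.
    """
    return sorted(
        source for source, interval in SYNC_INTERVALS.items()
        if now - last_sync[source] > interval
    )


def worst_case_page_lag(source: str) -> timedelta:
    """Approximate longest delay before an upstream change shows on the page:
    one full sync interval plus the ISR window (ignores job run time)."""
    return SYNC_INTERVALS[source] + ISR_REVALIDATE


now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
last = {
    "artificial_analysis": now - timedelta(hours=7),
    "openrouter_pricing": now - timedelta(hours=20),
    "lmarena_elo": now - timedelta(days=3),
}
print(overdue_sources(last, now))
print(worst_case_page_lag("artificial_analysis"))
```

In this example only the Artificial Analysis sync is overdue (7 hours against a 6-hour interval). The ISR window barely matters next to the sync intervals, since even a 6-hour source dominates the 5-minute revalidation.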

What is the difference between Gemini 3, 3.1 and 3.5?

Google's Gemini 3 family does not follow sequential numbering. Google released Gemini 3 Flash, Gemini 3.1 Pro, Gemini 3.1 Flash Lite and Gemini 3.5 Flash without publishing an official “Gemini 3.2”. Each number denotes a distinct technical generation: 3.1 brought reasoning improvements, while 3.5 expanded capability at an intermediate cost. Gemini 3.1 Pro costs $2.00/1M tokens with a 1-million-token context window, positioning it as an alternative to GPT-4o and Claude 3.7. See the full Gemini 3 family comparison →

What is Google's Gemini Spark?

“Gemini Spark” is a name circulating online; Google has never officially launched a product by that name. The term appeared in APK teardowns and has been linked to a possible ultra-lightweight version of Gemini for edge devices. Google's confirmed lightweight models are Gemini Nano (on-device, Pixel 8 Pro/Pixel 9) and Gemini Flash (via API, $0.075/1M tokens). Any prediction about “Gemini Spark” is speculation until Google confirms it. Read what is known about Gemini Spark →

Methodology & sources

Artificial Analysis — provides Intelligence Index, Coding Index, Math Index and individual benchmarks (GPQA Diamond, MMLU-Pro, HLE, AIME, MATH-500, LiveCodeBench). Synced every 6h via automated cron.

LMArena — Human preference ELO in blind side-by-side comparisons. Updated weekly.

OpenRouter — provider pricing in USD per 1M tokens. Updated daily.

Historical snapshots — daily score capture at 06:30 UTC to feed temporal evolution charts. Started on Jun 01, 2026.

Benchmarks are indicative — always test on your specific use case before deciding. Performance varies by inference provider (same model, different latency).
