Best AI for Code in 2026: Claude vs GPT vs Gemini Ranking

Which AI codes best in 2026? A ranking of 272 models by coding benchmark score (LiveCodeBench, LiveBench Coding, SciCode and the AA Coding Index). Compare Claude, GPT, Gemini and DeepSeek for coding, debugging and function generation.

Synced: June 1, 2026 | 272 models with coding benchmarks

Use Cases

Code Autocomplete

Inline suggestions as you type. Ideal for IDEs like Cursor and VS Code.

Top models: Gemini 3 Pro Preview (high), Gemini 3 Flash Preview (Reasoning), Gemini 3 Flash Preview

Code Generation

Create functions, classes and full projects from natural language descriptions.

Top models: Gemini 3 Pro Preview (high), Gemini 3 Flash Preview (Reasoning), Gemini 3 Flash Preview

Debug & Code Review

Identify bugs, suggest fixes and review pull requests automatically.

Top models: Gemini 3 Pro Preview (high), Gemini 3 Flash Preview (Reasoning), Gemini 3 Flash Preview

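Coding Ranking: Top Models

Scores come from four different benchmarks (LiveCodeBench, LiveBench Coding, SciCode and AA Coding Index), so they are only directly comparable between models evaluated on the same benchmark.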

# | Model | Company | Coding score | Benchmark | Context (tokens) | Input price (USD / 1M tokens)
1 | Gemini 3 Pro Preview (high) | Google | 92.0 | LiveCodeBench | n/a | $2.00
2 | Gemini 3 Flash Preview (Reasoning) | Google | 91.0 | LiveCodeBench | n/a | $0.50
3 | Gemini 3 Flash Preview | Google | 90.8 | LiveCodeBench | 1.0M | $0.50
4 | DeepSeek V3.2 Speciale | DeepSeek | 90.0 | LiveCodeBench | 164K | n/a
5 | GPT-5.2 | OpenAI | 89.0 | LiveCodeBench | 400K | $1.75
6 | GPT-5.2 Chat | OpenAI | 88.9 | LiveCodeBench | 128K | $1.75
7 | GPT-5.2 Pro | OpenAI | 88.9 | LiveCodeBench | 400K | $21.00
8 | gpt-oss-120b | OpenAI | 88.0 | LiveCodeBench | 131K | $0.15
9 | Claude Opus 4.5 (Reasoning) | Anthropic | 87.0 | LiveCodeBench | n/a | $6.25
10 | GPT-5.1 | OpenAI | 87.0 | LiveCodeBench | 400K | $1.25
11 | GPT-5.1 Chat | OpenAI | 86.8 | LiveCodeBench | 128K | $1.25
12 | DeepSeek V3.2 Exp (Reasoning) | DeepSeek | 86.0 | LiveCodeBench | n/a | $0.28
13 | Gemini 3 Pro Preview (low) | Google | 86.0 | LiveCodeBench | n/a | $2.00
14 | o4 Mini | OpenAI | 86.0 | LiveCodeBench | 200K | $1.10
15 | o4 Mini High | OpenAI | 85.9 | LiveCodeBench | 200K | $1.10
16 | Kimi K2 Thinking | Kimi | 85.0 | LiveCodeBench | 262K | $0.60
17 | GPT-5 | OpenAI | 85.0 | LiveCodeBench | 400K | $1.25
18 | GPT-5.1-Codex | OpenAI | 85.0 | LiveCodeBench | 400K | $1.25
19 | GPT-5.1-Codex-Max | OpenAI | 84.9 | LiveCodeBench | 400K | $1.25
20 | GPT-5 Codex | OpenAI | 84.0 | LiveCodeBench | 400K | $1.25
21 | GPT-5 Mini | OpenAI | 84.0 | LiveCodeBench | 400K | $0.25
22 | GPT-5.1-Codex-Mini | OpenAI | 84.0 | LiveCodeBench | 400K | $0.25
23 | MiniMax: MiniMax M2.7 | MiniMax | 83.0 | LiveCodeBench | 197K | $0.30
24 | ERNIE 5.0 Thinking Preview | Baidu | 81.0 | LiveCodeBench | n/a | n/a
25 | MiniMax-M2 | MiniMax | 81.0 | LiveCodeBench | 205K | $0.30
26 | MiniMax: MiniMax M2.1 | MiniMax | 81.0 | LiveCodeBench | 197K | $0.30
27 | o3 | OpenAI | 81.0 | LiveCodeBench | 200K | $2.00
28 | o3 Pro | OpenAI | 80.8 | LiveCodeBench | 200K | $20.00
29 | Gemini 2.5 Pro | Google | 80.1 | LiveCodeBench | 1.0M | $1.25
30 | DeepSeek V3.1 Terminus | DeepSeek | 80.0 | LiveCodeBench | 164K | $0.27
31 | Gemini 2.5 Pro Preview (Mar '25) | Google | 80.0 | LiveCodeBench | n/a | n/a
32 | Gemini 3 Flash Preview (Non-reasoning) | Google | 80.0 | LiveCodeBench | n/a | $0.50
33 | Qwen: Qwen3 235B A22B Instruct 2507 | Alibaba | 79.0 | LiveCodeBench | 262K | $0.20
34 | GPT-5 Nano | OpenAI | 79.0 | LiveCodeBench | 400K | $0.05
35 | DeepSeek V3.2 Exp | DeepSeek | 78.9 | LiveCodeBench | 164K | $0.27
36 | DeepSeek V3.2 Exp (Non-reasoning) | DeepSeek | 78.9 | LiveCodeBench | n/a | $0.28
37 | Qwen: Qwen3 235B A22B Thinking 2507 | Alibaba | 78.8 | LiveCodeBench | 131K | $0.15
38 | GPT-5.3 Chat | OpenAI | 78.2 | LiveBench Coding | 128K | $1.75
39 | Qwen3 Next 80B A3B (Reasoning) | Alibaba | 78.0 | LiveCodeBench | n/a | $0.50
40 | DeepSeek V3.1 | DeepSeek | 78.0 | LiveCodeBench | 164K | $0.56
41 | gpt-oss-20b | OpenAI | 78.0 | LiveCodeBench | 131K | $0.06
42 | Gemini 2.5 Pro Preview 06-05 | Google | 77.8 | LiveCodeBench | 1.0M | $1.25
43 | Doubao Seed Code | ByteDance Seed | 77.0 | LiveCodeBench | n/a | n/a
44 | Seed-OSS-36B-Instruct | ByteDance Seed | 77.0 | LiveCodeBench | n/a | $0.21
45 | DeepSeek R1 (Jan '25) | DeepSeek | 77.0 | LiveCodeBench | n/a | $1.68
46 | Gemini 2.5 Pro Preview (May '25) | Google | 77.0 | LiveCodeBench | n/a | $1.25
47 | K-EXAONE (Reasoning) | LG AI | 77.0 | LiveCodeBench | n/a | n/a
48 | Doubao Seed Code | ByteDance | 76.6 | LiveCodeBench | n/a | n/a
49 | Claude Sonnet 4.5 | Anthropic | 76.1 | LiveBench Coding | 1.0M | $3.00
50 | KAT-Coder-Pro V1 | KwaiKAT | 75.0 | LiveCodeBench | n/a | $0.30
51 | EXAONE 4.0 32B (Reasoning) | LG AI Research | 75.0 | LiveCodeBench | n/a | n/a
52 | Magistral Medium 1.2 | Mistral AI | 75.0 | LiveCodeBench | n/a | n/a
53 | Qwen3 VL 32B (Reasoning) | Alibaba | 74.0 | LiveCodeBench | n/a | $0.70
54 | Claude Opus 4.5 | Anthropic | 74.0 | LiveCodeBench | 200K | $6.25
55 | Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | 74.0 | LiveCodeBench | n/a | $0.10
56 | NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | NVIDIA | 74.0 | LiveCodeBench | n/a | $0.06
57 | Nova 2.0 Pro Preview (medium) | Amazon | 73.0 | LiveCodeBench | n/a | $1.25
58 | o3 Mini High | OpenAI | 73.0 | LiveCodeBench | 200K | $1.10
59 | GPT-5 Pro | OpenAI | 72.1 | LiveBench Coding | 400K | $15.00
60 | Magistral Small 1.2 | Mistral | 72.0 | LiveCodeBench | n/a | n/a
61 | NVIDIA Nemotron Nano 9B V2 (Reasoning) | NVIDIA | 72.0 | LiveCodeBench | n/a | $0.04
62 | o3 Mini | OpenAI | 72.0 | LiveCodeBench | 200K | $1.10
63 | Qwen3 30B A3B 2507 (Reasoning) | Alibaba | 71.0 | LiveCodeBench | n/a | $0.28
64 | Nova 2.0 Lite (high) | Amazon | 71.0 | LiveCodeBench | n/a | $0.30
65 | Claude 4.5 Sonnet (Reasoning) | Anthropic | 71.0 | LiveCodeBench | n/a | $3.75
66 | Gemini 2.5 Flash Preview (Sep '25) (Reasoning) | Google | 71.0 | LiveCodeBench | n/a | n/a
67 | MiniMax M1 80k | MiniMax | 71.0 | LiveCodeBench | n/a | $0.55
68 | Qwen3 VL 30B A3B (Reasoning) | Alibaba | 70.0 | LiveCodeBench | n/a | $0.20
69 | Olmo 3.1 32B Think | Allen Institute for AI | 70.0 | LiveCodeBench | n/a | n/a
70 | Gemini 2.5 Flash Preview (Reasoning) | Google | 70.0 | LiveCodeBench | n/a | n/a
71 | NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | NVIDIA | 70.0 | LiveCodeBench | 131K | $0.05
72 | Cogito v2.1 (Reasoning) | Deep Cogito | 69.0 | LiveCodeBench | n/a | $1.25
73 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | Google | 69.0 | LiveCodeBench | n/a | $0.10
74 | K2-V2 (medium) | MBZUAI Institute of Foundation Models | 69.0 | LiveCodeBench | n/a | n/a
75 | Hermes 4 - Llama-3.1 405B (Reasoning) | Nous Research | 69.0 | LiveCodeBench | n/a | $1.00
76 | NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | NVIDIA | 69.0 | LiveCodeBench | n/a | $0.20
77 | Deep Cogito: Cogito v2.1 671B | Deep Cogito | 68.8 | LiveCodeBench | 128K | $1.25
78 | Gemini 3.1 Flash Lite | Google | 68.5 | LiveBench Coding | 1.0M | $0.25
79 | Qwen: Qwen3 Next 80B A3B Instruct | Alibaba | 68.0 | LiveCodeBench | 262K | $0.50
80 | Qwen3 Omni 30B A3B (Reasoning) | Alibaba | 68.0 | LiveCodeBench | n/a | $0.25
81 | Ling-1T | InclusionAI | 68.0 | LiveCodeBench | n/a | n/a
82 | o1 | OpenAI | 68.0 | LiveCodeBench | 200K | $15.00
83 | o1-preview | OpenAI | 67.9 | LiveCodeBench | n/a | $16.50
84 | o1-pro | OpenAI | 67.9 | LiveCodeBench | 200K | $150.00
85 | Olmo 3 32B Think | AllenAI | 67.0 | LiveCodeBench | 66K | n/a
86 | Mistral: Devstral 2 2512 | Mistral AI | 66.8 | LiveBench Coding | 262K | $0.40
87 | Nova 2.0 Omni (medium) | Amazon | 66.0 | LiveCodeBench | n/a | $0.30
88 | Claude 4 Sonnet (Reasoning) | Anthropic | 66.0 | LiveCodeBench | n/a | $3.75
89 | Mi:dm K 2.5 Pro | Korea Telecom | 66.0 | LiveCodeBench | n/a | n/a
90 | MiniMax M1 40k | MiniMax | 66.0 | LiveCodeBench | n/a | n/a
91 | Arcee AI: Trinity Large Thinking | Arcee AI | 65.7 | LiveBench Coding | 262K | $0.22
92 | Claude 4.1 Opus (Non-reasoning) | Anthropic | 65.4 | LiveCodeBench | n/a | $18.75
93 | Qwen3 Max (Preview) | Alibaba | 65.0 | LiveCodeBench | n/a | $1.20
94 | Qwen3 VL 235B A22B (Reasoning) | Alibaba | 65.0 | LiveCodeBench | n/a | $0.84
95 | Claude 4.1 Opus (Reasoning) | Anthropic | 65.0 | LiveCodeBench | n/a | $18.75
96 | Motif-2-12.7B-Reasoning | Motif Technologies | 65.0 | LiveCodeBench | n/a | n/a
97 | Hermes 4 - Llama-3.1 70B (Reasoning) | Nous Research | 65.0 | LiveCodeBench | n/a | $0.13
98 | Qwen3 4B 2507 (Reasoning) | Alibaba | 64.0 | LiveCodeBench | n/a | n/a
99 | Claude 4 Opus (Reasoning) | Anthropic | 64.0 | LiveCodeBench | n/a | $18.75
100 | Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | Google | 64.0 | LiveCodeBench | n/a | $0.10
101 | Ring-1T | InclusionAI | 64.0 | LiveCodeBench | n/a | n/a
102 | Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | NVIDIA | 64.0 | LiveCodeBench | n/a | $0.60
103 | QwQ 32B | Alibaba | 63.1 | LiveCodeBench | n/a | $0.66
104 | Ring-flash-2.0 | InclusionAI | 63.0 | LiveCodeBench | n/a | $0.14
105 | HyperCLOVA X SEED Think (32B) | Naver | 63.0 | LiveCodeBench | n/a | n/a
106 | Qwen3 235B A22B (Reasoning) | Alibaba | 62.0 | LiveCodeBench | n/a | $0.70
107 | Olmo 3 7B Think | Allen Institute for AI | 62.0 | LiveCodeBench | n/a | n/a
108 | Claude 4.5 Haiku (Reasoning) | Anthropic | 62.0 | LiveCodeBench | n/a | $1.25
109 | DeepSeek: R1 | DeepSeek | 61.7 | LiveCodeBench | 164K | $0.70
110 | MoonshotAI: Kimi K2 0905 | MoonshotAI | 61.0 | LiveCodeBench | 262K | $0.60
111 | GPT-5.5 | OpenAI | 59.1 | AA Coding Index | 1.1M | $5.00
112 | Qwen: Qwen3 VL 235B A22B Instruct | Alibaba | 59.0 | LiveCodeBench | 262K | $0.30
113 | Qwen3 Coder 480B A35B Instruct | Alibaba | 59.0 | LiveCodeBench | n/a | $0.30
114 | Nova 2.0 Omni (low) | Amazon | 59.0 | LiveCodeBench | n/a | $0.30
115 | Claude 4.5 Sonnet (Non-reasoning) | Anthropic | 59.0 | LiveCodeBench | n/a | $3.75
116 | DeepSeek V3.2 | DeepSeek | 59.0 | LiveCodeBench | 131K | $0.50
117 | Gemini 2.5 Flash Lite | Google | 59.0 | LiveCodeBench | 1.0M | $0.10
118 | Gemini 3.1 Pro Preview | Google | 59.0 | SciCode | 1.0M | $2.00
119 | Ling-flash-2.0 | InclusionAI | 59.0 | LiveCodeBench | n/a | $0.14
120 | Mi:dm K 2.5 Pro Preview | Korea Telecom | 58.0 | LiveCodeBench | n/a | n/a
121 | o1-mini | OpenAI | 58.0 | LiveCodeBench | n/a | n/a
122 | GPT-5.4 | OpenAI | 57.2 | AA Coding Index | 1.1M | $2.50
123 | Anthropic: Claude Opus 4.8 (Fast) | Anthropic | 56.7 | AA Coding Index | 1.0M | $10.00
124 | Claude Opus 4.8 (Adaptive Reasoning, Max Effort) | Anthropic | 56.7 | AA Coding Index | 1.0M | $6.25
125 | Kimi K2 | Moonshot AI | 56.0 | LiveCodeBench | 131K | $0.58
126 | GPT-5 (minimal) | OpenAI | 56.0 | LiveCodeBench | n/a | $1.25
127 | Qwen3 32B (Reasoning) | Alibaba | 55.0 | LiveCodeBench | n/a | $0.20
128 | Claude Opus 4.7 | Anthropic | 55.0 | SciCode | 1.0M | $6.25
129 | Hermes 4 - Llama-3.1 405B (Non-reasoning) | Nous Research | 55.0 | LiveCodeBench | n/a | $1.00
130 | GPT-5 mini (minimal) | OpenAI | 55.0 | LiveCodeBench | n/a | $0.25
131 | GPT-5.2-Codex | OpenAI | 55.0 | SciCode | 400K | $1.75
132 | Qwen3 32B (Non-reasoning) | Alibaba | 54.6 | LiveCodeBench | n/a | $0.15
133 | GPT-5 Chat | OpenAI | 54.3 | LiveCodeBench | 128K | $1.25
134 | K2-V2 (high) | MBZUAI Institute of Foundation Models | 54.1 | LiveCodeBench | n/a | n/a
135 | Qwen3 Max Thinking (Preview) | Alibaba | 54.0 | LiveCodeBench | n/a | $1.20
136 | Claude Opus 4 | Anthropic | 54.0 | LiveCodeBench | 200K | $18.75
137 | MoonshotAI: Kimi K2.6 | MoonshotAI | 54.0 | SciCode | 262K | $0.95
138 | GPT-5 (ChatGPT) | OpenAI | 54.0 | LiveCodeBench | n/a | $1.25
139 | Claude Opus 4.7 (Fast) | Anthropic | 53.1 | AA Coding Index | 1.0M | $30.00
140 | GPT-5.3-Codex | OpenAI | 53.1 | AA Coding Index | 400K | $1.75
141 | Google: Gemini 3.5 Flash | Google | 53.0 | SciCode | 1.0M | $1.50
142 | Magistral Medium 1 | Mistral | 52.7 | LiveCodeBench | n/a | n/a
143 | Qwen3 14B (Reasoning) | Alibaba | 52.0 | LiveCodeBench | n/a | $0.23
144 | Qwen3 30B A3B 2507 Instruct | Alibaba | 52.0 | LiveCodeBench | n/a | $0.15
145 | Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 52.0 | SciCode | n/a | $6.25
146 | Exaone 4.0 1.2B (Non-reasoning) | LG AI Research | 52.0 | LiveCodeBench | n/a | n/a
147 | Muse Spark | Meta | 52.0 | SciCode | n/a | n/a
148 | GPT-5.4 Mini | OpenAI | 51.5 | AA Coding Index | 400K | $0.75
149 | Magistral Small 1 | Mistral | 51.4 | LiveCodeBench | n/a | n/a
150 | Qwen: Qwen3 VL 32B Instruct | Alibaba | 51.0 | LiveCodeBench | 131K | $0.70
151 | Qwen3 30B A3B (Reasoning) | Alibaba | 51.0 | LiveCodeBench | n/a | $0.09
152 | Claude Haiku 4.5 | Anthropic | 51.0 | LiveCodeBench | 200K | $1.25
153 | DeepSeek R1 0528 Qwen3 8B | DeepSeek | 51.0 | LiveCodeBench | n/a | n/a
154 | Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | 50.9 | AA Coding Index | n/a | $3.75
155 | Qwen3.7 Max | Alibaba | 50.1 | AA Coding Index | n/a | $2.50
156 | DeepSeek V4 Pro | DeepSeek | 50.0 | SciCode | 1.0M | $0.43
157 | Gemini 2.5 Flash | Google | 50.0 | LiveCodeBench | 1.0M | $0.30
158 | GPT-5.5 Instant (May 2026) | OpenAI | 50.0 | SciCode | n/a | $5.00
159 | Gemini 3.5 Flash (minimal) | Google | 49.0 | SciCode | n/a | $1.50
160 | MoonshotAI: Kimi K2.5 | MoonshotAI | 49.0 | SciCode | 262K | $0.60
161 | Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | NVIDIA | 49.0 | LiveCodeBench | n/a | n/a
162 | Qwen: Qwen3 30B A3B Thinking 2507 | Alibaba | 48.9 | LiveBench Coding | 131K | $0.08
163 | Claude Opus 4.6 (Fast) | Anthropic | 48.1 | AA Coding Index | 1.0M | $30.00
164 | Qwen: Qwen3 VL 30B A3B Instruct | Alibaba | 48.0 | LiveCodeBench | 131K | $0.20
165 | GPT-4.1 Mini | OpenAI | 48.0 | LiveCodeBench | 1.0M | $0.40
166 | Claude Opus 4.6 | Anthropic | 47.6 | AA Coding Index | 1.0M | $6.25
167 | Qwen3 4B (Reasoning) | Alibaba | 47.0 | LiveCodeBench | n/a | $0.11
168 | Qwen3.6 Max Preview | Alibaba | 47.0 | SciCode | n/a | $1.30
169 | Claude 3.7 Sonnet (thinking) | Anthropic | 47.0 | LiveCodeBench | 200K | n/a
170 | Claude Sonnet 4.6 | Anthropic | 47.0 | SciCode | 1.0M | $3.75
171 | Baidu: ERNIE 4.5 300B A47B | Baidu | 47.0 | LiveCodeBench | 123K | $0.28
172 | EXAONE 4.0 32B (Non-reasoning) | LG AI Research | 47.0 | LiveCodeBench | n/a | n/a
173 | Mistral Large 3 | Mistral | 47.0 | LiveCodeBench | n/a | $4.00
174 | GPT-5 nano (minimal) | OpenAI | 47.0 | LiveCodeBench | n/a | $0.05
175 | GPT-5.4 Nano | OpenAI | 47.0 | SciCode | 400K | $0.20
176 | GPT-4.1 | OpenAI | 46.0 | LiveCodeBench | 1.0M | $2.00
177 | Kwaipilot: KAT-Coder-Pro V2 | Kwaipilot | 45.6 | AA Coding Index | 256K | $0.30
178 | Claude Sonnet 4 | Anthropic | 45.0 | LiveCodeBench | 1.0M | $3.75
179 | DeepSeek V4 Flash | DeepSeek | 45.0 | SciCode | 1.0M | $0.14
180 | Devstral 2 | Mistral | 45.0 | LiveCodeBench | n/a | n/a
181 | Claude Sonnet 4.6 (Non-reasoning, Low Effort) | Anthropic | 44.0 | SciCode | n/a | $3.75
182 | Gemma 4 31B | Google | 43.0 | SciCode | 262K | $0.14
183 | Ling-mini-2.0 | InclusionAI | 43.0 | LiveCodeBench | n/a | n/a
184 | MiniMax: MiniMax M2.5 | MiniMax | 43.0 | SciCode | 197K | $0.30
185 | GPT-4o (March 2025, chatgpt-4o-latest) | OpenAI | 43.0 | LiveCodeBench | n/a | n/a
186 | Qwen3 Omni 30B A3B Instruct | Alibaba | 42.0 | LiveCodeBench | n/a | $0.25
187 | Gemini 3.1 Flash Lite Preview | Google | 42.0 | SciCode | 1.0M | $0.25
188 | Ring-2.6-1T | InclusionAI | 42.0 | SciCode | n/a | $0.30
189 | Qwen3 8B (Reasoning) | Alibaba | 41.0 | LiveCodeBench | n/a | $0.11
190 | Qwen3.5 Omni Plus | Alibaba | 41.0 | SciCode | n/a | $0.40
191 | DeepSeek V3 0324 | DeepSeek | 41.0 | LiveCodeBench | n/a | $1.20
192 | Gemini 2.5 Flash Preview (Non-reasoning) | Google | 41.0 | LiveCodeBench | n/a | n/a
193 | Mistral: Mistral Medium 3.1 | Mistral AI | 41.0 | LiveCodeBench | 131K | $0.40
194 | GPT-5.4 Pro | OpenAI | 41.0 | AA Coding Index | 1.1M | $30.00
195 | Qwen: Qwen3 Coder 30B A3B Instruct | Alibaba | 40.0 | LiveCodeBench | 160K | $0.19
196 | Gemma 4 26B A4B | Google | 40.0 | SciCode | 262K | $0.13
197 | Llama 4 Maverick | Meta | 40.0 | LiveCodeBench | 1.0M | $0.35
198 | Mistral: Mistral Medium 3 | Mistral AI | 40.0 | LiveCodeBench | 131K | $0.40
199 | Mistral: Mistral Medium 3.5 | Mistral AI | 40.0 | SciCode | 262K | $1.50
200 | Claude 3.7 Sonnet | Anthropic | 39.0 | LiveCodeBench | 200K | $3.75
201 | Inception: Mercury 2 | Inception | 39.0 | SciCode | 128K | $0.25
202 | Claude 3.5 Sonnet (June '24) | Anthropic | 38.1 | LiveCodeBench | n/a | $3.75
203 | Claude 3.5 Sonnet (Oct '24) | Anthropic | 38.0 | LiveCodeBench | n/a | $3.75
204 | Command A+ | Cohere | 38.0 | SciCode | n/a | n/a
205 | DeepSeek R1 Distill Qwen 14B | DeepSeek | 38.0 | LiveCodeBench | n/a | n/a
206 | DeepSeek: R1 Distill Qwen 32B | DeepSeek | 38.0 | SciCode | 128K | n/a
207 | Kimi Linear 48B A3B Instruct | Kimi | 38.0 | LiveCodeBench | n/a | n/a
208 | Mistral: Mistral Small 4 | Mistral AI | 38.0 | SciCode | 262K | $0.20
209 | Qwen3 4B 2507 Instruct | Alibaba | 37.7 | LiveCodeBench | n/a | n/a
210 | Ling-2.6-1T | Inclusion AI | 37.0 | SciCode | n/a | $0.30
211 | Qwen2.5 Max | Alibaba | 36.0 | LiveCodeBench | n/a | $1.60
212 | Trinity Large Thinking | Arcee AI | 36.0 | SciCode | n/a | $0.23
213 | NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | NVIDIA | 36.0 | LiveCodeBench | 262K | $0.05
214 | NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | 36.0 | SciCode | 1.0M | $0.30
215 | DeepSeek V3 | DeepSeek | 35.9 | LiveCodeBench | 131K | $0.23
216 | Mistral: Ministral 3 14B 2512 | Mistral AI | 35.1 | LiveCodeBench | 262K | $0.20
217 | Qwen3 VL 8B (Reasoning) | Alibaba | 35.0 | LiveCodeBench | n/a | $0.18
218 | Gemini 2.0 Pro Experimental (Feb '25) | Google | 35.0 | LiveCodeBench | n/a | n/a
219 | Devstral Small 2 | Mistral | 35.0 | LiveCodeBench | n/a | $0.10
220 | Ministral 3 14B | Mistral | 35.0 | LiveCodeBench | n/a | $0.20
221 | Nemotron Cascade 2 30B A3B | NVIDIA | 35.0 | SciCode | n/a | n/a
222 | NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | 35.0 | LiveCodeBench | n/a | $0.20
223 | QwQ 32B-Preview | Alibaba | 34.0 | LiveCodeBench | n/a | n/a
224 | Gemini 2.0 Flash (experimental) | Google | 34.0 | SciCode | n/a | n/a
225 | Mistral: Devstral Medium | Mistral AI | 34.0 | LiveCodeBench | 131K | $0.40
226 | Qwen: Qwen3 VL 8B Instruct | Alibaba | 33.0 | LiveCodeBench | 131K | $0.18
227 | Gemini 2.0 Flash | Google | 33.0 | LiveCodeBench | 1.0M | $0.15
228 | Gemini 2.0 Flash Thinking Experimental (Jan '25) | Google | 33.0 | SciCode | n/a | n/a
229 | K2 Think V2 | MBZUAI Institute of Foundation Models | 33.0 | SciCode | n/a | n/a
230 | GPT-4.1 Nano | OpenAI | 33.0 | LiveCodeBench | 1.0M | $0.10
231 | GPT-4o (2024-08-06) | OpenAI | 33.0 | SciCode | 128K | $2.50
232 | GPT-4o (ChatGPT) | OpenAI | 33.0 | SciCode | n/a | n/a
233 | OpenAI: GPT-4o | OpenAI | 33.0 | SciCode | 128K | $2.50
234 | OpenAI: GPT-4o (2024-05-13) | OpenAI | 33.0 | LiveCodeBench | 128K | $5.00
235 | Gemini 2.0 Flash Thinking Experimental (Dec '24) | Google | 32.1 | LiveCodeBench | n/a | n/a
236 | Qwen: Qwen3 30B A3B Instruct 2507 | Alibaba | 32.0 | LiveCodeBench | 262K | $0.08
237 | Qwen3 VL 4B (Reasoning) | Alibaba | 32.0 | LiveCodeBench | n/a | n/a
238 | Amazon: Nova Premier 1.0 | Amazon | 32.0 | LiveCodeBench | 1.0M | $2.50
239 | Gemini 1.5 Pro (Sep '24) | Google | 32.0 | LiveCodeBench | n/a | n/a
240 | GPT-4 Turbo | OpenAI | 32.0 | SciCode | 128K | $10.00
241 | Qwen3 1.7B (Reasoning) | Alibaba | 31.0 | LiveCodeBench | n/a | $0.11
242 | Nova 2.0 Omni (Non-reasoning) | Amazon | 31.0 | LiveCodeBench | n/a | $0.30
243 | Claude 3.5 Haiku | Anthropic | 31.0 | LiveCodeBench | 200K | $1.00
244 | R1 Distill Llama 70B | DeepSeek | 31.0 | SciCode | 131K | $0.70
245 | Llama 3.1 Instruct 405B | Meta | 31.0 | LiveCodeBench | n/a | $2.75
246 | GPT-4o (2024-11-20) | OpenAI | 31.0 | LiveCodeBench | 128K | $2.50
247 | Mistral: Ministral 3 8B 2512 | Mistral AI | 30.3 | LiveCodeBench | 262K | $0.15
248 | Qwen2.5 Coder 32B Instruct | Alibaba | 30.0 | LiveCodeBench | 33K | n/a
249 | Llama 3.1 Tulu3 405B | Allen Institute for AI | 30.0 | SciCode | n/a | n/a
250 | Llama 4 Scout | Meta | 30.0 | LiveCodeBench | 10.0M | $0.17
251 | Ministral 3 8B | Mistral | 30.0 | LiveCodeBench | n/a | $0.15
252 | GPT-4 Turbo Preview | OpenAI | 29.1 | LiveCodeBench | 128K | $10.00
253 | OpenAI: GPT-4 Turbo (older v1106) | OpenAI | 29.1 | LiveCodeBench | 128K | $10.00
254 | Qwen3 VL 4B Instruct | Alibaba | 29.0 | LiveCodeBench | n/a | n/a
255 | JT-35B-Flash | China Mobile | 29.0 | SciCode | n/a | n/a
256 | Llama 3.3 70B Instruct | Meta | 29.0 | LiveCodeBench | 131K | $0.58
257 | Mistral Large 2 (Nov '24) | Mistral | 29.0 | LiveCodeBench | n/a | $2.00
258 | Mistral: Pixtral Large 2411 | Mistral AI | 29.0 | SciCode | 131K | $2.00
259 | Llama Nemotron Super 49B v1.5 (Non-reasoning) | NVIDIA | 29.0 | LiveCodeBench | n/a | $0.10
260 | Qwen2.5 72B Instruct | Alibaba | 28.0 | LiveCodeBench | 33K | $0.36
261 | Qwen3 14B (Non-reasoning) | Alibaba | 28.0 | LiveCodeBench | n/a | $0.23
262 | Claude 3 Opus | Anthropic | 28.0 | LiveCodeBench | n/a | $18.75
263 | EXAONE 4.5 33B | LG AI | 28.0 | SciCode | n/a | n/a
264 | LongCat Flash Lite | LongCat | 28.0 | SciCode | n/a | n/a
265 | Mistral Small 3.2 | Mistral | 28.0 | LiveCodeBench | n/a | $0.09
266 | Hermes 4 - Llama-3.1 70B (Non-reasoning) | Nous Research | 28.0 | SciCode | n/a | $0.13
267 | Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | NVIDIA | 28.0 | LiveCodeBench | n/a | n/a
268 | Llama 3.3 Nemotron Super 49B v1 (Reasoning) | NVIDIA | 28.0 | LiveCodeBench | n/a | n/a
269 | Nemotron 3 Nano Omni 30B A3B Reasoning | NVIDIA | 28.0 | SciCode | n/a | $0.07
270 | Mistral: Mistral Small 3.2 24B | Mistral AI | 27.5 | LiveCodeBench | 128K | $0.07
271 | JT-MINI | China Mobile | 27.0 | SciCode | n/a | n/a
272 | Gemini 1.5 Flash (Sep '24) | Google | 27.0 | SciCode | n/a | n/a

Another 228 models have no coding benchmark scores available.

Complete Guide: AI for Programming in 2026

The State of AI for Code in 2026

Artificial intelligence has fundamentally changed software development. In 2026, large language models (LLMs) can generate working code in dozens of languages, fix bugs in production codebases, and build complete applications from plain-English descriptions. SWE-bench, widely regarded as one of the most demanding coding benchmarks, evaluates models on real software engineering tasks drawn from GitHub issues.

SWE-bench: The Gold Standard

SWE-bench (Software Engineering Benchmark) is widely considered the gold standard for evaluating LLM coding ability. Unlike academic benchmarks such as HumanEval, which test isolated functions, SWE-bench presents real issues from popular Python repositories such as Django, Flask, scikit-learn, and requests. The model must understand the project context, locate the relevant files, and generate a patch that resolves the issue, mirroring the actual workflow of a professional developer. A patch counts as a success only if the tests tied to the issue now pass and the repository's previously passing tests still pass.
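To make that pass/fail rule concrete, here is a minimal Python sketch of how a SWE-bench-style harness could decide whether a task is resolved. It assumes the test outcomes have already been collected into dictionaries; the two lists mirror the FAIL_TO_PASS and PASS_TO_PASS test fields in the SWE-bench dataset, but `TaskResult`, `is_resolved` and `resolve_rate` are hypothetical names, and the real harness runs each task in its own container rather than using this code.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Test outcomes for one task after the model's patch is applied (hypothetical structure)."""
    fail_to_pass: dict[str, bool]  # tests the issue broke; each must now pass
    pass_to_pass: dict[str, bool]  # tests that passed before; each must still pass


def is_resolved(result: TaskResult) -> bool:
    """A task counts as resolved only if the issue is fixed and nothing regressed."""
    fixed_issue = all(result.fail_to_pass.values())
    no_regressions = all(result.pass_to_pass.values())
    return fixed_issue and no_regressions


def resolve_rate(results: list[TaskResult]) -> float:
    """Percentage of tasks resolved: the headline SWE-bench number."""
    if not results:
        return 0.0
    resolved = sum(is_resolved(r) for r in results)
    return 100.0 * resolved / len(results)


if __name__ == "__main__":
    results = [
        TaskResult({"test_bug": True}, {"test_existing": True}),    # resolved
        TaskResult({"test_bug": True}, {"test_existing": False}),   # fixed, but regressed
        TaskResult({"test_bug": False}, {"test_existing": True}),   # issue not fixed
        TaskResult({"test_bug": True}, {"test_a": True, "test_b": True}),  # resolved
    ]
    print(f"{resolve_rate(results):.1f}%")  # 50.0%
```

Requiring both lists to pass is what stops a model from "fixing" a bug by deleting the code that other tests depend on.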

The “Verified” variant (SWE-bench Verified) is a 500-task subset that human software engineers reviewed to confirm each task has a clear issue description and a fair, verifiable test. Its scores are widely used as a proxy for real-world coding performance, which makes it one of the most informative metrics when choosing an AI coding assistant.

HumanEval and LiveCodeBench

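HumanEval, created by OpenAI, consists of 164 hand-written Python problems: the model completes a function from its signature and docstring, and the result is checked against unit tests. It is simpler than SWE-bench but useful for gauging basic code fluency. LiveCodeBench raises the bar by continuously collecting new problems from competitive programming sites (LeetCode, AtCoder, and Codeforces) and tagging each with its release date. This reduces the risk of data contamination, where a model has already seen the problems and their solutions during training.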
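Both benchmarks are usually reported as pass@k, the probability that at least one of k generated solutions passes the tests. The sketch below uses the unbiased estimator from the paper that introduced HumanEval, 1 - C(n-c, k) / C(n, k) for n samples of which c are correct, and adds a simplified version of LiveCodeBench's idea of scoring a model only on problems released after its training cutoff. The function names and the `(problem_id, release_date)` input format are assumptions for illustration, not either benchmark's actual code.

```python
from datetime import date
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: chance that at least one of k samples, drawn from
    n generated samples of which c pass the unit tests, is correct."""
    if n - c < k:
        return 1.0  # every size-k draw must include at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def contamination_free(problems: list[tuple[str, date]], cutoff: date) -> list[str]:
    """Keep only problems released after the model's training cutoff
    (a simplified take on LiveCodeBench's time-windowed evaluation)."""
    return [pid for pid, released in problems if released > cutoff]
```

For k = 1 the estimator reduces to c / n, so `pass_at_k(20, 5, 1)` gives 0.25. Filtering by release date means a model cannot have memorized the solutions, as long as its stated training cutoff is accurate.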

How to Choose the Best AI Model for Code

The right model depends on your specific use case. For real-time code autocomplete (Cursor, GitHub Copilot), speed and latency matter more than peak benchmark scores; lighter models such as GPT-5 Mini and Claude Haiku 4.5 offer a strong speed-to-quality ratio. For full project generation or complex debugging, frontier models such as Gemini 3 Pro, GPT-5.2, and Claude Opus 4.5 are better suited, despite higher costs.
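One way to apply this to the ranking is to fix a benchmark (scores from different benchmarks are not comparable) and a price ceiling, then take the highest score that fits. This is a minimal sketch: the rows are copied from the ranking table, `ModelRow` and `best_within_budget` are hypothetical names, and the price is read as USD per 1M input tokens.

```python
from typing import NamedTuple, Optional


class ModelRow(NamedTuple):
    name: str
    score: float
    benchmark: str
    input_price: Optional[float]  # USD per 1M input tokens; None if not listed


# A few rows from the coding ranking.
ROWS = [
    ModelRow("Gemini 3 Pro Preview (high)", 92.0, "LiveCodeBench", 2.00),
    ModelRow("Gemini 3 Flash Preview (Reasoning)", 91.0, "LiveCodeBench", 0.50),
    ModelRow("DeepSeek V3.2 Speciale", 90.0, "LiveCodeBench", None),
    ModelRow("GPT-5.2", 89.0, "LiveCodeBench", 1.75),
    ModelRow("gpt-oss-120b", 88.0, "LiveCodeBench", 0.15),
    ModelRow("GPT-5.5", 59.1, "AA Coding Index", 5.00),
]


def best_within_budget(
    rows: list[ModelRow], benchmark: str, max_price: float
) -> Optional[ModelRow]:
    """Highest-scoring model on one benchmark whose listed input price fits the budget."""
    candidates = [
        r
        for r in rows
        if r.benchmark == benchmark  # never mix benchmarks when comparing scores
        and r.input_price is not None  # skip models without a listed price
        and r.input_price <= max_price
    ]
    return max(candidates, key=lambda r: r.score, default=None)
```

With a $1.00 ceiling on LiveCodeBench this returns Gemini 3 Flash Preview (Reasoning); dropping the ceiling to $0.20 returns gpt-oss-120b. Latency, which matters most for autocomplete, is not in the table and would need to be measured separately.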

Teams with strict data-control requirements (compliance, security) should consider open-weight models such as gpt-oss-120b, DeepSeek V3.2, and Qwen3 Coder, which can be deployed on-premises; gpt-oss-120b, for example, scores 88.0 on LiveCodeBench in the ranking above. Choosing between proprietary and open-source models means weighing cost, latency, privacy, and quality.
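On the cost side of that trade-off, a rough monthly estimate is simple arithmetic: tokens divided by one million, times the per-million price. The sketch below assumes the table's input prices are USD per 1M input tokens and uses an illustrative 50M-token monthly workload. It ignores output tokens, which are billed separately, and self-hosted deployments, where you pay for hardware rather than per token. The function names are hypothetical.

```python
def monthly_input_cost(tokens_per_month: int, usd_per_million: float) -> float:
    """Input-token spend in USD for one month (output tokens are billed separately)."""
    return tokens_per_month / 1_000_000 * usd_per_million


def rank_by_cost(tokens_per_month: int, prices: dict[str, float]) -> list[tuple[str, float]]:
    """Return (model, monthly input cost) pairs, cheapest first."""
    costs = {name: monthly_input_cost(tokens_per_month, p) for name, p in prices.items()}
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Input prices from the ranking table; the workload size is illustrative.
    prices = {"GPT-5.2": 1.75, "gpt-oss-120b": 0.15, "Claude Opus 4.5": 6.25}
    for name, cost in rank_by_cost(50_000_000, prices):
        print(f"{name}: ${cost:,.2f}")
```

At that volume this prints gpt-oss-120b at $7.50, GPT-5.2 at $87.50, and Claude Opus 4.5 at $312.50, a spread of roughly 40x before quality is even considered.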

AI-Powered Coding Tools

The leading AI-assisted development tools in 2026 include Cursor (a full IDE with Claude, GPT, and Gemini support), GitHub Copilot (available in VS Code and other editors, with a choice of OpenAI, Anthropic, and Google models), Windsurf (the AI editor formerly known as Codeium), and Amazon Q Developer (formerly CodeWhisperer, integrated with the AWS ecosystem). Each tool uses different models under the hood, and the quality of the generated code depends directly on the LLM powering it.

Trends for 2026 and Beyond

The most significant trends in AI for code include autonomous software engineering agents (which work through multi-step tasks with little supervision), automated test generation, intelligent refactoring, and native CI/CD pipeline integration. The frontier is shifting from “code assistant” to “autonomous engineer”, with models increasingly capable of navigating large codebases and making architectural decisions.
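At their core, most coding agents run a generate-test-refine loop: propose a change, run the tests, and feed any failures back into the next attempt. The sketch below shows only that loop, under the assumption that `propose_patch` wraps some LLM call and `run_tests` wraps your test runner; both are hypothetical callables you would supply, and real agents add file navigation, tool calls, and sandboxing on top.

```python
from typing import Callable, Optional


def agent_loop(
    propose_patch: Callable[[str], str],
    run_tests: Callable[[str], tuple[bool, str]],
    task: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask for a patch, test it, and retry with the failure log until tests pass."""
    prompt = task
    for _ in range(max_attempts):
        patch = propose_patch(prompt)
        passed, log = run_tests(patch)
        if passed:
            return patch  # tests are green: hand the patch back for review
        # Feed the failure output into the next attempt.
        prompt = f"{task}\n\nPrevious attempt failed:\n{log}"
    return None  # give up and escalate to a human
```

The `max_attempts` cap is the main safety valve: without it, an agent that cannot fix a task would keep spending tokens indefinitely.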

Frequently Asked Questions

What is the best AI for coding?

In 2026, the top models on coding benchmarks are Gemini 3 Pro Preview (high), Gemini 3 Flash Preview (Reasoning), and Gemini 3 Flash Preview, all scored on LiveCodeBench. The best choice depends on your use case: code autocomplete, full project generation, debugging, or code review.

ChatGPT or Claude for code?

Both are excellent for programming. Claude tends to perform better with long contexts (large codebases) and complex instructions. GPT excels at rapid generation and inline edits. Test both for your specific workflow.

What is SWE-bench?

SWE-bench (Software Engineering Benchmark) evaluates how well models can resolve real issues from open-source GitHub repositories. It is considered the most realistic coding benchmark because it tests bug resolution in real projects, not academic exercises.

Which free LLMs are good for coding?

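Open-source models such as Qwen3 Coder, DeepSeek V3.2, and gpt-oss offer strong coding performance with no per-token API cost when you host them yourself (you pay for the hardware instead). You can run them locally with Ollama, or call them through hosted inference providers such as Together AI and Groq, which charge per token but may offer free tiers.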
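As a minimal sketch of the local route, the standard-library code below sends a prompt to Ollama's REST endpoint, which listens on `localhost:11434` by default and returns the completion in a `response` field when streaming is turned off. It assumes the Ollama server is running and that the model has already been pulled; the `qwen2.5-coder:7b` tag used below is only an example, and `build_request` and `generate` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally served model."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the model's completion text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

After `ollama pull qwen2.5-coder:7b`, calling `generate("qwen2.5-coder:7b", "Write a Python function that reverses a string.")` keeps both the prompt and the code on your own machine, which is the main reason to self-host.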

What coding benchmarks matter most?

SWE-bench Verified is the gold standard for real-world coding ability. HumanEval tests basic function generation, while LiveCodeBench uses regularly updated problems to reduce data contamination. For a complete picture, look at all three.
