Cost vs Performance
True intelligence isn't just solving problems—it's solving them efficiently
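[Figure: bubble chart of performance (%) against cost per task ($). Bubble size = model size; higher and further left = better efficiency (higher score at lower cost). Models plotted: GPT-4o, Claude 3.5 Sonnet, Gemini 2.0 Pro, DeepSeek-R1, Qwen2.5-72B, Llama-3.3-70B, Mistral Large 2, Arctic-SQL-32B, CodeS-15B, XiYan-SQL-32B.]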
Best Overall: Claude 3.5 Sonnet, 75.6% @ $3.00/task
Best Efficiency: CodeS-15B, 56.4% @ $0.08/task
Best Budget: Arctic-SQL-32B, 70.0% @ $0.25/task
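The panel does not say how "efficiency" is scored. The sketch below shows one plausible reading and should not be taken as the panel's own method. It assumes efficiency means percentage points of score per dollar per task, and it treats "higher and further left is better" as a Pareto front, meaning the set of models that no rival beats on both cost and score. It uses only the three highlighted results above, because the other models' values are not given. The names `Result`, `points_per_dollar`, and `pareto_front` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    score_pct: float  # headline score, in percent
    cost_usd: float   # cost per task, in US dollars


# Only the three highlighted results; other plotted models' values are not given.
RESULTS = [
    Result("Claude 3.5 Sonnet", 75.6, 3.00),
    Result("CodeS-15B", 56.4, 0.08),
    Result("Arctic-SQL-32B", 70.0, 0.25),
]


def points_per_dollar(r: Result) -> float:
    """Assumed efficiency metric: percentage points of score per dollar per task."""
    return r.score_pct / r.cost_usd


def pareto_front(results: list[Result]) -> list[Result]:
    """Return models that no other model dominates, sorted by cost.

    A model is dominated if some rival costs no more AND scores no less,
    while being strictly better on at least one of the two.
    """
    front = []
    for r in results:
        dominated = any(
            o.cost_usd <= r.cost_usd
            and o.score_pct >= r.score_pct
            and (o.cost_usd < r.cost_usd or o.score_pct > r.score_pct)
            for o in results
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.cost_usd)


if __name__ == "__main__":
    # Rank by the assumed efficiency metric, best first.
    for r in sorted(RESULTS, key=points_per_dollar, reverse=True):
        print(f"{r.model}: {points_per_dollar(r):.1f} points per $")
    print("Pareto front:", [r.model for r in pareto_front(RESULTS)])
```

Under this metric, CodeS-15B comes out on top at roughly 705 points per dollar. Arctic-SQL-32B follows at 280, and Claude 3.5 Sonnet is at about 25. That ordering agrees with the "Best Efficiency" label. None of the three points dominates another, because each higher score also costs more. All three therefore sit on the front, and which one is "best" depends on how much a point of score is worth to you.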