AnalyticsBench

v1.0 • 2,847 tasks • 95 databases

Cost vs Performance

True intelligence isn't just solving problems; it's solving them efficiently.

[Figure: bubble chart of performance against cost per task. Bubble size = model size; higher and further left = better efficiency.]

Models shown: GPT-4o, Claude 3.5 Sonnet, Gemini 2.0 Pro, DeepSeek-R1, Qwen2.5-72B, Llama-3.3-70B, Mistral Large 2, Arctic-SQL-32B, CodeS-15B, XiYan-SQL-32B

Best Overall: Claude 3.5 Sonnet (75.6% @ $3.00/task)

Best Efficiency: CodeS-15B (56.4% @ $0.08/task)

Best Budget: Arctic-SQL-32B (70.0% @ $0.25/task)
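
One way to read the three callouts above is as a simple score-per-dollar comparison. The sketch below is illustrative only: the page does not say how "efficiency" is computed, so the `points_per_dollar` helper (percentage points divided by cost per task) is an assumed metric, not AnalyticsBench's own definition. Only the three score and cost pairs come from the page.

```python
# Figures copied from the three callouts: (score in percent, cost in USD per task).
callouts = {
    "Claude 3.5 Sonnet": (75.6, 3.00),
    "CodeS-15B": (56.4, 0.08),
    "Arctic-SQL-32B": (70.0, 0.25),
}


def points_per_dollar(score_pct: float, cost_usd: float) -> float:
    """Assumed efficiency metric (hypothetical): percentage points per dollar per task."""
    return score_pct / cost_usd


# Rank the three models from most to least efficient under the assumed metric.
ranked = sorted(
    callouts.items(),
    key=lambda item: points_per_dollar(*item[1]),
    reverse=True,
)

for name, (score, cost) in ranked:
    print(f"{name}: {points_per_dollar(score, cost):.1f} points per dollar")
```

Under this assumed metric the order is CodeS-15B (about 705 points per dollar), Arctic-SQL-32B (280), then Claude 3.5 Sonnet (about 25), which matches the "higher and left" reading of the chart: the highest raw score is not the most efficient one.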