September 2, 2025

Claude Sonnet 4.5 Benchmark Deep Dive: 77.2% SWE-bench Crushes GPT-5 and Gemini

77.2% on SWE-bench Verified. That single number just rewrote the rules of the AI coding model market. Anthropic’s Claude Sonnet 4.5 benchmark results don’t just represent […]