August 7, 2025

GPT-5 SWE-Bench Coding Performance Hits 74.9% — But Real-World Tests Tell a Different Story

SWE-Bench Verified 74.9%. Aider Polyglot 88%. Multi-file refactoring 91%. Looking at GPT-5’s coding benchmarks alone, you’d think OpenAI just cracked the code on AI-assisted development. But […]