GPQA Diamond 93.8%. Humanity’s Last Exam 41.0%. ARC-AGI-2 45.1%. These are the scores posted by the Deep Think mode of Google’s Gemini 3 Pro Preview, and accessing them […]
Every category on LMArena — swept. A math-olympiad-level reasoning mode that thinks in parallel. A lightweight model that uses 30% fewer tokens while actually getting better. […]