llm-benchmark - Sean Kim — Arts and Tech

Published by Sean Kim on October 29, 2025

Categories: Tech & Hardware

M4 Max AI Inference Benchmarks: 20 tok/s on Llama 70B Changes Everything for Local AI

Running a 70-billion-parameter language model on a laptop at 20 tokens per second — no cloud, no GPU server rack, no $10,000 NVIDIA card. That’s what […]

Published by Sean Kim on August 8, 2025

Categories: AI Tools & Services

Claude Opus 4.1: Anthropic’s Sharpest Coding Model Scores 74.5% on SWE-bench

Claude Opus 4.1 just dropped three days ago, and the benchmark numbers are telling a story that every developer building on AI should pay attention to […]

Published by Sean Kim on June 6, 2025

Categories: AI Tools & Services

Claude 3.5 Sonnet Agentic Coding: How 49% on SWE-bench Rewrote the Rules for AI Developer Tools

A year ago, most developers treated AI coding assistants as glorified autocomplete. Then Anthropic dropped a model that scored 49% on SWE-bench Verified — solving nearly […]

Published by Sean Kim on May 28, 2025

Categories: AI Tools & Services

Databricks DBRX: How 36B Active Parameters Beat 70B Dense Models — One Year Later

A 132B-parameter model activates just 36B parameters at inference, and still outperforms models nearly twice its active size. That is not a theoretical claim. […]