Apple M5 Chip: 6 Performance Estimates Before WWDC 2025 Changes Everything

June 3, 2025

Summer NAMM 2025: 7 Best Budget Gear Announcements Under $300

June 4, 2025

WWDC 2025 Core ML Gets MLTensor, Foundation Models Framework Gives Developers a Free 3B On-Device LLM — Here’s Everything That Changed

Published by Sean Kim on June 4, 2025

Foundation Models Framework: Apple Intelligence in Three Lines of Code

The headline feature of WWDC 2025’s machine learning announcements is the Foundation Models framework. For the first time, developers get direct programmatic access to the same on-device large language model that powers Apple Intelligence — and the barrier to entry is remarkably low.

import FoundationModels
let session = LanguageModelSession()
let response = try await session.sendMessage("Summarize this article")

That’s not a simplified example — it’s the actual implementation. The LanguageModelSession API handles model loading, inference optimization, and memory management automatically. No API keys to manage, no usage limits to track, no monthly bills to worry about.

The 3B Model Under the Hood

According to Apple’s machine learning research team, the on-device model packs approximately 3 billion parameters compressed to just 2 bits per weight using Quantization-Aware Training (QAT). The architecture features a 5:3 depth ratio design where key-value caches are shared between blocks, reducing memory usage by 37.5% compared to traditional implementations.

Context length: Up to 65,000 tokens — enough for substantial document processing
Language support: 15 languages out of the box
KV cache: 8-bit quantization for efficient memory use
Performance: Competitive with Qwen-2.5-3B across all supported languages, matching some 4B models in English

Guided Generation and @Generable: Type-Safe AI Outputs

One of the most developer-friendly innovations is guided generation. Instead of parsing JSON strings and hoping the model follows your schema, developers can now use the @Generable macro to define structured outputs directly in Swift types. The framework customizes model decoding at the token level to prevent structural errors — no JSON schema handling needed.

The companion @Guide macro adds natural language annotations to properties, giving the model precise instructions on how to populate each field. Combined with streaming support, developers can build responsive UIs that progressively display generated content.

Tool Calling: Extending the Model’s Reach

The Foundation Models framework includes a Swift protocol-based tool calling system. Developers define tools that the model can invoke to access live data — weather, calendar events, app-specific databases — and the framework automatically handles parallel and serial invocations with source citation for fact-checking.

Practical use cases are compelling: an education app generating personalized quizzes from a user’s notes without any cloud API costs, or an outdoors app adding natural language search that works offline in remote areas.

WWDC 2025 Core ML Upgrades: MLTensor, Model Functions, and Stateful Inference

While the Foundation Models framework grabbed headlines, the Core ML updates at WWDC 2025 are equally significant for developers working with custom models. Three changes stand out.

MLTensor: A Familiar API for Array Operations

The new MLTensor type provides an efficient API for multi-dimensional array operations executed directly on Apple silicon. If you’ve used PyTorch tensors or NumPy arrays, the interface will feel immediately familiar — but with automatic hardware optimization across CPU, GPU, and Neural Engine. MLTensor supports common math and transformation operations typical of machine learning workflows, and these operations leverage Apple silicon’s compute capabilities for high-performance execution without requiring developers to manually target specific hardware.

Multi-Function Models

Core ML models can now hold multiple functions within a single model file. This is critical for LLM deployment — you might have a base model function, multiple LoRA adapter functions, and specialized task functions all packaged together. The runtime manages resource allocation and switching between functions efficiently.

Stateful Inference: The LLM Game Changer

Previously, Core ML treated every model as a pure stateless function: inputs in, outputs out. For LLM token generation, this meant copying large KV cache states back and forth between the model and calling code on every iteration — a significant performance bottleneck.

The new state management system lets models maintain internal state across inference iterations. For LLM workloads, this eliminates the copy overhead entirely, resulting in noticeably faster token generation. Combined with Core ML Tools’ improved compression techniques — more granular and composable weight compression for LLMs and diffusion models — running sophisticated models on-device becomes genuinely practical.

WWDC 2025 Core ML Xcode 26 coding tools with AI assistant — Xcode 26 coding tools with built-in AI assistant (Source: Apple)

MLX: Apple’s Open-Source ML Framework Grows Up

The MLX framework, Apple’s open-source machine learning library designed specifically for Apple silicon, received significant attention at WWDC 2025. MLX exploits the unified memory architecture of Apple silicon — CPU and GPU share physical memory, eliminating the copy overhead that plagues traditional GPU computing.

Languages: Python, Swift, C++, and C with community bindings
Fine-tuning: Train and fine-tune models directly on Apple silicon with full memory sharing
Distributed learning: Scale training across multiple Apple silicon devices
Model ecosystem: Hundreds of frontier models available on the Hugging Face MLX community

For developers who want more control than the Foundation Models framework offers — or who need to run custom architectures — MLX bridges the gap between cloud-based training and on-device deployment. You can fine-tune a model on your Mac, convert it with Core ML Tools, and deploy it on iPhone, all within Apple’s ecosystem.

Beyond Core ML: The Broader ML API Ecosystem

WWDC 2025 also brought updates to Apple’s higher-level ML APIs that many developers use without even thinking about the underlying models. For developers who need finer-grained control, Apple also introduced the BNNS Graph Builder — a new API for creating operation graphs optimized for real-time CPU execution with strict latency and memory constraints. This is particularly relevant for audio processing apps and real-time signal pipelines where GPU scheduling overhead is unacceptable.

SpeechAnalyzer: Next-Generation Speech-to-Text

The new SpeechAnalyzer API replaces the aging SFSpeechRecognizer with a faster, more flexible speech-to-text model optimized for long-form and distant audio — lectures, meetings, conversations. It runs entirely on-device with minimal code, making it practical for apps that process extended audio sessions.

Vision Framework: Document Recognition and More

The Vision framework now includes document recognition that groups different document structures for easier processing, plus lens smudge detection for camera-based apps. With over 30 APIs for image analysis, the framework covers everything from text recognition to body pose estimation.

Xcode 26: AI-Powered Development

Xcode 26 integrates AI directly into the development workflow. The AI Assistant provides full method stub completion, SwiftUI view generation, test suggestions for edge-case detection, and auto-generated inline documentation. Developers can also use ChatGPT directly in the sidebar — or run local models on Apple silicon Macs for completely private coding assistance.

The Privacy Equation: Why On-Device Matters for Developers

Every ML announcement at WWDC 2025 reinforced a single architectural philosophy: compute on the device whenever possible. For developers, this translates into concrete advantages beyond privacy.

Zero marginal cost: Foundation Models framework inference is free — no per-token charges, no rate limits, no surprise bills
Offline capability: Apps work in airplane mode, remote areas, or spotty network conditions
Latency: No network round-trips mean faster perceived response times
Regulatory compliance: User data never leaves the device, simplifying GDPR, HIPAA, and similar compliance requirements
User trust: “Processed on your device” is increasingly a competitive feature, not just a privacy checkbox

Apple’s server-side model — using a novel Parallel-Track Mixture-of-Experts architecture trained on 14 trillion text tokens — handles tasks that exceed on-device capabilities through Private Cloud Compute, but the clear direction is to push as much intelligence to the edge as possible.

What This Means for Your Next Project

If you’re building for Apple platforms, WWDC 2025 fundamentally expands what’s possible without a cloud backend. The Foundation Models framework alone eliminates the biggest barrier to ML adoption — cost and complexity. Core ML’s stateful inference and MLTensor make custom model deployment practical. And MLX gives researchers and advanced developers a genuine alternative to cloud-based training.

The developers who move fastest on these APIs will have a significant head start. An education app with offline AI tutoring, a health app with private on-device analysis, a creative tool with responsive AI assistance — these are no longer aspirational ideas. With WWDC 2025’s toolkit, they’re weekend projects.

For audio developers specifically, the combination of SpeechAnalyzer’s improved distant-audio recognition, BNNS Graph’s real-time processing capabilities, and the Foundation Models framework’s text generation opens up workflows that previously required multiple cloud services. Imagine a DAW plugin that transcribes vocal takes, generates lyric suggestions, and tags sessions with AI-powered metadata — all running locally on an M-series Mac with zero latency and zero recurring costs.

Apple’s WWDC 2025 ML stack isn’t revolutionary in any single component — competitors offer larger models, more flexible APIs, and broader hardware support. But the integration story is uniquely compelling. From Xcode’s AI assistant to Foundation Models to Core ML to Metal, the entire pipeline lives within one ecosystem, one memory architecture, and one privacy model. For developers already invested in Apple platforms, that coherence is the real advantage.

Building automation pipelines or integrating AI into your development workflow? Sean Kim has been shipping AI-powered production systems since before it was trendy.

Get Tech Consultation →

Visit imseankim.com

Get weekly AI, music, and tech trends delivered to your inbox.

Sean Kim

Comments are closed.

Apple M5 Chip: 6 Performance Estimates Before WWDC 2025 Changes Everything

Summer NAMM 2025: 7 Best Budget Gear Announcements Under $300

Apple M5 Chip: 6 Performance Estimates Before WWDC 2025 Changes Everything

Summer NAMM 2025: 7 Best Budget Gear Announcements Under $300

Foundation Models Framework: Apple Intelligence in Three Lines of Code

The 3B Model Under the Hood

Guided Generation and @Generable: Type-Safe AI Outputs

Tool Calling: Extending the Model’s Reach

WWDC 2025 Core ML Upgrades: MLTensor, Model Functions, and Stateful Inference

MLTensor: A Familiar API for Array Operations

Multi-Function Models

Stateful Inference: The LLM Game Changer

MLX: Apple’s Open-Source ML Framework Grows Up

Beyond Core ML: The Broader ML API Ecosystem

SpeechAnalyzer: Next-Generation Speech-to-Text

Vision Framework: Document Recognition and More

Xcode 26: AI-Powered Development

The Privacy Equation: Why On-Device Matters for Developers

What This Means for Your Next Project

GitHub Copilot Data Training Starts April 24: 7 Reasons to Opt Out Now Before Your Private Code Feeds AI Models

Unreal Engine 5.7 PCG Framework at GDC 2026: Build Massive Procedural Worlds 2x Faster with Zero Code

TypeScript 6.0: The Last JavaScript-Based Release Before the Go Rewrite — What Changes, What Dies, and How to Migrate