Docker Model Runner Brings vLLM to macOS with Apple Silicon
Docker Blog
by Yiwen Xu, February 26, 2026
Docker Model Runner now ships **vllm-metal**, a new backend that enables high-performance LLM inference on macOS devices powered by Apple Silicon. This brings vLLM, the popular high-throughput inference engine, to macOS for the first time, a notable milestone for developers working with large language models (LLMs). By building on Apple's Metal GPU framework and unified memory architecture, **vllm-metal** delivers efficient, scalable LLM serving directly through familiar Docker workflows.
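To make that concrete, here is a minimal sketch of querying a model served by Docker Model Runner from Python. The base URL, port, and model name below are illustrative assumptions rather than values confirmed in this post; check your Docker Model Runner setup for the actual OpenAI-compatible endpoint.

```python
# Minimal sketch: querying a locally served model through an
# OpenAI-compatible endpoint. Assumptions (not confirmed by this post):
# Docker Model Runner exposes its API on localhost:12434, and a model
# such as "ai/llama3.2" has already been pulled. Adjust both to match
# your environment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/v1",  # assumed local endpoint
    api_key="not-needed",  # local server; no real API key required
)

response = client.chat.completions.create(
    model="ai/llama3.2",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize what vLLM does."}],
)
print(response.choices[0].message.content)
```

Because the server speaks the standard OpenAI wire format, any existing client code only needs its base URL changed to target the local model.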
The integration of **vllm-metal** is seamless: it plugs into vLLM's existing engine, scheduler, and OpenAI-compatible API server. Developers can keep using the APIs they already know, including OpenAI- and Anthropic-compatible endpoints, so tools like Claude Code can talk to locally served models on macOS. The plugin architecture preserves compatibility while handling the specifics of Apple Silicon hardware.
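The Anthropic-compatible side can be exercised the same way. As a hedged sketch (the base URL below is an assumption, not a documented value), the official anthropic Python SDK accepts a custom base_url, which is the same mechanism a tool like Claude Code would use to reach a locally served model:

```python
# Sketch of calling the Anthropic Messages API against a local,
# Anthropic-compatible server. The base_url is an assumption; substitute
# the address your Docker Model Runner instance actually exposes.
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:12434",  # assumed local endpoint
    api_key="not-needed",  # local server; no real API key required
)

message = client.messages.create(
    model="ai/llama3.2",  # placeholder model name
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello from Apple Silicon!"}],
)
print(message.content[0].text)
```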