Docker Model Runner Brings vLLM to macOS with Apple Silicon
Docker Blog
by Yiwen Xu, February 26, 2026
Docker Model Runner now ships **vllm-metal**, a new backend that enables high-performance LLM inference on macOS devices powered by Apple Silicon. This brings vLLM, the popular high-throughput inference engine, to macOS for the first time, a notable milestone for developers working with large language models (LLMs). By building on Apple's Metal GPU framework and unified memory architecture, **vllm-metal** delivers efficient, scalable LLM serving directly through familiar Docker workflows.
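To make that concrete, here is a minimal sketch of querying a model served by Docker Model Runner from Python. The base URL, port, and model name below are illustrative assumptions rather than values confirmed in this post; check your Docker Model Runner setup for the actual OpenAI-compatible endpoint.

```python
# Minimal sketch: querying a locally served model through an
# OpenAI-compatible endpoint. Assumptions (not confirmed by this post):
# Docker Model Runner exposes its API on localhost:12434, and a model
# such as "ai/llama3.2" has already been pulled. Adjust both to match
# your environment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/v1",  # assumed local endpoint
    api_key="not-needed",  # local server; no real API key required
)

response = client.chat.completions.create(
    model="ai/llama3.2",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize what vLLM does."}],
)
print(response.choices[0].message.content)
```

Because the server speaks the standard OpenAI wire format, any existing client code only needs its base URL changed to target the local model.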
The integration of **vllm-metal** is seamless: it plugs into vLLM's existing engine, scheduler, and OpenAI-compatible API server. Developers can keep using the APIs they already know, including OpenAI- and Anthropic-compatible endpoints, so tools like Claude Code can talk to locally served models on macOS. The plugin architecture preserves compatibility while handling the specifics of Apple Silicon hardware.
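The Anthropic-compatible side can be exercised the same way. As a hedged sketch (the base URL below is an assumption, not a documented value), the official anthropic Python SDK accepts a custom base_url, which is the same mechanism a tool like Claude Code would use to reach a locally served model:

```python
# Sketch of calling the Anthropic Messages API against a local,
# Anthropic-compatible server. The base_url is an assumption; substitute
# the address your Docker Model Runner instance actually exposes.
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:12434",  # assumed local endpoint
    api_key="not-needed",  # local server; no real API key required
)

message = client.messages.create(
    model="ai/llama3.2",  # placeholder model name
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello from Apple Silicon!"}],
)
print(message.content[0].text)
```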