Inception Labs says its diffusion LLM is 10x faster than Claude, ChatGPT, Gemini
The New Stack
by TNS Staff, March 2, 2026
AI-Generated Deep Dive Summary
Inception Labs has introduced Mercury 2, a groundbreaking large language model (LLM) that leverages diffusion technology, marking a significant shift from the traditional autoregressive approach used by major AI players like Claude, ChatGPT, and Gemini. Unlike conventional models that generate text sequentially, one token at a time, Mercury 2 starts with a rough draft and refines it in parallel, akin to how image models like Stable Diffusion create images from noise. This method enables Mercury 2 to produce over 1,000 tokens per second—five to ten times faster than optimized models from OpenAI, Anthropic, and Google, according to Inception Labs' testing.
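To make the contrast concrete, here is a minimal toy sketch (not Inception Labs' actual architecture) of the two generation styles: an autoregressive loop that makes one model call per token, versus a diffusion-style loop that refines every position in parallel, so its call count depends only on the number of refinement steps. The `toy_predict` function is a hypothetical stand-in for a real neural network.

```python
import random

VOCAB = ["the", "quick", "fox", "jumps", "runs", "high"]
MASK = "<mask>"

def toy_predict(context):
    # Stand-in for a neural network forward pass: returns a random token.
    return random.choice(VOCAB)

def autoregressive_generate(n_tokens):
    """One sequential model call per token; each call waits on the last."""
    seq, calls = [], 0
    for _ in range(n_tokens):
        seq.append(toy_predict(seq))
        calls += 1
    return seq, calls

def diffusion_generate(n_tokens, n_steps=4):
    """Start from a fully masked 'rough draft' and refine all positions
    together; in a real model each step is one batched GPU forward pass."""
    seq = [MASK] * n_tokens
    calls = 0
    for _ in range(n_steps):
        seq = [toy_predict(seq) for _ in seq]  # every position updated at once
        calls += 1  # one parallel pass per refinement step
    return seq, calls

_, ar_calls = autoregressive_generate(32)
_, diff_calls = diffusion_generate(32, n_steps=4)
print(ar_calls, diff_calls)  # 32 sequential calls vs 4 parallel passes
```

The point of the sketch is the call-count asymmetry: the sequential loop scales with output length, while the refinement loop scales with a small, fixed number of denoising steps, which is what lets the parallel work fill a GPU.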
The key innovation lies in the parallel computation of diffusion models, which maps naturally onto GPU architecture. This efficiency is further enhanced by Nvidia's investment in optimizing Mercury 2's serving engine. While Mercury 2 currently matches the quality of Claude Haiku and Google Flash-class models, it does not yet compete with higher-tier models like Claude Opus or OpenAI's GPT-4. However, Inception Labs CEO Stefano Ermon argues that the economic advantages of diffusion models will become more apparent as they scale, particularly in reinforcement learning workflows, where faster inference removes a major bottleneck.
For developers and cloud professionals, Mercury 2's speed is a game-changer. It reduces latency, making it ideal for real-time applications and improving user experience. The model’s efficiency also translates to cost savings, as fewer resources are needed to achieve the same performance levels. With AWS Bedrock integration on the horizon, Mercury 2 aims to further simplify adoption in cloud environments.
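The latency claim can be put in rough numbers. The snippet below uses the article's cited 1,000 tokens per second for Mercury 2; the baseline rate and response length are illustrative assumptions, not figures from the article.

```python
def generation_time_s(n_tokens, tokens_per_second):
    """Wall-clock seconds to generate n_tokens at a given decode rate."""
    return n_tokens / tokens_per_second

response_tokens = 2_000   # assumed response length, for illustration only
mercury_rate = 1_000      # tokens/sec, per Inception Labs' reported figure
baseline_rate = 150       # assumed rate for a fast autoregressive model

print(generation_time_s(response_tokens, mercury_rate))   # 2.0
print(generation_time_s(response_tokens, baseline_rate))  # ~13.3
```

Under these assumptions a long response streams in about two seconds instead of the better part of fifteen, which is the kind of gap users notice in interactive tools.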
Inception Labs’ approach highlights a potential paradigm shift in AI development. By focusing on parallel computation and GPU optimization, diffusion models like Mercury 2 could redefine how AI applications are built and deployed, offering developers faster, more scalable solutions. As the field of generative AI continues to evolve, Inception’s advancements in diffusion technology may set a new standard for performance and efficiency in cloud-based AI systems.
Verticals: DevOps, Cloud
Originally published on The New Stack on 3/2/2026