Read Locks Are Not Your Friends
Hacker News
February 22, 2026
AI-Generated Deep Dive Summary
**Read Locks Can Be a Hidden Bottleneck in High-Performance Systems**
In a counterintuitive result for developers tuning multi-threaded code, taking the read side of a reader-writer lock (RwLock) can underperform a plain exclusive lock (Mutex). The finding comes from an experiment on commodity hardware: in a read-heavy cache workload using Rust's parking_lot::RwLock, the RwLock was roughly 5× slower than a Mutex due to atomic contention and cache-line ping-pong. The result shows how a seemingly optimal choice of concurrency primitive can backfire on modern multi-core architectures.
The experiment ran on an Apple Silicon M4 with 10 cores and 16 GB of RAM, using Rust 1.92.0. The goal was to maximize the throughput of the get() operation on a Least Recently Used (LRU) tensor cache. The initial assumption was that read locks, by letting many threads access the data simultaneously, would improve throughput. Instead, the exclusive Mutex consistently outperformed the RwLock by around 5×. This counterintuitive outcome underscores how much hardware-level behavior matters when choosing concurrency primitives.
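The shape of the experiment can be sketched as follows. This is a minimal stand-in, not the original benchmark: it uses std's Mutex and RwLock (the article used parking_lot) and a plain Vec in place of the LRU tensor cache, so the sketch is self-contained; the function names and parameters are assumptions.

```rust
use std::sync::{Arc, Mutex, RwLock};
use std::thread;
use std::time::{Duration, Instant};

// Readers repeatedly look up values in shared data under an RwLock.
// Each read() must increment a shared reader counter, which is where
// the contention described in the article comes from.
fn bench_rwlock(threads: usize, iters: usize) -> (u64, Duration) {
    let data = Arc::new(RwLock::new(vec![1u64; 64]));
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let data = Arc::clone(&data);
            thread::spawn(move || {
                let mut sum = 0u64;
                for i in 0..iters {
                    sum += data.read().unwrap()[i % 64];
                }
                sum
            })
        })
        .collect();
    let total: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    (total, start.elapsed())
}

// Same workload behind an exclusive Mutex: readers serialize, but the
// lock word is a single uncontended-on-success atomic swap.
fn bench_mutex(threads: usize, iters: usize) -> (u64, Duration) {
    let data = Arc::new(Mutex::new(vec![1u64; 64]));
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let data = Arc::clone(&data);
            thread::spawn(move || {
                let mut sum = 0u64;
                for i in 0..iters {
                    sum += data.lock().unwrap()[i % 64];
                }
                sum
            })
        })
        .collect();
    let total: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    (total, start.elapsed())
}

fn main() {
    let (r, rt) = bench_rwlock(8, 100_000);
    let (m, mt) = bench_mutex(8, 100_000);
    println!("rwlock: sum={r} in {rt:?}");
    println!("mutex:  sum={m} in {mt:?}");
}
```

Relative timings will vary by machine and by lock implementation; the article's 5× figure is specific to parking_lot::RwLock on an M4.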
The root cause lies in how the hardware manages cache lines. To take a read lock, every reader must increment an internal atomic counter that tracks the number of active readers. When many cores do this at once, each core must acquire exclusive ownership of the cache line holding that counter, forcing repeated invalidations and reloads from higher-level caches or memory, a phenomenon known as "cache-line ping-pong." These operations introduce significant overhead, especially in workloads where the time spent inside the critical section is shorter than the cost of acquiring the lock itself.
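The counter contention can be isolated from the lock entirely. The sketch below (an illustration of the mechanism, not code from the article) has every thread hammer one shared atomic, the way an RwLock's reader count is hammered, and contrasts it with one padded counter per thread, where each core keeps its own cache line:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Instant;

// One counter per 64-byte cache line, so sharded counters never share
// a line (avoids false sharing).
#[repr(align(64))]
struct PaddedCounter(AtomicU64);

// All threads increment ONE atomic: every fetch_add forces the core to
// grab exclusive ownership of the same cache line ("ping-pong").
fn shared_counter(threads: usize, iters: u64) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    c.fetch_add(1, Ordering::Relaxed); // contended RMW
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

// Each thread increments its own padded counter: same total work, but
// each cache line stays resident on one core.
fn sharded_counters(threads: usize, iters: u64) -> u64 {
    let shards: Arc<Vec<PaddedCounter>> =
        Arc::new((0..threads).map(|_| PaddedCounter(AtomicU64::new(0))).collect());
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let s = Arc::clone(&shards);
            thread::spawn(move || {
                for _ in 0..iters {
                    s[t].0.fetch_add(1, Ordering::Relaxed); // uncontended RMW
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    shards.iter().map(|p| p.0.load(Ordering::Relaxed)).sum()
}

fn main() {
    for (name, f) in [
        ("shared ", shared_counter as fn(usize, u64) -> u64),
        ("sharded", sharded_counters),
    ] {
        let start = Instant::now();
        let n = f(8, 1_000_000);
        println!("{name}: {n} increments in {:?}", start.elapsed());
    }
}
```

On a multi-core machine the sharded version typically runs many times faster despite doing the same number of increments, which is exactly the cost an RwLock's reader count pays on every read-side acquire and release.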