Linear Representations and Superposition

Hacker News
February 15, 2026
AI-Generated Deep Dive Summary
The article explores two key concepts in mechanistic interpretability: the Linear Representation Hypothesis (LRH) and superposition, both of which aim to explain how large language models (LLMs) represent information internally. The LRH posits that linguistic or contextual concepts correspond to directions (vectors) in a model's embedding space. This is supported by studies showing that differences between word embeddings line up with specific directions, such as those for gender or tense. For instance, the difference between the embeddings of "king" and "queen" approximates a gender direction, indicating a structured, linear representation of that concept.

Park et al. build on this by formalizing the LRH in simplified LLMs, showing that the embedding and unembedding spaces carry isomorphic representations: an intervention in one space has a corresponding effect in the other, which allows concepts to be analyzed in a unified way across both the input and output layers. Their experiments on models such as Llama 2 provide empirical support, showing that concepts like tense or language can be captured through linear transformations.

Superposition complements this picture by suggesting that multiple concepts combine additively within the same embedding space, so that one concept (e.g., gender) can be composed with another (e.g., language) while each remains largely readable on its own. However, ensuring strict orthogonality between representations is impossible once the number of concepts exceeds the embedding dimension; superposition instead packs many nearly orthogonal directions into the same space, trading exact independence for a small amount of interference between features.
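To make these two ideas concrete, here is a minimal NumPy sketch (not from the article; the embedding dimension, feature count, and the toy "royal"/"male" concepts are illustrative assumptions). It shows a concept behaving as a single direction, and many more features than dimensions coexisting with only small interference:

import numpy as np

rng = np.random.default_rng(0)
d = 64            # embedding dimension (hypothetical)
n_features = 512  # many more concepts than dimensions

# Superposition: nearly orthogonal feature directions from random unit vectors.
features = rng.normal(size=(n_features, d))
features /= np.linalg.norm(features, axis=1, keepdims=True)

# Activate two unrelated concepts additively (say, "female" and "French").
x = features[0] + features[1]

# A dot-product readout recovers ~1 for the active features and values near 0
# for the rest; those small nonzero values are the interference cost.
readout = features @ x
print("active features:", readout[:2])
print("mean interference on inactive features:", np.abs(readout[2:]).mean())

# Linear representation (toy analogy): if "royal" and "male" are directions,
# then king = royal + male and queen = royal - male, so subtracting the
# "male" direction twice from king lands exactly on queen in this toy setup.
royal, male = features[10], features[11]
king = royal + male
queen = royal - male
print("king - 2*male equals queen:", np.allclose(king - 2 * male, queen))

In a real LLM the directions are learned rather than random, but the trade-off is the same: near-orthogonality buys extra capacity at the cost of the small interference measured above.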
Originally published on Hacker News on 2/15/2026