Words with Spaces - Linguabase

Hacker News
February 23, 2026
AI-Generated Deep Dive Summary
English has hundreds of thousands of compound phrases that name things rather than merely describe them. "Boiling water," for example, is not just a description; it can be a hazard, a cooking stage, or a state of matter. Yet traditional dictionaries such as Merriam-Webster and Oxford cover only about 3% of these multi-word expressions (MWEs), the "words with spaces" that are often overlooked as too obscure to list.
The article argues that while single words form the foundation of the language, compound phrases carry much of its conceptual weight, and some mean something other than the sum of their words: "hot dog" is a food, not an animal, and "red tape" means bureaucracy, not adhesive. Because dictionaries are organized around individual words, most MWEs go unrecorded. Even Wiktionary, a crowd-sourced resource with extensive coverage, includes only about 30% of them.
The article also explores how combinatorially vast the language is, estimating that English allows over 250 billion possible two-word combinations. Most are nonsensical, but around 15% are plausible, such as "wooden chair" or "morning coffee." Some of those plausible pairings crystallize into expressions that name a concept: "climate change" denotes a complex issue, and the idiom "piece of cake" conveys ease.
These expressions matter for both language understanding and technology. They form a deep reservoir of vocabulary that could improve AI language models, word games, and other linguistic tools, but their underrepresentation in traditional dictionaries limits how accessible and useful they are. The article calls for better coverage of them to enrich language learning, communication, and creative applications. In short, while dictionaries focus on single words, compound phrases remain a vast, largely untapped resource for understanding the complexity and nuance of language.
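The combinatorial figures above are easy to sanity-check. The summary does not say what vocabulary size produced "over 250 billion," so the sketch below assumes, purely for illustration, a vocabulary of about 500,000 words, because 500,000 squared is exactly 250 billion ordered pairs (repeats such as "very very" included). It then applies the stated 15% plausibility share with integer arithmetic. The constant names and helper functions are hypothetical.

```python
# Back-of-the-envelope check of the two-word combinatorics in the summary.
# ASSUMPTION: a 500,000-word vocabulary, chosen only because 500,000**2 equals
# 250 billion; the summary does not state the vocabulary size it used.
VOCAB_SIZE = 500_000
PLAUSIBLE_PERCENT = 15  # "around 15% are plausible", per the summary


def ordered_pairs(vocab_size: int) -> int:
    """Count ordered two-word sequences (w1, w2), including repeats like 'very very'."""
    return vocab_size * vocab_size


def plausible_pairs(vocab_size: int, percent: int) -> int:
    """Apply a whole-number plausibility percentage using exact integer arithmetic."""
    return ordered_pairs(vocab_size) * percent // 100


total = ordered_pairs(VOCAB_SIZE)
plausible = plausible_pairs(VOCAB_SIZE, PLAUSIBLE_PERCENT)
print(f"{total:,} ordered pairs")  # 250,000,000,000 ordered pairs
print(f"{plausible:,} plausible pairs")  # 37,500,000,000 plausible pairs
```

Under that assumption, even the plausible 15% amounts to about 37.5 billion two-word pairs.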
The gap in dictionary coverage is not only a matter of linguistic record-keeping; it also limits technologies that depend on language data. Closing it could open new possibilities for language technology and its applications, which makes it a critical area of work for linguists and technologists alike.
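Coverage figures like the 3% and 30% quoted above are ratios: the share of a reference list of MWEs that also appears among a dictionary's headwords. The summary does not describe how those numbers were measured, so what follows is only a minimal sketch of the idea, assuming a plain list of phrases and a set of headwords. The phrase list and headwords are hypothetical toy data, not the contents of any real dictionary, and the helper names are illustrative.

```python
# Minimal sketch of measuring a dictionary's coverage of multi-word expressions.
# ASSUMPTION: the phrase list and headwords below are hypothetical toy data, not
# the contents of Merriam-Webster, Oxford, Wiktionary, or Linguabase.


def normalize(phrase: str) -> str:
    """Lowercase and collapse runs of whitespace, so 'Red  Tape' matches 'red tape'."""
    return " ".join(phrase.lower().split())


def coverage(mwes: list[str], headwords: set[str]) -> float:
    """Fraction of distinct MWEs that appear among the dictionary's headwords."""
    targets = {normalize(p) for p in mwes}
    if not targets:
        return 0.0  # avoid dividing by zero on an empty phrase list
    entries = {normalize(h) for h in headwords}
    return len(targets & entries) / len(targets)


def missing(mwes: list[str], headwords: set[str]) -> list[str]:
    """MWEs with no matching headword, sorted alphabetically."""
    entries = {normalize(h) for h in headwords}
    return sorted({normalize(p) for p in mwes} - entries)


# Hypothetical example using phrases mentioned in the summary.
mwes = ["hot dog", "red tape", "boiling water", "climate change"]
headwords = {"hot dog", "Red Tape", "dog", "tape", "water"}

print(f"coverage: {coverage(mwes, headwords):.0%}")  # coverage: 50%
print("missing:", missing(mwes, headwords))  # missing: ['boiling water', 'climate change']
```

Normalizing case and whitespace keeps trivial formatting differences such as "Red Tape" versus "red tape" from counting as gaps. A real measurement would also have to decide how to treat hyphenated or closed forms like "hot-dog" or "hotdog", which this sketch does not handle.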
Originally published on Hacker News on 2/23/2026