The Internet Archive records its 1 trillionth website

Popular Science
by Andrew Paul
February 20, 2026
AI-Generated Deep Dive Summary
The Internet Archive, a nonprofit digital library dedicated to preserving the web's history, has reached a monumental milestone: saving its 1 trillionth webpage. This achievement marks a significant step in digital conservation, particularly as the internet becomes increasingly fleeting and challenging to navigate. Since its inception in 1996, the organization has relied on web crawlers and volunteer contributions to archive publicly accessible websites, texts, music, and other media. Its vast collection now includes over 866 billion webpages, 41 million texts, and millions of other digital assets, totaling an estimated 100,000 terabytes of data, equivalent to the storage capacity of 50,000 high-end iPhones.

The importance of such preservation is underscored by the internet's inherent impermanence. For instance, the accidental deletion of user-uploaded content from MySpace in 2019 highlights the vulnerability of digital content. The Internet Archive aims to prevent such losses by creating a "permanent record" of the web's evolution. This effort ensures that even as tech companies and AI systems consume vast amounts of online data, there remains an accessible repository where researchers, journalists, and the public can explore historical and cultural artifacts.

However, the Archive faces challenges in maintaining its mission. As large language model AI systems demand more datasets, many media outlets are restricting access to their content to protect intellectual property. While this decision is understandable from a business perspective, it poses risks to preserving the internet's delicate information ecosystem. Balancing these pressures while securing future growth will be crucial for the Archive to continue its vital work and reach milestones like its 2 trillionth preservation.

In an era where digital content is often ephemeral and contested, the Internet Archive stands as a critical resource for understanding our collective digital past and present. Its achievements highlight not only technological progress but also the importance of collaboration between archivists, media companies, and tech innovators to ensure that humanity's digital heritage remains accessible for generations to come.
Originally published on Popular Science on 2/20/2026