Tracking the evolution of big data: A timeline

Big data has been the buzz in public-sector circles for just a few years now, but its roots run deep. Here’s a look at key events over the past 30 years that have affected the way data is collected, managed and analyzed, and help explain why big data is such a big deal today.

IBM releases DB2, its latest relational database management system using Structured Query Language (both developed in the 1970s), which would become a mainstay in government.

Object-oriented programming (OOP) languages, such as Eiffel, start to catch on. Although OOP dates to the 1960s, it would over the next decade become the dominant programming paradigm.

Archie, the first tool used for searching on the Internet, is created.

The World Wide Web, using HyperText Transfer Protocol (HTTP) and the HyperText Markup Language (HTML), appears as a publicly available service for sharing information.

Gopher, a TCP/IP application layer protocol for distributing, searching and retrieving documents over the Internet, is released as an alternative to the early World Wide Web. Gopher’s rise leads to two new search programs, Veronica and Jughead.

The W3Catalog, the World Wide Web's first primitive search engine, is released.

Sun releases the Java platform, with the Java language first invented in 1991. It would become one of the most widely used languages in government, particularly in Web applications that will increasingly replace face-to-face and paper transactions.

The Global Positioning System, in the works since 1972, achieves full operational capability.

Michael Cox and David Ellsworth of NASA’s Ames Research Center publish a paper on visualization in which they discuss the challenges of working with data sets too large for the computing resources at hand. “We call this the problem of big data,” they write, possibly coining the term in its current context.

Carlo Strozzi develops an open-source relational database and calls it NoSQL. A decade later, a movement to develop NoSQL databases to work with large, unstructured data sets gains momentum.

Google is founded.

Tim Berners-Lee, inventor of the World Wide Web, coins the term “Semantic Web,” a “dream” for machine-to-machine interactions in which computers “become capable of analyzing all the data on the Web.”

Wikipedia is launched.

In the wake of the Sept. 11, 2001, attacks, DARPA begins work on its Total Information Awareness System, combining biometrics, language processing, predictive modeling and database technologies in one of many new data-gathering and analysis efforts by agencies.

The amount of digital information created by computers and other data systems in this one year surpasses the amount of information created in all of human history prior to 2003, according to IDC and EMC studies.

Apache Hadoop, destined to become a foundation of government big data efforts, is created.

The National Science Board recommends that NSF create a career path for “a sufficient number of high-quality data scientists” to manage the growing collection of digital information.

The number of devices connected to the Internet exceeds the world’s population.

IBM's Watson scans and analyzes 4 terabytes (200 million pages) of data in seconds to defeat two human players on “Jeopardy!”

Work begins on UnQL, a query language for NoSQL databases.

The Obama administration announces the Big Data Research and Development Initiative, consisting of 84 programs in six departments. The National Science Foundation publishes “Core Techniques and Technologies for Advancing Big Data Science & Engineering.”

IDC and EMC estimate that 2.8 zettabytes of data will be created in 2012 but that only 3 percent of what could be usable for big data is tagged, and even less is analyzed. The report predicts that by 2020 the digital world will hold 40 zettabytes, 57 times the number of grains of sand on all the beaches in the world.

About the Author

Connect with the GCN staff on Twitter @GCNtech.