4 stages of the big data process

The phrase "garbage in, garbage out" is appearing with increasing frequency in discussions of big data – and with good reason. Thomas Redman, the self-styled Data Doc, has been posting some good articles on All Analytics about what he calls the "D4 process," which describes four increasingly difficult stages of big data development for governments and corporations.

1. Data acquisition. You can acquire potentially interesting data with little regard for quality. However, if you don’t ensure quality at this step, you make later steps increasingly difficult. Even machine-created and machine-harvested data is not immune from errors.

2. Discovery. It is more difficult to discover something truly interesting when the data is bad. Additionally, it's possible to draw invalid conclusions from relatively scrubbed data if error patterns aren't understood.

3. Delivery. It is even more difficult to get someone to use the results when they don’t trust the data. The fruit of big data applications is in the "small data" or the "connected dots." One small data quality issue or missed connection between data sets can destroy the credibility of the project.

4. Dollars. While it is nearly impossible to make money from poor-quality data, the government can't deliver poor-quality data, period.

About the Author

Connect with the GCN staff on Twitter @GCNtech.

