4 stages of the big data process

The phrase "garbage in, garbage out" is appearing with increasing frequency in discussions of big data – and with good reason. Thomas Redman, the self-styled Data Doc, has been posting some good articles on All Analytics about what he calls the "D4 process," which characterizes four increasingly difficult stages of big data development for governments and corporations.

1. Data acquisition.  You can acquire potentially interesting data with little regard for quality. On the flip side, however, if you don't ensure quality at this step, you make later steps increasingly difficult. Even machine-created and machine-harvested data is not immune from errors.

2. Discovery. It is more difficult to discover something truly interesting when the data is bad. Additionally, it's possible to draw invalid conclusions from relatively scrubbed data if error patterns aren't understood.

3. Delivery.  It is even more difficult to get someone to use the results when they don't trust the data. The fruit of big data applications is in the "small data" or the "connected dots." One small data quality issue or missed connection between data sets will destroy the credibility of the project.

4. Dollars. It is nearly impossible to make money from poor-quality data, and government simply can't deliver poor-quality data, period.
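Stage 1's point, that quality must be enforced at acquisition time rather than patched up later, can be sketched as a simple ingest gate that quarantines bad records on arrival. This is an illustrative sketch only; the field names and validity rules below are assumptions for the example, not part of Redman's process.

```python
# Minimal sketch of a quality gate at the data acquisition stage.
# The fields ("id", "value") and the 0-100 range are hypothetical.

def validate_record(record):
    """Return a list of quality problems found in one incoming record."""
    problems = []
    if not record.get("id"):
        problems.append("missing id")
    value = record.get("value")
    if not isinstance(value, (int, float)):
        problems.append("non-numeric value")
    elif not (0 <= value <= 100):
        problems.append("value out of expected range 0-100")
    return problems

def acquire(records):
    """Split incoming records into accepted rows and a quarantine with reasons."""
    accepted, quarantined = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            quarantined.append((record, problems))
        else:
            accepted.append(record)
    return accepted, quarantined

feed = [
    {"id": "a1", "value": 42},     # clean record
    {"id": "a2", "value": "n/a"},  # machine-harvested junk
    {"value": 17},                 # missing identifier
]
accepted, quarantined = acquire(feed)
print(len(accepted), len(quarantined))  # prints: 1 2
```

Keeping the rejects, with the reason each was rejected, is what makes the later stages tractable: discovery can study the error patterns, and delivery can show users exactly what was excluded and why.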

About the Author

Connect with the GCN staff on Twitter @GCNtech.

