4 stages of the big data process

The phrase "garbage in, garbage out" is appearing with increasing frequency in discussions of big data – and with good reason. Thomas Redman, the self-styled Data Doc, has been posting some good articles on All Analytics about what he calls the "D4 process," which characterizes four increasingly difficult stages of big data development for governments and corporations.

1. Data acquisition.  You can acquire potentially interesting data with little regard for quality. On the flip side, however, if you don't ensure quality at this step, you make every later step more difficult. Even machine-created and machine-harvested data is not immune to errors.

2. Discovery.  It is more difficult to discover something truly interesting when the data is bad. Additionally, it's possible to draw invalid conclusions even from relatively clean data if its error patterns aren't understood.

3. Delivery.  It is even more difficult to get someone to use the results when they don't trust the data. The fruit of big data applications is in the "small data," or the "connected dots." One small data-quality issue or missed connection between data sets can destroy the credibility of the project.

4. Dollars.  It is nearly impossible to make money from poor-quality data, and government simply can't deliver poor-quality data, period.
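Stage 1 is where quality problems are cheapest to catch. As a rough illustration only (the field names and validation rules below are hypothetical assumptions, not anything from Redman's articles), acquisition-time checks might look like this minimal Python sketch:

```python
# Hypothetical sketch: simple quality checks applied at the acquisition
# stage, before records flow into discovery and delivery.
# The "id"/"age" fields and their rules are illustrative assumptions.

def validate_record(record):
    """Return a list of quality problems found in one record."""
    problems = []
    if not record.get("id"):
        problems.append("missing id")
    age = record.get("age")
    if age is not None and not (0 <= age <= 120):
        problems.append("age out of range")
    return problems

def acquire(records):
    """Split incoming records into clean and rejected sets,
    keeping the reasons for each rejection for later review."""
    clean, rejected = [], []
    for record in records:
        issues = validate_record(record)
        if issues:
            rejected.append((record, issues))
        else:
            clean.append(record)
    return clean, rejected

clean, rejected = acquire([
    {"id": "a1", "age": 34},    # passes both checks
    {"id": "", "age": 34},      # missing id -> rejected
    {"id": "a2", "age": 999},   # implausible age -> rejected
])
print(len(clean), len(rejected))  # → 1 2
```

Keeping the rejection reasons, rather than silently dropping bad records, is what later makes the "error patterns" of stage 2 understandable.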

About the Author

Connect with the GCN staff on Twitter @GCNtech.
