When big data doesn’t equal big knowledge

While big data analytics have been touted for their ability to find signals in a sea of noise, they cannot tell what those signals mean. Without a solid grasp of what data is being mined, knowledge of its accuracy and why and how it is being mined, big data can end up causing more problems than it solves.

This problem can be most acutely seen in the public health arena, where the amount of data is increasing exponentially.

“Paradoxically, the proportion of false alarms among all proposed ‘findings’ may increase when one can measure more things,” Muin Khoury and John Ioannidis wrote in a recent report, “Big Data Meets Public Health.” That is what happened when Google dramatically overestimated peak flu levels by basing its analysis on flu-related Internet searches.
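The statistical point behind Khoury and Ioannidis' warning is the multiple-comparisons problem: the more variables you test against an outcome, the more chance patterns clear any fixed significance threshold. A minimal simulation (pure-noise variables, an assumed z-score cutoff of 2.0) illustrates how false "findings" scale with the number of things measured:

```python
import random

random.seed(42)

def count_false_alarms(n_variables, n_samples=100, threshold=2.0):
    """Simulate screening many unrelated variables for a 'signal'.

    Every variable here is pure noise, so any score that crosses the
    threshold is by construction a false alarm.
    """
    false_alarms = 0
    for _ in range(n_variables):
        # z-like score for a noise variable: the sum of n_samples
        # standard-normal draws, scaled so the null distribution is
        # roughly standard normal
        total = sum(random.gauss(0, 1) for _ in range(n_samples))
        z = total / (n_samples ** 0.5)
        if abs(z) > threshold:
            false_alarms += 1
    return false_alarms

# The more things we measure, the more spurious 'findings' appear,
# even though nothing real is in the data.
for n in (10, 100, 1000):
    print(n, "variables ->", count_false_alarms(n), "false alarms")
```

With a |z| > 2 cutoff, roughly 4 to 5 percent of pure-noise variables pass the test, so measuring ten times as many things yields about ten times as many false alarms -- the pattern the authors describe.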

Analytics, in other words, is only as good as its data foundation -- which, in some cases, is shaky. “Research accuracy is dictated by the weakest link,” the authors said, with current analytics often based on “convenient samples of people or information available on the Internet.”

Information gleaned from the Internet needs to be integrated with other data and interpreted with “knowledge management, knowledge synthesis and knowledge translation,” Khoury and Ioannidis stated. Machine learning algorithms can help -- although, as Microsoft learned when its Twitter bot Tay went off the rails, parameters must be set to avoid havoc when data is collected.

Put another way, big data is a collection of “raw observations that have limited value by themselves. What gives a raw observation value…is placing it in an interpretive context to yield information,” wrote Dr. Ida Sim in an article in the Annals of Internal Medicine. “An algorithm may detect a pattern in a database but have no way of recognizing whether the result is true, spurious or affected by bias.”

Solid results from big data analytics go beyond the data itself. Using big data for true precision medicine requires not only “clean, complete and standardized datasets,” but also cooperation and collaboration from those involved, such as the federal government, research organizations and health IT developers, Jennifer Bresnick wrote in an article in Health IT Analytics.

As the amount of data continues to grow, so will the problem of incorrect analysis.

However, Khoury and Ioannidis said, “the combination of a strong epidemiologic foundation, robust knowledge integration, principles of evidence-based medicine and an expanded translation research agenda can put big data on the right course.”

About the Author

Kathleen Hickey is a freelance writer for GCN.

Reader Comments

Tue, Apr 12, 2016 Tom Muscarello

I have found over the years that the only way to ensure data quality is to clean it up. This can be a sizeable undertaking. Analytic techniques might also require normalizing the structured data. Finally, make sure that you figure out the right questions you want to ask. If you are just "prospecting" be sure that you can validate any patterns found. Big Data doesn't equal big knowledge. But if you know the information context you can uncover knowledge hidden in the data.

Thu, Apr 7, 2016 Peter Fretty

For big data to equal big knowledge there needs to be a focus on data quality. There also needs to be a strategy in place for how the data is collected to ensure the organization is able to ask questions of a useful dataset. Some great posts on this topic at Big Data Forum. Peter Fretty, IDG blogger for SAS
