Canary in a data mine: How analytics detects early signs of bio threats
- By Patrick Marshall
- Oct 30, 2012
This is the third of a four-part series on text analytics.
One of the most ambitious attempts to bring the power of text analytics to bear in the interest of public safety is about to go into field testing. Funded by the Homeland Security Department, the National Collaborative for Bio-Preparedness (NCB-Prepared) is designed to monitor emergency medical services reports, poison center data and a wide array of other data sets, including social media, to detect signs of biological threats.
NCB-Prepared is in a demonstration phase and is being developed by its primary partners: the University of North Carolina at Chapel Hill, North Carolina State University and the SAS Institute.
“We also intersect with state agencies and other groups,” said Dr. Charles Cairns, chair of emergency medicine at UNC and principal investigator at NCB-Prepared. “The overall theme is that we can get data early, data closer to the point of illness or injury, data that represents the earliest signals in a health threat.”
The project already has demonstrated great promise, Cairns said. One of the first things researchers did was take a look at EMS records. “Using that approach we were able to detect a gastrointestinal outbreak a full two months before it was recognized by the standard reporting. So the power of this approach was demonstrated by using EMS records,” he said.
The NCB-Prepared analytics system employs SAS text analytics software running on NC State’s cloud-based Virtual Computing Lab to scan rapidly expanding data sets for patterns that could indicate an emerging threat to public health. The system is designed to be scalable to accommodate both growing data sets and adoption by public health agencies across the country.
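To make the idea of scanning records for early-warning patterns concrete, here is a minimal sketch in Python. It is not the SAS software or the NCB-Prepared method, whose details the project has not described here. The keyword list, the function names `weekly_counts` and `flag_weeks`, the (week, narrative) record format and the mean-plus-two-standard-deviations threshold are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from statistics import mean, stdev

# Hypothetical keyword pattern for gastrointestinal complaints.
# A production text-analytics system would use far richer language models.
GI_TERMS = re.compile(
    r"\b(vomit\w*|diarrhea|nause\w*|stomach cramps?)\b", re.IGNORECASE
)

def weekly_counts(records):
    """Count, per week, the records whose narrative mentions a GI term.

    records: iterable of (week_number, narrative_text) tuples
    (an assumed format, not a real EMS schema).
    """
    counts = Counter()
    for week, text in records:
        if GI_TERMS.search(text):
            counts[week] += 1
    return counts

def flag_weeks(counts, weeks, baseline_len=4, k=2.0):
    """Return weeks whose count exceeds mean + k * stdev of the
    preceding baseline_len weeks (a simple, assumed alerting rule)."""
    flagged = []
    for i in range(baseline_len, len(weeks)):
        history = [counts.get(w, 0) for w in weeks[i - baseline_len:i]]
        threshold = mean(history) + k * stdev(history)
        if counts.get(weeks[i], 0) > threshold:
            flagged.append(weeks[i])
    return flagged
```

Given a few weeks of steady GI-related call volume followed by a sharp jump, `flag_weeks` would return the week of the jump. The point of the sketch is only that a signal can surface from free-text reports as soon as they are filed, rather than after formal case reporting.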
“We’ve already expanded to include information from South Carolina as well as North Carolina,” Cairns said. “And we’re looking beyond poison center data and EMS data, to take a look at population data and health care infrastructure data. We look at aspects of social media, and we now have some national data sets that start focusing on things like foodborne illness.”
Cairns added that the hope is to eventually include veterinary data, wildlife data, pharmacy retail data and as many other data sources as can provide insight into early recognition and better situational awareness.
“Currently we’re looking at 13 million records, and that number is expanding rapidly,” he said. “Frankly, it has been just extraordinarily successful. We’ve had another 20 data set owners contact us wanting to participate.”
The project is expected to move into field testing by the end of this year.
YESTERDAY: How agencies use text analytics to read between the lines of terabytes of data.
TOMORROW: What’s next: Text analytics ready for the heavy lifting
Patrick Marshall is a freelance technology writer for GCN.