Text analytics ready for the heavy lifting of agencies' data mining
- By Patrick Marshall
- Oct 31, 2012
This is the fourth of a four-part series on text analytics.
All those involved with text analytics agree that the need for tools to analyze unstructured text is only going to grow.
“The government is struggling in all organizations with how to harness big data,” said Fiona McNeill, text analytics product marketing manager at SAS. “You don’t have to boil the ocean when you have text analytics. You can extract just what is relevant to begin with and then investigate that for the value.”
Chris Biow, federal CTO at MarkLogic, agrees. “Any agency in the government that deals in any respect with the public should be using text analytics now,” he told GCN. “It’s maybe only being used now in 20 percent of the cases where it should. It’s as broad as treaty compliance versus watching public sentiment toward the United States overseas to predict a riot. All of that is out there.”
Unfortunately, the reluctance of public-sector organizations to talk about their text analytics implementations means there are few case studies to guide agencies considering the technology.
“I think a lot of the stuff is in a toolkit phase,” said Jamie Popkin, managing vice president at Gartner Group. He advises those considering implementing text analytics to talk first to their existing vendors. “Much of this technology comes from vendors that you’re already doing business with,” he said. “Be careful about platform proliferation. You probably already have three or four vendors you’re doing business with that all could offer you text analytics as part of their existing applications.”
MarkLogic’s Biow said the most critical thing in initial implementations of text analytics is to manage expectations because machines still are not nearly as good at analyzing text as humans are. “The machine’s advantage is that it can do all the text,” he explained. “[But] you don’t have enough human beings to read it all. The machines will make a pass-over and humans can then refine that. The machines are getting better in terms of the complexity and detail that they can extract, but not necessarily in terms of the quality. That’s why it’s important to set expectations.”
“The best practice here,” Biow said, “is setting reasonable expectations. And results can definitely be improved as your users, library scientists and text analytics vendors start working together.”
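The machine-first-pass, human-refinement workflow Biow describes can be sketched in a few lines. The sketch below is purely illustrative and assumes a toy setup: a regex stands in for a real extraction engine, and the confidence score is a made-up heuristic. The point is only the triage pattern itself, in which high-confidence extractions are accepted automatically and the rest are routed to a human review queue.

```python
import re

def machine_pass(documents, pattern, threshold=0.8):
    """First pass over all text: extract candidates, flag low-confidence
    ones for human review (toy sketch; names and scoring are illustrative)."""
    auto_accepted, needs_review = [], []
    for doc in documents:
        for match in re.finditer(pattern, doc):
            # Toy confidence heuristic: longer matches score higher.
            # A real system would use model probabilities instead.
            confidence = min(1.0, len(match.group()) / 10)
            record = {"text": match.group(), "confidence": confidence}
            if confidence >= threshold:
                auto_accepted.append(record)
            else:
                needs_review.append(record)
    return auto_accepted, needs_review

docs = ["Contact the Department of Agriculture office.",
        "The EPA issued new guidance."]
accepted, review = machine_pass(docs, r"[A-Z][A-Za-z]*(?: of [A-Z][a-z]+)?")
```

Here the machine reads everything, but only “Department of Agriculture” clears the threshold; the shorter, more ambiguous matches land in the review queue, which is the expectation-setting Biow recommends: the machine handles volume, humans supply quality.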
YESTERDAY: Canary in a data mine: How analytics detects early signs of bio threats
Patrick Marshall is a freelance technology writer for GCN.