Intell tool would track social media like a virus
- By John Breeden II
- Feb 12, 2013
Government has gotten quite good over the years at modeling the spread of various diseases and using that data to predict and fight major outbreaks and epidemics. Now the Office of Naval Research wants to see if it’s possible to track ideas in a similar fashion.
If hostile ideas and their spread via social media can be tracked and modeled, might it be possible to predict protests, revolutions or even individual attacks before they happen?
Robert McCormack, a principal investigator at Aptima, Inc., thinks so. McCormack is exploring the flow of ideas and events during the Arab Spring using Aptima's Epidemiological Modeling of the Evolution of Messages (E-MEME) system for ONR.
The spread of anti-government feeling from Tunisia to other countries in North Africa and the Middle East followed a pattern similar to that of a virus, though carried via electronic communications, he said. Applying the principles of epidemiology to the spread of ideas could help analysts forecast their flow.
E-MEME is designed to become a tool for analysts, and McCormack, in an interview, stressed that the human element is always part of the equation. E-MEME works by first using Aptima’s latent variable analysis text analytics tool, known as LaVA, to scour large sets of information from multiple sources of Internet data such as news feeds, blogs and social media.
Using natural language processing, E-MEME would get a read on popular topics being discussed around the world, within specific geographical areas or among certain communities. Setting up that process requires a human analyst, who decides what to focus on, whom to follow and which leaders are of importance.
Once that data is collected, LaVA analyzes it to determine the topics being discussed. Out of 1,000 possible themes, McCormack said, LaVA typically determines about 100 that have relevance and similar, interconnected relationships.
Then E-MEME goes to work continually monitoring those channels and their relationships to each other to see if any ideas have caught on with the masses. For example, in a recent test of the program focused on Pakistan, E-MEME looked at the difference between what official news broadcasts were saying and what the people in those areas were talking about. On the social side, McCormack saw everything talked about from the economy to protests to World Cup soccer. But that didn’t always match what the official news channels were saying.
“What we would see using epidemiological models is that after a topic was discussed in the news, a few days later we would see it make the rounds in blogs,” McCormack said, comparing some ideas to germs that “infect” a population while others simply fizzle out.
“The interesting thing is that topics that you might think would be big with the public sometimes don’t catch on at all, and you don’t see the blogs picking them up even if the official news channels cover them,” he said. “In the Pakistani study, Iranian nuclear weapons was one topic that did see a big reaction.”
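The news-to-blog delay McCormack describes can be illustrated with a simple lagged correlation. The sketch below is not E-MEME's method, which the company has not detailed here; it assumes two equal-length lists of daily mention counts for one topic, and the function name `best_lag` and its parameters are hypothetical.

```python
from statistics import mean


def best_lag(news, blogs, max_lag=7):
    """Find the delay (in days) at which blog volume best tracks news volume.

    `news` and `blogs` are equal-length lists of daily mention counts for
    one topic. Returns (lag, scores), where scores maps each tested lag to
    the Pearson correlation between news[t] and blogs[t + lag].
    """
    if len(news) != len(blogs):
        raise ValueError("series must have the same length")

    def corr(xs, ys):
        mx, my = mean(xs), mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        if vx == 0 or vy == 0:
            return 0.0  # a flat series carries no signal
        return cov / (vx * vy) ** 0.5

    scores = {}
    for lag in range(max_lag + 1):
        xs = news[: len(news) - lag]  # news on day t
        ys = blogs[lag:]              # blogs on day t + lag
        if len(xs) >= 3:              # skip overlaps too short to correlate
            scores[lag] = corr(xs, ys)
    if not scores:
        raise ValueError("series too short for the requested lags")
    return max(scores, key=scores.get), scores


# Made-up counts for illustration only: blogs echo the news three days later.
news_counts = [0, 0, 5, 9, 4, 1, 0, 0, 0, 0, 0, 0]
blog_counts = [0, 0, 0, 0, 0, 5, 9, 4, 1, 0, 0, 0]
lag, scores = best_lag(news_counts, blog_counts)
```

A topic that never catches on in the blogs would show a weak correlation at every lag, which is one rough way to separate ideas that "infect" from those that fizzle.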
With that information, McCormack said, an analyst could classify the topic of Iran trying to build a nuclear weapon as an infectious idea within Pakistan. The analyst could then predict a strong reaction within Pakistan in the event of a significant new report about the topic.
E-MEME might also be used to predict how long it takes an infectious idea to cross over geographical areas or how likely it is to spread at all. Typically an idea that is going to spread will do so within a few days, McCormack said, though some move faster than others — just like real diseases. And they can also jump to new populations and regions, just like diseases.
E-MEME can accurately predict a topic's speed and how far it will spread based on data it has previously gathered on similar topics in the region, much as medical software can tell how quickly a certain strain of flu will spread based on previous experience with it.
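Aptima has not said which epidemiological model E-MEME uses. As an illustration of the general approach, the sketch below runs a textbook SIR (susceptible, infected, recovered) model in daily steps, treating "infected" as people actively discussing a topic. The function name and the rates `beta` (how readily the idea spreads) and `gamma` (how quickly people lose interest) are assumptions chosen for the example, not values from the project.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Simulate a basic SIR model with one-day time steps.

    beta  : spread rate per day (assumed to be between 0 and 1)
    gamma : rate per day at which people stop discussing the topic
    Returns a list of (susceptible, infected, recovered) tuples, one per day,
    starting with day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical rates: an idea that spreads faster than interest fades...
catching = simulate_sir(10_000, 10, beta=0.6, gamma=0.2, days=60)
# ...and one where interest fades faster than the idea spreads.
fizzling = simulate_sir(10_000, 10, beta=0.1, gamma=0.3, days=60)

peak_catching = max(i for _, i, _ in catching)
peak_fizzling = max(i for _, i, _ in fizzling)
```

In this simplified picture, an idea takes off only when `beta` exceeds `gamma`. Fitting those two rates to how similar topics spread in the past is one plausible way such a model could forecast a new topic's speed and reach.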
With E-MEME working well now in smaller, controlled tests, the next phase will be to scale it up, using clustered computers to run the software and the cloud to store data, giving it unlimited storage capacity.
Mark Weston, Aptima’s lead for big data, said that E-MEME has no upper limit on how much data it can analyze. In fact, more is probably better, he said. “The more data we give to E-MEME, the more accurate the predictions are going to be,” Weston said. “For phase two, we want it to use cluster and cloud technology to have it analyze millions of blogs, all nicely broken up into compartments.”
That day may come soon. McCormack said the due date for E-MEME to be running on a large scale is this April. Later this year, the plan is to incorporate the analysis software into military exercises. If that goes well, it will become another tool in the arsenal of intelligence analysts and others involved in tactical, operational and strategic military operations.