Emerging Tech


Fact verification as easy as spellcheck?

Computers are very good at digesting massive amounts of data, from Facebook feeds to NASA satellite data streams to medical records and insurance claims.  What they’re not good at is determining the accuracy of all that data.

Not that artificial intelligence designers aren’t trying.  But building inference engines that can perform logical operations on new, unanticipated data sets is incredibly difficult.

Researchers at Indiana University decided to try a different approach to the problem.  Instead of trying to build complex logic into a program, they proposed something simpler: Why not try to measure the likelihood of a statement being true by analyzing the proximity of its terms and the specificity of its connectors?

OK, I admit that this approach is not intuitive to non-mathematically inclined humans.  But that’s what Giovanni Luca Ciampaglia, a postdoctoral fellow at Indiana University Bloomington’s School of Informatics and Computing, and his colleagues are researching, and it seems to work.

The first step is to create a knowledge graph for a given data set.  When assessing the likely truth or falsity of a statement, the algorithm developed by Ciampaglia and his colleagues measures the number of steps required to get from the statement’s starting data point to its ending data point.  The algorithm also factors in how “generic” the nodes linking those data points are.

“Let’s say you want to check the statement, ‘Barack Obama is a socialist,’” said Ciampaglia.  “Barack Obama is a person, and Joseph Stalin was also a person and was a communist, which is a special brand of socialist.  So I found a connection there.”  Of course, Ciampaglia said, “nobody would really buy this kind of argument, because you have to go to the concept of ‘human,’ and that concept is very general, because there are 6 billion humans.”
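
To make that concrete, here’s a minimal sketch in Python of what such a knowledge graph might look like for Ciampaglia’s example, using the open-source networkx library.  The triples below are invented for illustration and are not drawn from the team’s actual data.

    import networkx as nx

    # Toy knowledge graph built from (subject, relation, object) triples.
    # These triples are invented to mirror the Obama/Stalin example above.
    triples = [
        ("Barack Obama", "is a", "person"),
        ("Joseph Stalin", "is a", "person"),
        ("Joseph Stalin", "adhered to", "communism"),
        ("communism", "is a form of", "socialism"),
    ]

    G = nx.Graph()
    for subject, relation, obj in triples:
        G.add_edge(subject, obj, relation=relation)

    # One ingredient of the score: how many hops separate the two concepts?
    print(nx.shortest_path(G, "Barack Obama", "socialism"))
    # ['Barack Obama', 'person', 'Joseph Stalin', 'communism', 'socialism']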

“Our algorithm isn’t really doing logic,” Ciampaglia explained.  “It’s a measure of semantic relatedness.  It’s about finding the shortest path between two data points.”
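
One rough way to capture that intuition in code is to score a statement by the shortest path between its endpoints, penalizing intermediate nodes that have many connections as a stand-in for how generic they are.  The snippet below continues the toy graph above; it is a simplified illustration, not the team’s published formula.

    import math
    import networkx as nx

    def truth_score(graph, source, target):
        # Heuristic: short paths through specific (low-degree) concepts
        # score near 1; long paths through generic hubs score near 0.
        # Simplified illustration, not the published algorithm.
        try:
            path = nx.shortest_path(graph, source, target)
        except nx.NetworkXNoPath:
            return 0.0
        penalty = sum(math.log(graph.degree(node)) for node in path[1:-1])
        return 1.0 / (1.0 + penalty)

    # In this tiny graph every intermediate node adds only a small penalty;
    # in a real graph a hub like "person," connected to every human in the
    # data set, would add a far larger penalty.
    print(truth_score(G, "Barack Obama", "socialism"))  # roughly 0.32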

The following example shows the semantic steps between "Barack Obama" and "Islam." The sheer number of steps, along with detours through unrelated concepts such as Canadians, indicates that President Obama and Islam are not closely related concepts, making the statement unlikely to be true.

[Chart: the semantic steps in the knowledge graph between "Barack Obama" and "Islam"]

To test the algorithm, Ciampaglia’s team built a knowledge graph of Wikipedia data with 3 million concepts and 23 million links.  The team’s automated fact-checking system – which assigned “truth scores” to statements being tested – consistently matched the results provided by human fact checkers.

Ciampaglia said his work is still a long way from being a generally useful product.  For starters, he said, a fact-checking tool based on his algorithm would first need to be able to digest and understand the text in the data set.  And while natural language processing is a hot area of research, the tools are not yet up to the job of managing large data sets accurately.

“It will not be tomorrow or next summer,” he said.  “But we will eventually get there.  All these technologies are coming together, and what we did provides a direction for this last step.”

Ciampaglia said the eventual product will be something like a spellchecker or a grammar checker, in that it would flag a likely falsehood in a statement by issuing an alert when there is too much distance between nodes or when the connections are too generic.
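
Continuing the toy example above, such a check might look something like the hypothetical helper below.  The 0.5 threshold and the wording of the warning are invented for illustration.

    def flag_claim(graph, subject, obj, threshold=0.5):
        # Spellcheck-style check: warn when the two concepts in a claim
        # are only weakly connected in the knowledge graph.
        # The threshold is an arbitrary, illustrative choice.
        score = truth_score(graph, subject, obj)
        if score < threshold:
            return "Possible falsehood: '%s' and '%s' are only weakly related (score %.2f)" % (subject, obj, score)
        return None

    print(flag_claim(G, "Barack Obama", "socialism"))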

It could be used for fraud detection and in screening data streamed during a disaster.  “In a disaster or crisis, you want to get out only correct information,” he said.

 


Posted by Patrick Marshall on Jun 30, 2015 at 11:05 AM

