Emerging Tech


Fact verification as easy as spellcheck?

Computers are very good at digesting massive amounts of data, from Facebook feeds to NASA satellite data streams to medical records and insurance claims.  What they’re not good at is determining the accuracy of all that data.

Not that artificial intelligence designers aren’t trying.  But building inference engines that can perform logical operations on new, unanticipated data sets is incredibly difficult.

Researchers at Indiana University decided to try a different approach to the problem.  Instead of trying to build complex logic into a program, they proposed something simpler.  Why not try to measure the likelihood of a statement being true by analyzing the proximity of its terms and the specificity of its connectors?

OK, I admit that this approach is not intuitive to non-mathematically inclined humans.  But that’s what Giovanni Luca Ciampaglia, a postdoctoral fellow at Indiana University Bloomington’s School of Informatics and Computing, and his colleagues are researching, and it seems to work.

The first step is to create a knowledge graph for a given data set.  When assessing the likely truth or falsity of a statement, the algorithms developed by Ciampaglia measure the number of steps required to get from the beginning data point in the statement to the end data point.  The algorithm also factors in how “generic” the nodes linking those data points are: a path that runs through a very general concept counts for less than one that runs through specific concepts.
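To make the step counting concrete, here is a minimal sketch in Python.  It is not the researchers’ code: the handful of facts, the `build_graph` and `shortest_path` helpers and the choice of a plain breadth-first search are all assumptions made for illustration.

```python
from collections import defaultdict, deque

# Toy facts as (subject, object) pairs. Purely illustrative, not real data.
FACTS = [
    ("Barack Obama", "Human"),
    ("Joseph Stalin", "Human"),
    ("Joseph Stalin", "Communism"),
    ("Communism", "Socialism"),
    ("Barack Obama", "Democratic Party"),
]

def build_graph(facts):
    """Build an undirected adjacency map from (subject, object) pairs."""
    graph = defaultdict(set)
    for a, b in facts:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search: return a fewest-steps path as a list, or None."""
    if start == goal:
        return [start]
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt in parents:
                continue
            parents[nxt] = node
            if nxt == goal:
                # Walk the parent links back to the start, then reverse.
                path = [nxt]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None

graph = build_graph(FACTS)
path = shortest_path(graph, "Barack Obama", "Socialism")
print(path, "steps:", len(path) - 1)
```

In this toy graph the only route from “Barack Obama” to “Socialism” takes four steps and passes through the very general node “Human,” which is the weakness the generality factor is meant to catch.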

“Let’s say you want to check the statement, ‘Barack Obama is a socialist,’” said Ciampaglia.  “Barack Obama is a person, and Joseph Stalin was also a person and was a communist, which is a special brand of socialist.  So I found a connection there.”  Of course, Ciampaglia said, “nobody would really buy this kind of argument, because you have to go to the concept of ‘human,’ and that concept is very general, because there are 6 billion humans.”

“Our algorithm isn’t really doing logic,” Ciampaglia explained.  “It’s a measure of semantic relatedness.  It’s about finding the shortest path between two data points.”
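The idea of a shortest path that distrusts generic concepts can also be sketched in a few lines of Python.  The details here are assumptions, not something the article spells out: each intermediate node is charged the logarithm of its number of links, the cheapest path is found with Dijkstra’s algorithm, and the score is 1 / (1 + cost).  The `relatedness` function and the toy data are hypothetical.

```python
import heapq
import math
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def relatedness(graph, start, goal):
    """Return (score, path) for the cheapest path from start to goal.

    Assumed cost model: each intermediate node costs log(its degree),
    so passing through a heavily linked, generic node is expensive.
    The two endpoints are free. Score = 1 / (1 + total cost).
    """
    if start not in graph or goal not in graph:
        return 0.0, None
    best = {start: 0.0}
    heap = [(0.0, start, [start])]
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == goal:
            return 1.0 / (1.0 + cost), path
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt in graph[node]:
            step = 0.0 if nxt == goal else math.log(len(graph[nxt]))
            new_cost = cost + step
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, path + [nxt]))
    return 0.0, None

# Toy data: "Human" is linked to many people, so it is a generic node.
EDGES = [
    ("Barack Obama", "Human"),
    ("Joseph Stalin", "Human"),
    ("Joseph Stalin", "Communism"),
    ("Communism", "Socialism"),
]
EDGES += [(f"Person {i}", "Human") for i in range(50)]

graph = build_graph(EDGES)
generic_score, generic_path = relatedness(graph, "Barack Obama", "Socialism")
specific_score, specific_path = relatedness(graph, "Joseph Stalin", "Socialism")
print(generic_path, round(generic_score, 3))
print(specific_path, round(specific_score, 3))
```

Under these assumptions the Stalin-to-socialism path, which runs only through “Communism,” scores well above the Obama-to-socialism path, which has to detour through “Human.”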

The following example shows the semantic steps between "Barack Obama" and "Islam." The sheer number of steps and the references to Canadians indicate that President Obama and Islam are not closely related concepts, making the statement unlikely to be true.

[Chart: the semantic steps between "Barack Obama" and "Islam"]

To test the algorithm, Ciampaglia’s team built a knowledge graph of Wikipedia data with 3 million concepts and 23 million links.  The team’s automated fact-checking system – which assigned “truth scores” to statements being tested – consistently matched the results provided by human fact checkers.

Ciampaglia said his work is still a long way from being a generally useful product.  For starters, he said, a fact-checking tool based on his algorithm would first need to be able to digest and understand the text in the data set.  And while natural language processing is a hot area of research, the tools are not yet up to the job of managing large data sets accurately.

“It will not be tomorrow or next summer,” he said.  “But we will eventually get there.  All these technologies are coming together, and what we did provides a direction for this last step.”

Ciampaglia said the eventual product will be something like a spellchecker or a grammar checker, in that it would flag a likely falsehood in a statement when there is too much distance between nodes or the connections are too generic.
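A spellchecker-style warning of that kind might look something like the sketch below.  No such product exists yet, so everything in it is hypothetical: the `PathSummary` record, the `check_statement` function and both threshold values are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds; the article gives no concrete values.
MAX_HOPS = 3
MAX_NODE_DEGREE = 1000

@dataclass
class PathSummary:
    hops: int                      # steps between the two terms
    max_intermediate_degree: int   # links on the most generic node along the way

def check_statement(summary: Optional[PathSummary]) -> List[str]:
    """Return a list of warnings; an empty list means nothing was flagged."""
    if summary is None:
        return ["no connection found between the terms"]
    warnings = []
    if summary.hops > MAX_HOPS:
        warnings.append(f"terms are {summary.hops} steps apart")
    if summary.max_intermediate_degree > MAX_NODE_DEGREE:
        warnings.append("connection runs through a very generic concept")
    return warnings

print(check_statement(PathSummary(hops=1, max_intermediate_degree=12)))
print(check_statement(PathSummary(hops=6, max_intermediate_degree=5000)))
```

A real tool would need to tune such cutoffs against statements already vetted by human fact checkers.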

It could be used for fraud detection and in screening data streamed during a disaster.  “In a disaster or crisis, you want to get out only correct information,” he said.

 


Posted by Patrick Marshall on Jun 30, 2015 at 11:05 AM

