Emerging Tech

Using crowds to teach AI to search smarter

As anyone who searched the internet in the late 1990s is aware, language-based search engines have grown steadily smarter and now deliver far more relevant results. Search engines still struggle to classify some types of content, though, especially images, videos and language that employs slang or jargon.

To help address such shortcomings, a team at the University of Texas at Austin is using crowdsourced input to train its machine-learning algorithms to create a more intelligent search engine. 

“A lot of the machine learning today is built on the idea that the best way to transfer human knowledge into realizing intelligent systems is to have people provide lots of examples of what they want the system to do,” said Matthew Lease, associate professor in the School of Information. “The system then induces patterns and figures how to generalize that to new unseen examples.”

Since the accuracy of a machine-learning system is often driven by the quantity of the data, sometimes more than by the algorithms themselves, Lease said that “anything that changes the scale of data that we can get is a game changer.” That is where crowdsourcing comes in.

By having people read articles in medical journals and breaking news stories to extract and label the key details – events, people and places – the researchers can give the machine-learning system more examples of correctly labeled content. “Crowdsourcing is a way that we have found to really ramp up the scale of label data that we collect,” he said.

Conceding that crowds of lay people are less accurate than experts, Lease added that there are ways to check their work. “You may already have some examples of what you want, and you simply check how well the data you are collecting agrees with that,” he said. When a label from the crowd disagrees with the algorithm's output, the item can be flagged for a human to check.
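The two checks described here can be sketched in a few lines of Python. This is a minimal illustration, not the team's code: the function names, item IDs and labels are all hypothetical, and it assumes each item carries a single crowd label and a single model prediction.

```python
def gold_agreement(crowd_labels, gold_labels):
    """Fraction of gold-labeled items on which the crowd label matches.

    Both arguments are dicts mapping item ID -> label. Only items that
    appear in both dicts are scored.
    """
    shared = [item for item in gold_labels if item in crowd_labels]
    if not shared:
        raise ValueError("no overlap between crowd and gold labels")
    matches = sum(crowd_labels[item] == gold_labels[item] for item in shared)
    return matches / len(shared)


def flag_disagreements(crowd_labels, model_labels):
    """Return sorted item IDs where the crowd and the model disagree."""
    return sorted(
        item
        for item in crowd_labels
        if item in model_labels and crowd_labels[item] != model_labels[item]
    )


# Hypothetical data: entity-type labels for four text spans.
crowd = {"s1": "PERSON", "s2": "PLACE", "s3": "EVENT", "s4": "PLACE"}
gold = {"s1": "PERSON", "s2": "PLACE", "s3": "PERSON"}  # trusted examples
model = {"s1": "PERSON", "s2": "EVENT", "s3": "EVENT", "s4": "PLACE"}

agreement = gold_agreement(crowd, gold)          # crowd vs. known answers
needs_review = flag_disagreements(crowd, model)  # items for a human to check
```

With this toy data, the crowd matches two of the three trusted examples, and only item `s2` would be sent to a human reviewer.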

Lease's team was able to train a neural network so it could accurately extract relevant information in unannotated texts, improving upon existing tagging and training methods. They also found they could estimate the quality of each crowdsourcer's work, which was not only useful for error analysis, but also let them identify the best person to annotate each particular text.
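The article does not say how the team estimated annotator quality. One simple stand-in, shown below as an assumption rather than the researchers' method, is to score each worker by how often they agree with the majority label for an item. The worker names, items and labels are hypothetical.

```python
from collections import Counter, defaultdict


def majority_labels(annotations):
    """Majority label per item from (worker, item, label) triples.

    Ties go to whichever tied label was seen first for that item.
    """
    votes = defaultdict(Counter)
    for _worker, item, label in annotations:
        votes[item][label] += 1
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}


def worker_quality(annotations):
    """Estimate each worker's quality as agreement with the majority label."""
    consensus = majority_labels(annotations)
    hits = Counter()
    totals = Counter()
    for worker, item, label in annotations:
        totals[worker] += 1
        hits[worker] += int(label == consensus[item])
    return {worker: hits[worker] / totals[worker] for worker in totals}


# Hypothetical annotations: three workers label the same three text spans.
annotations = [
    ("ann", "s1", "PERSON"), ("bob", "s1", "PERSON"), ("cat", "s1", "PLACE"),
    ("ann", "s2", "EVENT"), ("bob", "s2", "EVENT"), ("cat", "s2", "EVENT"),
    ("ann", "s3", "PLACE"), ("bob", "s3", "PERSON"), ("cat", "s3", "PLACE"),
]

quality = worker_quality(annotations)
best_worker = max(quality, key=quality.get)
```

This yields one overall score per worker. Picking the best person for each particular text, as the team reports doing, would need scores broken down by topic or document type, which this sketch does not attempt.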

Lease said he even envisions tying crowdsourcing to search engines in real time. Taking a picture of a plate of food and searching for the calorie count, for example, isn't yet possible because computer vision technology isn't advanced enough.

But with crowdsourcing, “we can let the machine learning system take us 80 percent of the way and then in real time reach out to the crowd to close the loop,” he said. “The user using [the] application might not even know … what is being done by AI and what is being done by the crowd.”
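One common way to build this kind of hand-off is to let the model answer when it is confident and send the rest to people. The sketch below assumes that design; the `route` function, the `ask_crowd` callback and the 0.8 cutoff are hypothetical choices for illustration, and the cutoff is not a reading of the "80 percent" in the quote.

```python
def route(prediction, confidence, ask_crowd, threshold=0.8):
    """Return the model's answer if it is confident enough, else ask the crowd.

    prediction: the model's answer for one query
    confidence: the model's own score for that answer, from 0.0 to 1.0
    ask_crowd:  zero-argument callable that returns a human-supplied answer
    Returns a tuple (answer, source), where source is "model" or "crowd".
    """
    if confidence >= threshold:
        return prediction, "model"
    return ask_crowd(), "crowd"


def fake_crowd():
    """Stand-in for a real-time request to human workers (placeholder value)."""
    return "520 kcal"


# Placeholder values only; no real calorie estimates are implied.
confident = route("310 kcal", 0.93, fake_crowd)
unsure = route("310 kcal", 0.41, fake_crowd)
```

The caller receives an answer either way, so the person using the application would not need to know which path produced it.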

Lease has received grants from the National Science Foundation, the Institute of Museum and Library Services and the Defense Advanced Research Projects Agency to improve search engines by integrating crowdsourcing.

Posted by Patrick Marshall on Aug 16, 2017 at 1:45 PM

