Reality Check

Semantic search: Still more luck than technology

While watching the movie “Detachment” on Netflix, I grew curious about the scene where no parents show up for parents’ night at the high school. I wondered how prevalent poorly attended parents’ nights were across high schools in the United States. So I Googled it, typing in the following search: “Parents no show on high school parent’s night.” I received plenty of hits, but none on the problem of parents not attending a parents’ night at high school.

And there lies the problem: I know what I want to learn (in this case, statistics or stories indicating how common this phenomenon is across America), but I am not translating that query into the right keywords, or more precisely the “exact” keywords, that will provide relevant results. In the jargon of information retrieval, this would be considered a low-precision search because out of many hits there were no relevant ones.
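For readers who want the jargon made concrete, here is a minimal sketch of how precision is usually computed: the fraction of returned results that are actually relevant. The function name and the result identifiers are hypothetical and stand in for my two searches; they are not real Google output.

```python
def precision(retrieved, relevant):
    """Return the fraction of retrieved results that are relevant."""
    if not retrieved:
        return 0.0  # no hits at all: treat precision as zero
    relevant = set(relevant)
    hits = sum(1 for doc in retrieved if doc in relevant)
    return hits / len(retrieved)


# Hypothetical identifiers for pages about parental involvement in schools.
relevant_pages = {"study-a", "report-b", "article-c"}

# A search where none of the hits is relevant (illustrative data only).
symptom_search = ["forum-1", "news-2", "ad-3", "blog-4"]

# A search where three of four hits are relevant (illustrative data only).
cause_search = ["study-a", "report-b", "article-c", "blog-4"]

print(precision(symptom_search, relevant_pages))
print(precision(cause_search, relevant_pages))
```

Many hits with none relevant gives a precision of zero, which is what my first search amounted to.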

Thinking further about the query, I came up with a new Google search: “parental involvement in urban schools.” Boom! I hit the mother lode of relevant results. What happened here?

The key difference was that instead of describing the symptoms of the phenomenon I was interested in, I had to extract the meaning myself and interpret a possible cause of those symptoms. In essence, my search became a test of my hypothesis on the cause of those symptoms. Shaking my head, I realized that search has a long way to go if it depends on the searcher coming up with a “magic phrase” that matches the most common description of the relevant results.

Fortunately, improved search is on everyone’s radar.

In an interview with Bloomberg TV, Yahoo’s Marissa Mayer said search can be improved through personalization and context. Her key point on personalization is that a search engine should be able to extract context from your search history, location, social data, etc. to deliver more relevant content. That resulted in her oft-quoted phrase, “In the future, you become the query.” Facebook is also experimenting with search via its Graph Search.

Google is also zeroing in on the weakness of current search results and is working hard to deliver semantic search. It has added the Knowledge Graph and the new Hummingbird algorithm. Finally, Business Insider predicts a new war over semantic search with Apple and startups gunning for Google.

What does this mean for government information managers? Basically, really important information discovery can’t rely on keywords and traditional search engines. Careful metadata curation, good categorization and understanding your users will provide more relevant results.

As for Google, I just hope they can soon get rid of the “I’m Feeling Lucky” button. Until they do, I don’t think they will have succeeded in convincing anyone that semantic search is more technology than luck. 

Michael C. Daconta (mdaconta@incadencecorp.com) is the Vice President of Advanced Technology at InCadence Strategic Solutions and the former Metadata Program Manager for the Homeland Security Department. His new book is titled The Great Cloud Migration: Your Roadmap to Cloud Computing, Big Data and Linked Data.

Posted by Michael C. Daconta on Jan 15, 2014 at 10:28 AM


Reader Comments

Mon, Jan 20, 2014

This requires a much deeper conversation and understanding of the subject at scale. It involves different parts of the equation, including but not limited to parsing, entity extraction, inference, deep learning, contextual meaning, and artificial intelligence. It is in fact the latter that is the most important. "The aim to give these services the power to actually understand what their users are saying without help from other humans." Geoff Hinton, Computer Scientist, Artificial Neural Networks. http://www.wired.com/wiredenterprise/2014/01/geoffrey-hinton-deep-learning

Fri, Jan 17, 2014 Don O'Neill

If you want semantic searches, you want IBM Watson. IBM Watson operates on a higher plane. It is a cut above. It is a game changer. Each IBM Watson advancement represents the successful completion of a Grand Challenge including Blue Gene with its computational speed, Deep Blue with its demonstrated mastery of chess, Jeopardy with its lightning-quick and accurate quiz game responses, cognitive computing and the promise of deep semantic discovery, cognitive systems now the inflection point of broad useful application, and the next challenge on the horizon which is to pass the U.S. Medical Examination. I believe in IBM Watson... for one reason, IBM believes its own message! I know this because IBM is putting money, $1B over the next 5 years, into the success of cognitive computing, and the IBM CEO is betting her job on it. If successful, the biggest news is that IBM is plowing this field by itself, so at a minimum it will have a ten-year leap over the competition. At the start of 2014, IBM Watson is two years into a twenty-year program of commercializing cognitive computing; the 3rd Era of computing and the Next Big Thing are underway.

Thu, Jan 16, 2014 Kanwar sangha United States

Great Article! Semantics is the future and we are seeing Google making small strides. We are also trying to build semantics for the travel vertical and it is a very difficult problem. The data crawled from the web is structured internally by our algo and then an NLP step is run. Slowly but surely we are getting there :)! (findmycarrots.com)

Thu, Jan 16, 2014 Richard Ordowich

I find it amusing that we expect to be able to find information from sources that are semantically illiterate. Most of the data and information online and in most databases was not designed semantically. Typically, a programmer was the only person who created the name, definition and rules governing the data. And programmers were not trained in linguistics or semantics. For that matter, neither were CIOs and DBAs. As a result, most of the data we have is ill-formed at best. What is lacking is data literacy. Teaching of semantics, syntax and linguistics should be prerequisites in education along with some philosophy about data and information.
