Michael Daconta | The next wave of semantic applications
In May, Stephen Wolfram of Mathematica fame plans to launch a computational knowledge engine on the Web that computes the answers to natural language questions. Recently, he has demonstrated the tool for information technology luminaries such as Doug Lenat (creator of Cyc) and Nova Spivack (chief executive officer of Radar Networks, the maker of Twine). They, in turn, have blogged about its capabilities and impact.
Of course, launching anything semantic on the Web brings up the obligatory comparisons to Google and speculation about whether it will be yet another would-be Google-killer. However, Wolfram Alpha and Google take very different approaches to information queries; the only real similarity is a minimalist, single text field in which to enter a query.
If you ask a question of Google, it retrieves possibly thousands of hits. Ask a question of Wolfram Alpha and it returns one answer. The reason is not that one approach is better than the other but that they attack the problem of relevance from different perspectives: Google is meant to retrieve documents with some probability that one contains what you are looking for. Wolfram Alpha knows how to answer many questions — but not all — and computes the answer to those questions via specific handcrafted algorithms.
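The contrast can be sketched in a few lines of Python. This is purely illustrative, not how either engine actually works: the documents, the crude word-overlap ranking and the percentage parser are all invented for the example. Retrieval returns every document that might contain the answer; computation recognizes a question it knows and runs a handcrafted algorithm to produce one answer.

```python
# Toy contrast: document retrieval vs. computed answers (illustrative only).

DOCS = [
    "The Eiffel Tower is 330 meters tall.",
    "Paris is the capital of France.",
    "The Eiffel Tower opened in 1889.",
]

def retrieve(query):
    """Return every document that might be relevant, ranked by word overlap."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.lower().split())), d) for d in DOCS]
    return [d for score, d in sorted(scored, reverse=True) if score > 0]

def compute(query):
    """Answer the question only if a handcrafted algorithm covers it."""
    q = query.lower()
    if q.startswith("what is") and "% of" in q:
        # e.g. "what is 15% of 80?"
        pct, base = q.replace("what is", "").replace("?", "").split("% of")
        return str(float(pct) / 100 * float(base))
    return None  # the engine knows many questions, but not all

print(retrieve("how tall is the Eiffel Tower"))  # several hits; one may hold the answer
print(compute("what is 15% of 80?"))             # one computed answer: 12.0
```

Note the asymmetry: the retriever never fails outright but pushes the final judgment onto the reader, while the computed approach either answers precisely or admits it cannot.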
How is that important to the government? Let me first digress with a story. Last month, I spoke at a packed seminar on e-discovery in which lawyers eagerly listened to basic concepts of metadata, information discovery and knowledge representation. Those smart, non-IT people were leaning forward and actively listening for one simple reason: their current IT tools cannot solve their problems.
Thousands, hundreds of thousands or millions of hits are not the answer because the problem is one of precision. I have blogged about this phenomenon before in relation to the Data Optimization Pyramid. That pyramid — composed of unmanaged data at the base, managed data in the middle and knowledge bases at the apex — represents the principle that not all data is equal, and therefore, different data requires different levels of optimization.
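To make the precision problem concrete, here is a minimal sketch using the standard information-retrieval definitions of precision and recall; the document counts are invented for illustration.

```python
# Why "millions of hits" is a precision problem: precision measures how many
# returned results are actually relevant; recall measures how many of the
# relevant documents were found at all.

def precision(retrieved, relevant):
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    return len(retrieved & relevant) / len(relevant)

# A broad query can achieve perfect recall with terrible precision:
retrieved = set(range(1_000_000))  # a million hits returned
relevant = set(range(50))          # only 50 documents truly matter

print(recall(retrieved, relevant))     # 1.0   -- everything relevant was found
print(precision(retrieved, relevant))  # 5e-05 -- almost every hit is noise
```

The lawyers' pain sits in that second number: a result set that technically contains the answer is worthless if a human must wade through a million documents to find it.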
The lawyers at the seminar were experiencing pain because their current level of data optimization did not match their required level of precision. That mismatch leads to the painful realization that more data needs to move up from the managed-data level into the realm of semantic applications. So, in that light, Wolfram Alpha represents another pragmatic application of semantic processing. Not perfect artificial intelligence, not perfect knowledge representation, but another good step in the right direction.
Just like financial assets, technologies rise and fall in hype bubbles. However, throughout those cycles, dedicated people continue to work on improving the technology and solving real problems. With Wolfram Alpha, we see the emergence of the next wave of semantic applications, which have learned from the retrenchment of the previous wave and have come back with new approaches.
I categorize the strategies in this new wave into three buckets: honey pots, heuristics and hard science.
Semantic honey pots are pools of context-rich data that serve as resources for semantic processing. The category includes Web 2.0 sites such as DBpedia, Freebase, semantic wikis and possibly even Data.gov.
Heuristics marry practical insights and algorithms to make a best guess or provide an approximate solution that works in most cases. In this category are entity analytics (sophisticated scoring algorithms), business rules engines and Twine.
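A miniature example of the heuristic style: entity resolution by weighted evidence, deciding whether two records describe the same person. The fields, weights and threshold below are invented for illustration; real entity analytics engines use far more sophisticated scoring, but the "best guess that works in most cases" character is the same.

```python
# Toy entity-resolution heuristic: sum weighted evidence that two records
# refer to the same person, then compare against a tunable cutoff.

WEIGHTS = {"name": 0.5, "birth_year": 0.3, "city": 0.2}
THRESHOLD = 0.7  # works in most cases -- not all; that is the heuristic bargain

def match_score(a, b):
    """Add the weight of every field on which both records agree."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        if a.get(field) is not None and a.get(field) == b.get(field):
            score += weight
    return score

def same_entity(a, b):
    return match_score(a, b) >= THRESHOLD

rec1 = {"name": "j. smith", "birth_year": 1970, "city": "reston"}
rec2 = {"name": "j. smith", "birth_year": 1970, "city": "fairfax"}
rec3 = {"name": "m. jones", "birth_year": 1970, "city": "reston"}

print(same_entity(rec1, rec2))  # True  -- name plus birth year outweigh the city mismatch
print(same_entity(rec1, rec3))  # False -- a shared birth year alone is not enough
```

There is no proof of correctness here, only a practical insight encoded as weights, which is exactly what distinguishes the heuristics bucket from the hard-science one.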
Finally, in the hard-science category, we have a set of problems that are generally intractable by brute-force methods: natural language recognition, general human understanding (Cyc) and machine learning.
From the blogs about the forthcoming release, Wolfram Alpha seems to be somewhere between heuristics and hard science. It will be exciting to see what new techniques and approaches Stephen Wolfram brings to semantic applications. It could be another watershed event in semantic computing.