The knowledge graphs behind Microsoft's virtual assistant can be applied by agencies to build searchable stores of organizational knowledge.
“Cortana” is Microsoft’s codename for a personal virtual assistant like Apple’s Siri. It borrows its name and persona from the wildly popular Halo video game series, in which Cortana is the artificial intelligence hologram that assists you, as “Master Chief,” in saving the galaxy. To compete in the smartphone market, Microsoft needs a smart personal assistant for its Windows Phone. This article digs into what fuels Cortana’s intelligence and how your organization can benefit from the same techniques.
Behind Cortana is the Satori knowledge graph that currently powers the Bing search engine. A knowledge graph is a data structure that stores names, places and things as the nodes of the graph, and the relationships between them as the links of the graph. More broadly, a graph is a very common computer science data structure made of nodes and links (also called edges) that is used to represent many real-world structures such as road networks and communication networks. Even a Web page is held in the browser’s memory as a form of graph (specifically, a tree).
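To make the idea of nodes and edges concrete, here is a minimal Python sketch of a graph stored as an adjacency list. The place names and the road network are invented for illustration, and the helper names (reachable, edge_count) are assumptions of this sketch, not part of Satori or any product.

```python
from collections import deque

# Hypothetical road network: each node maps to the nodes it links to.
# Roads run both ways, so every edge is listed from both ends.
roads = {
    "Reston": ["Vienna", "Herndon"],
    "Herndon": ["Reston"],
    "Vienna": ["Reston", "Arlington"],
    "Arlington": ["Vienna", "Washington"],
    "Washington": ["Arlington"],
}


def reachable(graph, start):
    """Return every node reachable from start by following edges (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def edge_count(graph):
    """Count undirected edges; each one appears twice in the adjacency list."""
    return sum(len(neighbors) for neighbors in graph.values()) // 2


if __name__ == "__main__":
    print(sorted(reachable(roads, "Herndon")))
    print(edge_count(roads))
```

A knowledge graph uses the same structure, except that the edges also carry a label describing the relationship.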
Knowledge graphs are also related to social graphs popularized by Facebook and Google+. The Satori knowledge graph uses the W3C Resource Description Framework (RDF) standard and has over 300 million nodes and 800 million edges.
RDF is the same standard used for linked data and is well suited to creating knowledge graphs, as every RDF statement follows the pattern of Subject, Predicate and Object, where the Subject and Object are nodes in the graph and the Predicate is the edge between them. Thus, RDF maps the grammar of a sentence (for example, “Mike knows Lynne”) to the parts of a graph. This kind of linked data is currently being leveraged to store open data on data.gov, healthcare.data.gov and data.gov.uk. Storing both knowledge and open data using linked data enables a new form of knowledge management for smart organizations. For the federal government, it’s a way to head off the impending knowledge management crisis.
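The Subject, Predicate, Object pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a full RDF library: the example.org URIs are hypothetical, the Triple class and match function are invented for this sketch, and only the FOAF “knows” property is a real, published vocabulary term. The output format follows the W3C N-Triples syntax for statements whose parts are all URIs.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str    # a node in the graph
    predicate: str  # the labeled edge
    obj: str        # another node in the graph


PEOPLE = "http://example.org/people/"        # hypothetical namespace
KNOWS = "http://xmlns.com/foaf/0.1/knows"    # real FOAF property
WORKS_FOR = "http://example.org/terms/worksFor"  # hypothetical property

triples = [
    Triple(PEOPLE + "Mike", KNOWS, PEOPLE + "Lynne"),
    Triple(PEOPLE + "Lynne", KNOWS, PEOPLE + "Sam"),
    Triple(PEOPLE + "Mike", WORKS_FOR, "http://example.org/orgs/AgencyX"),
]


def match(store: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def to_ntriples(t: Triple) -> str:
    """Serialize a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


if __name__ == "__main__":
    # "Whom does Mike know?" is a pattern with the object left open.
    for t in match(triples, subject=PEOPLE + "Mike", predicate=KNOWS):
        print(to_ntriples(t))
```

Leaving one position of the pattern open is how a question gets asked of the graph; query languages such as SPARQL generalize the same idea.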
The average age of today’s civil servant is 47, and about 30 percent of the federal workforce will be eligible to retire by 2016, according to the Government Accountability Office. Now that the recession seems to be receding, the wave of retirements has begun, with a 21 percent increase over last year. That means organizational knowledge and information are walking out the door unless you have knowledge management systems augmented with a knowledge graph to capture and organize what retirees know.
So, how do you create your own knowledge graph? The best way to answer that is to work backward from what we have covered: begin with linked data and use it to organize and publish your data sets to data.gov. You create linked data by following four simple rules:
1. Use URIs as names for things. A Uniform Resource Identifier (URI) identifies a resource by name or location (or both). There are two types of URIs: a Uniform Resource Name (URN) and a Uniform Resource Locator (URL). We are most familiar with URLs, as they are what we put in the browser to retrieve a Web page, specifying the network location and the protocol for retrieving it. The protocol part of the URL precedes the colon. For example, http://www.w3.org/Icons/w3c_home.gif is the location of an image file at the W3C website that is retrieved by the browser using the HyperText Transfer Protocol (HTTP). A URN, on the other hand, defines a name for something but does not describe how to retrieve it.
2. Use HTTP URIs so that people can look up those names. Instead of using a URN, use a URL so that the data referred to can be retrieved over the Web using HTTP. We do this every day by typing a URL into the browser location bar. Typing http://www.w3.org into a browser asks the browser to fetch the site’s default page (often a file named index.html) from the Web server at the domain name www.w3.org. The browser speaks to that server with HTTP commands like GET and POST.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL). If a URL identifies a topic, serve a data file at that URL that describes the attributes of that concept in a standard format. The New York Times, for example, publishes RDF files describing the topics in the paper at data.nytimes.com.
4. Include links in the data to other URIs so users can discover more things. Your concepts will refer to other concepts in a chain (or graph) of things. An important future form of authority will be the number of graphs that link to a concept you define, since each link corroborates your definition of that concept. This is especially important for inherently governmental concepts, where the government acts as the authoritative “backstop.” An example would be a concept like “unemployment rate” from the Bureau of Labor Statistics.
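The four rules above can be sketched in Python without touching the network. The sketch below checks that a name is an HTTP URI (rules 1 and 2) and then writes a small description of a concept that includes a link to another URI (rules 3 and 4). The example.org URIs and the function names are hypothetical, chosen only for illustration; the rdfs:label and rdfs:seeAlso properties are real W3C terms. Literal escaping is simplified to backslashes and double quotes.

```python
from urllib.parse import urlsplit

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def is_http_uri(name):
    """Rules 1 and 2: the name is a URI that can be looked up over HTTP."""
    parts = urlsplit(name)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def describe(subject, properties):
    """Rules 3 and 4: describe a concept as N-Triples lines.

    properties is a list of (predicate, value) pairs. A value that is an
    HTTP URI is written as a link; anything else is written as a plain
    string literal (simplified escaping, an assumption of this sketch).
    """
    if not is_http_uri(subject):
        raise ValueError("concept names should be HTTP URIs: " + subject)
    lines = []
    for predicate, value in properties:
        if is_http_uri(value):
            obj = f"<{value}>"
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical concept URI and link target.
    concept = "http://example.org/concepts/unemployment-rate"
    print(describe(concept, [
        (RDFS_LABEL, "Unemployment rate"),
        (RDFS_SEE_ALSO, "http://example.org/concepts/labor-force"),
    ]))
```

In practice the resulting file would be served at the concept’s own URL, so that anyone who looks up the name receives the description.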
Once you’re familiar with how to create linked data, you can move up the ladder of techniques to social graphs, knowledge graphs and intelligent assistants. Linked data enables you to create both social graphs and knowledge graphs. In fact, a social graph is really just a type of knowledge graph focused on people. Using the familiar triad of “people, places and things,” you move from creating a social graph to completing the triad with interlinked knowledge graphs on the places and things of your organization.
After creating your knowledge graphs, you can expand them organically by linking to concepts in your IT systems and slowly integrating them into the daily activities of your organization. Given this growing knowledge base, you can then leverage virtual assistants (similar to Cortana) to ask and answer questions about your organization’s knowledge. Extrapolating further, you could even access and leverage that knowledge via mobile applications to truly give your employees knowledge at their fingertips.
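As a rough illustration of how an assistant might answer a question from such a graph, the Python sketch below links people, places and things and then follows two edges to find who to ask about a system. Every name, relationship label and function here is hypothetical and greatly simplified; a real assistant would also need to interpret the question itself.

```python
# Hypothetical organizational facts as (subject, predicate, object) statements.
facts = [
    ("Mike", "worksIn", "Reston Office"),          # person -> place
    ("Lynne", "worksIn", "Reston Office"),
    ("Mike", "expertIn", "Metadata"),              # person -> thing
    ("Lynne", "expertIn", "Records Retention"),
    ("Metadata", "usedBy", "Grants System"),       # thing -> thing
]


def subjects(store, predicate, obj):
    """Return every subject linked to obj by the given predicate."""
    return [s for (s, p, o) in store if p == predicate and o == obj]


def experts_for_system(store, system):
    """Two-hop question: which people are expert in a topic the system uses?"""
    topics = subjects(store, "usedBy", system)
    people = {
        person
        for topic in topics
        for person in subjects(store, "expertIn", topic)
    }
    return sorted(people)


if __name__ == "__main__":
    print(experts_for_system(facts, "Grants System"))
    print(sorted(subjects(facts, "worksIn", "Reston Office")))
```

The answer comes from following links rather than searching documents, which is why the graph becomes more useful as more of the organization’s concepts are connected to it.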
Michael C. Daconta (firstname.lastname@example.org) is the Vice President of Advanced Technology at InCadence Strategic Solutions and the former Metadata Program Manager for the Homeland Security Department. His new book is entitled, The Great Cloud Migration: Your Roadmap to Cloud Computing, Big Data and Linked Data.