Taxonomy's not just design, it's an art
- By Joab Jackson
- Feb 03, 2004
Michael C. Daconta, Web master builder
If anyone understands the acronym soup of Web services, it's Michael C. Daconta. He's director of Web and technology services for systems integrator APG McDonald Bradley Inc. of McLean, Va. As part of that job, Daconta is chief architect of the Defense Intelligence Agency's Virtual Knowledge Base, a project to compile a directory of Defense Department data through Extensible Markup Language ontologies.
A vocal commentator on software issues, Daconta has authored numerous technical papers and books. Most recently, he co-wrote the 2003 book The Semantic Web, along with Leo Obrst and Kevin Smith. The book is a primer on how XML, Web services and the emerging semantic Web fit together.
Before working on the Virtual Knowledge Base, Daconta helped create a set of electronic mortgage standards for Fannie Mae. In the Army, Daconta worked as a programming section chief on combat and intelligence simulation software at Fort Huachuca, Ariz.
Daconta received a bachelor's degree in computer science from New York University and a master's in computer science from Nova Southeastern University.
Associate editor Joab Jackson interviewed Daconta at GCN's offices in Washington.
GCN: What is the semantic Web and how does it differ from Extensible Markup Language?
DACONTA: The semantic Web is a web of machine-processable data. The current Web is human-readable data.
With XML, you get part of the way to the semantic Web. You get syntactic interoperability, which means that both applications must agree on the syntax beforehand, and that both applications understand the meaning of that syntax [to exchange data].
If my machine has an item called 'price,' your machine must know what 'price' means. If you use 'cost' and not 'price,' then our programs cannot communicate.
The semantic Web bridges that. If I have an item in my system called 'person' and you are looking for 'terrorist,' then we need an understanding that 'terrorist' is a type of 'person.'
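The 'price'/'cost' and 'person'/'terrorist' examples can be sketched in a few lines of Python. This is only an illustrative toy, not how any semantic Web standard or tool represents the idea; the dictionaries and the function names `canonical` and `is_a` are assumptions made up for this sketch.

```python
# Hypothetical vocabulary, invented for illustration only.
SUBCLASS_OF = {"terrorist": "person"}  # 'terrorist' is a type of 'person'
EQUIVALENT = {"cost": "price"}         # your 'cost' means the same as my 'price'


def canonical(term):
    """Map a term to the shared name both systems agree on."""
    return EQUIVALENT.get(term, term)


def is_a(term, ancestor):
    """Return True if `term` is `ancestor` or a (transitive) subclass of it."""
    current = canonical(term)
    target = canonical(ancestor)
    while current is not None:
        if current == target:
            return True
        # Walk one step up the hierarchy; None ends the loop at the root.
        current = SUBCLASS_OF.get(current)
    return False


# A system that only knows about 'person' can still match a 'terrorist' record,
# and two systems using 'cost' and 'price' resolve to the same term.
found = is_a("terrorist", "person")
same_term = canonical("cost") == canonical("price")
```

With plain XML, both facts would have to be hard-coded in every application; here they live in data that any program can consult.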
The semantic Web bridges these differences by modeling data at a higher level. We're not just talking about the syntax of the words of the message, but modeling what those words mean.
GCN: Will everyone use a single taxonomy for one big semantic Web, or will organizations build their own semantic Webs?
DACONTA: There clearly will not be just one semantic Web. A lot of people are looking at taxonomies, so they have to be careful.
There is a right way and a wrong way to design a taxonomy. You can do a taxonomy informally, but that will only get you certain capabilities: for example, a hierarchical tree for people to browse. If you want more functionality, such as query expansions [to search for documents], you need a formal taxonomy. That simply means that every subtopic is a subclass to a topic.
GCN: How could the semantic Web benefit the typical user?
DACONTA: The semantic Web will be visible in many different ways.
Tim Berners-Lee [creator of the protocols for the Web and coiner of the term 'semantic Web'] wrote an article for Scientific American in which he talked about Web software agents that could automatically schedule a doctor's appointment for people.
The software agent should be able to look at both your schedule and the doctor's schedule, find information on the doctor's specialty, as well as understand what your medical complaint is.
The personal agent on your desktop and the doctor's agent need to know how to communicate, so these programs must have semantic-enabled data to be understood.
GCN: What is smart data?
DACONTA: Smart data is putting the intelligence in the data, rather than hard-coding it in an application. This is really important for net-centric computing. You never know beforehand who needs data because it's an always-changing population.
Brig. Gen. Michael Ennis [director of intelligence at Marine Corps headquarters] is a big proponent of XML efforts. He wants to assemble his own data. So he's pushing for Lego-like approaches to produce data. And we're still at the blob stage. We have to get down to fine-grained production of content.
GCN: How will the Defense Department use the Defense Intelligence Agency's Virtual Knowledge Base?
DACONTA: There are two major components: a knowledge registry and a federation scheme for data sources.
When I say data sources, I mean unstructured, semistructured and structured data sources. There are some important distinctions. For example, there are registries for Web pages and for Web services. But there's no composite registry.
The compelling goals are targetless query and targetless production.
By targetless, I mean that if you want information, you shouldn't have to know where the information resides. You should just be able to say, 'I need to know about X.' The idea is that Virtual Knowledge Base will know where X is stored.
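A targetless query can be pictured with a small Python sketch. It is a toy under stated assumptions, not the Virtual Knowledge Base's actual design: the class `KnowledgeRegistry`, its methods, and the source names are all hypothetical.

```python
from collections import defaultdict


class KnowledgeRegistry:
    """Toy registry that maps topics to the data sources covering them."""

    def __init__(self):
        self._sources_by_topic = defaultdict(list)

    def register(self, source_name, topics, fetch):
        """A source describes itself: its name, its topics and a fetch callable."""
        for topic in topics:
            self._sources_by_topic[topic].append((source_name, fetch))

    def query(self, topic):
        """Ask about a topic without naming any source; the registry routes it."""
        matches = self._sources_by_topic.get(topic, [])
        return {name: fetch(topic) for name, fetch in matches}


registry = KnowledgeRegistry()
# Hypothetical sources; the data stays with each source, only the routing is shared.
registry.register("reports_db", ["X"], lambda t: f"structured records on {t}")
registry.register("web_pages", ["X", "Y"], lambda t: f"pages mentioning {t}")

# The caller says only 'I need to know about X.'
results = registry.query("X")
```

The caller never names `reports_db` or `web_pages`; the registry finds every source that has described itself as covering the topic.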
This is not data warehousing. Data warehousing is about centralizing information. Virtual Knowledge Base will be an interoperability framework between existing data sources. We won't try to own the data sources. Data sources will need to describe themselves and be accessible in a standard way.
GCN: What is the road map for the Virtual Knowledge Base?
DACONTA: The Virtual Knowledge Base is on an aggressive schedule. Our goals are being accelerated by our work on Defense's Horizontal Fusion transformational program. Horizontal fusion means piloting and testing concepts that will be rolled into the formal Net-Centric Enterprise Services program.
The first goal for the Virtual Knowledge Base was to complete the basic functionality in a centralized manner. It was written down in the concept of operations, and we did that before the program was approved.
The second goal is moving from a centralized registry to a distributed registry. The third major phase is to integrate a distributed registry down to the regional level.
Anything that a unit wants to share with the rest of the community can be automatically registered with the Virtual Knowledge Base.
We're getting agencies feeding us with their products now. There is work involved. Our architecture is based on XML, and if your organization hasn't moved to XML, then we can't talk to you.
The final step is a seamless link between digital production and discovery. All the content creation environments should be hooked into the same knowledge structure that the Virtual Knowledge Base uses for discovery. So as soon as content is produced, we can disassemble it into its component parts.
GCN: What traps do agencies run into when using metadata?
DACONTA: One trap is not modeling it correctly, that is, not modeling it according to object-oriented principles.
It is easy to model things incorrectly in XML. As an example, XML has no formal relation between a parent element and a child element.
I can have 'person' as a parent element, and a child element could be 'hair color.' But I could just as easily have a child element be 'works for.'
'Works for' and 'hair color' are very different things. 'Hair color' is intrinsic to the person and cannot be separated from that person. But 'works for' is not intrinsic to a person and can change from time to time.
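One way to encode that distinction in XML is to make the intrinsic property an attribute and the changeable relationship a child element. The sketch below parses such a document with Python's standard `xml.etree.ElementTree` module. The names `hairColor`, `worksFor` and `since`, and the sample values, are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the intrinsic trait is an attribute of <person>,
# while the employment relationship is a separate child element.
doc = """
<person hairColor="brown">
  <worksFor since="2003">Example Corp</worksFor>
</person>
""".strip()

person = ET.fromstring(doc)

hair = person.get("hairColor")     # travels with the element itself
employer = person.find("worksFor")  # a separate node that can come and go
employer_name = employer.text

# The relationship can be detached without touching the person's own traits.
person.remove(employer)
still_has_hair_color = person.get("hairColor") == "brown"
no_longer_employed = person.find("worksFor") is None
```

Removing the `worksFor` element leaves the `person` element and its attribute intact, which mirrors the idea that an employer can change while the intrinsic trait stays attached.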
If you model these things on object-oriented principles, then intrinsic characteristics should be attributes instead of subelements. Why?
An element cannot be separated from intrinsic attributes.
GCN: Your resume says you are a distant relation of impressionist painter Henri Matisse. Do you see any similarities in the work he did compared with what you do?
DACONTA: My grandmother always told us that Henri Matisse was a third or fourth great-uncle. I believe we're in the tree somewhere, but I don't know exactly where. I'll probably have to take a trip to Europe to find out.
I do a lot of artistic things. I draw, paint, play guitar, sing. I bring the same creativity to bear on the technology, in the design of programs and in the design of metadata.
People intuitively know an elegant design. Sometimes it is efficiency. Sometimes it is simplicity. You just say, 'That is the right way to go.' You sense the elegance of the design. You see how things smoothly fit together with few moving parts so that it does not get into a morass by being too complex.