Resource description tool can add smarts to your Web pages
HTML+RDFa standard might finally introduce Web managers to the Semantic Web
- By Joab Jackson
- Oct 23, 2009
The rule-keepers of the Web have proffered a working draft of a standard that would allow Web managers to incorporate a bit of dormant intelligence into their Web sites.
More specifically, the World Wide Web Consortium has published the first draft of HTML+RDFa, which the standards body describes as "a mechanism for embedding RDF in HTML."
The W3C's Hypertext Markup Language is the universal markup language for rendering Web pages. Traditionally, it has been used to shape the appearance and layout of a Web page, not to annotate the contents of the page. Over the past few years, though, the W3C and other bodies have been working to develop standards for annotating content within Web pages, including the Resource Description Framework, which can describe the relationship between two entities. RDFa, which stands for RDF-in-attributes, is an extension of RDF for annotating content encoded in a markup language.
"While HTML provides a mechanism to express the structure of a document (title, paragraphs, links), RDFa provides a mechanism to express the meaning in a document (people, places, events)," the draft states.
The HTML+RDFa standard sets the rules for embedding RDFa annotations within Web pages, making them machine-readable, meaning another computer can parse the contents of a Web page and make logical assertions about them.
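To make the idea concrete, here is a minimal sketch in Python of what "machine-readable" could look like in practice. It is not a conforming RDFa processor: it recognizes only the about, property and content attributes, leaves prefixes such as dc: unexpanded, and does not handle nested annotations. The page fragment, its example.org address and its headline are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the visible text and the machine-readable
# annotations share the same markup. The example.org URL and the headline
# are invented; the dc: prefix refers to the Dublin Core vocabulary.
PAGE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/"
     about="http://example.org/articles/rdfa-draft">
  <h1 property="dc:title">HTML+RDFa working draft published</h1>
  <p>By <span property="dc:creator">Joab Jackson</span>,
     <span property="dc:date" content="2009-10-23">Oct 23, 2009</span></p>
</div>
"""


class SimpleRDFaExtractor(HTMLParser):
    """Collects (subject, predicate, object) triples from a tiny RDFa subset.

    Simplifications (assumptions of this sketch, not rules of the standard):
    the most recent `about` attribute sets the subject for everything that
    follows, and an annotated element must not contain a nested element
    with the same tag name.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self._subject = None   # value of the most recent `about` attribute
        self._pending = None   # (tag, predicate) still waiting for its text
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self._subject = attrs["about"]
        predicate = attrs.get("property")
        if predicate is None:
            return
        if attrs.get("content") is not None:
            # An explicit machine value overrides the human-readable text.
            self.triples.append((self._subject, predicate, attrs["content"]))
        else:
            # Otherwise the element's text becomes the object of the triple.
            self._pending = (tag, predicate)
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None and tag == self._pending[0]:
            value = "".join(self._text).strip()
            self.triples.append((self._subject, self._pending[1], value))
            self._pending = None


def extract_triples(html):
    """Parse an HTML string and return the triples found in it."""
    parser = SimpleRDFaExtractor()
    parser.feed(html)
    return parser.triples


triples = extract_triples(PAGE)
for subject, predicate, obj in triples:
    print(subject, predicate, repr(obj))
```

Each tuple the sketch produces is one assertion of the kind RDF is built on: a subject (the page), a predicate (such as dc:creator) and an object (the value). A reader sees an ordinary headline and byline, while a program reading the same markup gets the title, author and date as data.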
The advantage of the HTML+RDFa specification is that it allows Web managers to embed RDF directly in the HTML document, rather than create a separate file, noted Tim Finin, a computer science professor at the University of Maryland, Baltimore County, who has done a lot of work on artificial intelligence and the Semantic Web.
"You could have a document with text for people to look at and also data that a machine could extract that would say the same thing" Finin said. "Whereas before you would have to publish a HTML document and a RDF document and somehow link them. And if you changed one you would have to change the other. It's better to have just one."
In many presentations, Web inventor Sir Tim Berners-Lee has stated that the next step for the Web is to move from being just human-readable to machine-readable as well, and the U.S. and U.K. governments have been at the forefront of this movement with initiatives such as Data.gov.
Presumably HTML+RDFa could speed this development, as it eases the burden of creating separate RDF annotations for data on Web pages.
"This is a promising development," asserted Michael Daconta cautiously, in an e-mail to GCN. Daconta is the chief technology officer at Accelerated Information Management LLC and the former metadata program manager for the Homeland Security Department.
Daconta cautioned that the field of semantic markup still faces a "chicken-and-egg problem," in which Web managers need tools to embed RDF on their pages, and organizations will need tools to parse RDF information for their own services, but such tools for either party probably won't be created until RDF becomes more widely used.
Still, use of RDF is growing. The Linked Open Data project now has, at last count, over 4.2 billion assertions encoded in RDF across a wide variety of data-annotating projects, such as GeoNames and DBpedia. The British Broadcasting Corporation has tested RDF to augment searches of its huge program guide. Best Buy and eBay have encoded their commercial listings in RDF. Outside parties have rendered data feeds from Data.gov into RDF format.
"I see a real benefit in helping companies and the government with eDiscovery in relation to their web content," Daconta said. "In general, the 'wild west' days of content creation are coming to an end as organizations realize that the cost of long-term maintenance of information is too high. To bend that cost curve requires a governed and semantic-based information production process."
Joab Jackson is the senior technology editor for Government Computer News.