Permanent data tags would keep data visible
- By Joab Jackson
- Sep 10, 2004
Although the government posts millions of important documents on Web sites, there's no guarantee that Web crawlers will hit on any one document's URLs or find pertinent data buried within.
The Categorization of Government Information Working Group of the Interagency Committee on Government Information is drawing up guidelines to apply permanent identification tags to all public government documents, regardless of where they reside.
A working group team led by James Erwin, director of information science and technology for the Defense Technical Information Center, will submit recommendations later this year to the Office of Management and Budget requiring that agencies use the tags.
The group favors what are called searchable identifiers as permanent addresses.
These identifiers would 'uniquely identify an information object and support persistent access to that object across time and space,' said Erwin, speaking at a recent conference held by the Federal Library and Information Center Committee.
Searchable identifiers would also make it possible to find metadata about a piece of information, such as the information's date of creation.
The identifiers also would separate the information from the systems that store it, Erwin said.
Tag, you're it
'Persistent identification lets you put content and metadata out on the Web so other people can reference your content, either in their visual documents or in their library catalogs,' Erwin said.
So, if an agency transferred old documents from its site to the National Archives and Records Administration, the addresses would remain the same, even though they would have a new host agency.
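The relocation scenario can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the handle system's actual software. The `HandleRecord` class, the `transfer` and `resolve` functions, the metadata values and the example.org URLs are all hypothetical, and the handle `1234.hq/mtg-notes` is borrowed from Kahn's example later in this story. The idea is that another party's catalog stores only the identifier, a pointer rather than a copy. When the agency moves the document, only the record's location changes, and every stored reference still resolves.

```python
from dataclasses import dataclass, field


@dataclass
class HandleRecord:
    """Hypothetical record: a permanent identifier plus a mutable location and metadata."""
    handle: str                                   # assigned once, never changes
    url: str                                      # current storage location, may change
    metadata: dict = field(default_factory=dict)  # e.g. date of creation, current host

    def transfer(self, new_url: str, new_host: str) -> None:
        """Move the document to a new host; the handle itself is left untouched."""
        self.url = new_url
        self.metadata["host"] = new_host


# Hypothetical registry mapping each identifier to its record.
registry: dict[str, HandleRecord] = {}


def resolve(handle: str) -> str:
    """Return the current location for a handle."""
    return registry[handle].url


record = HandleRecord(
    handle="1234.hq/mtg-notes",
    url="https://agency.example.org/docs/mtg-notes.pdf",
    metadata={"created": "2004-06-01", "host": "originating agency"},  # illustrative values
)
registry[record.handle] = record

# Another library's catalog keeps only the identifier, not its own copy.
catalog_entry = "1234.hq/mtg-notes"
print(resolve(catalog_entry))  # prints https://agency.example.org/docs/mtg-notes.pdf

# The agency transfers the document to the National Archives.
record.transfer("https://archives.example.org/holdings/mtg-notes.pdf", "NARA")
print(resolve(catalog_entry))  # prints https://archives.example.org/holdings/mtg-notes.pdf
```

Because the catalog never held a copy, there is no stale version to drift out of date; the single record is the authoritative source.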
DTIC uses a searchable-identifier system, the handle system, to categorize 120,000 technical reports. Other Defense Department librarians have incorporated DTIC's handles, along with attached descriptive data, into their own electronic catalogs.
'Customers can search on a site that has a lot of non-DTIC information, but when they click on a handle, they go to DTIC for the authoritative information,' Erwin said.
DOD's Advanced Distributed Learning initiative also uses the searchable-identifier approach. The program's Shareable Content Object Reference Model standard breaks educational materials into smaller instructional objects that can be strung together to make new lessons, Erwin said.
The DOD learning initiative has mandated that all new educational units include searchable identifiers, which DTIC will index to reveal how much educational material is available.
'The handle system will provide access to those individual objects,' Erwin said.
Persistent identifiers will cut down on the number of available versions of a document, he said. Instead of each interested party keeping a version, which could be obsolete, they store a pointer to the document's permanent location.
'You won't have a bunch of versions floating around on the Web, and people not knowing which version is authoritative,' Erwin said.
The handle system was developed by some of the same people who drafted protocols used on the Internet today. David Ely, along with Robert Kahn and Vinton Cerf (who cowrote the original TCP/IP protocols), has been working on the handle system since the early 1980s.
Kahn, now CEO of the Corporation for National Research Initiatives of Reston, Va., said his group operates a global handle system registry, which assigns each organization a unique prefix. The organization itself assigns suffixes to individual documents.
A typical address, Kahn said, might look something like 1234.hq/mtg-notes, where 1234.hq designates an organization and mtg-notes names a specific document. A searcher would have to type only 1234.hq/mtg-notes.
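Kahn's description implies a two-step lookup. The global registry resolves the prefix to the organization responsible for it, and that organization resolves the suffix. The following is a minimal sketch that assumes a handle splits at its first slash. The `split_handle` and `resolve` functions, the two dictionaries, the `5678.lab` prefix and the URLs are hypothetical stand-ins, not CNRI's actual protocol or API.

```python
def split_handle(handle: str) -> tuple[str, str]:
    """Split a handle into (prefix, suffix) at the first '/'."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a valid handle: {handle!r}")
    return prefix, suffix


# Hypothetical global registry: each prefix points to its organization's local service.
GLOBAL_REGISTRY = {
    "1234.hq": "agency-a",
    "5678.lab": "agency-b",
}

# Hypothetical local services: each organization assigns and manages its own suffixes.
LOCAL_SERVICES = {
    "agency-a": {"mtg-notes": "https://agency-a.example.org/mtg-notes.pdf"},
    "agency-b": {"mtg-notes": "https://agency-b.example.org/notes/2004.html"},
}


def resolve(handle: str) -> str:
    """Resolve a handle in two steps: prefix via the global registry, suffix locally."""
    prefix, suffix = split_handle(handle)
    service = GLOBAL_REGISTRY[prefix]        # step 1: who owns this prefix?
    return LOCAL_SERVICES[service][suffix]   # step 2: that owner maps the suffix


print(split_handle("1234.hq/mtg-notes"))  # prints ('1234.hq', 'mtg-notes')
print(resolve("1234.hq/mtg-notes"))       # prints https://agency-a.example.org/mtg-notes.pdf
print(resolve("5678.lab/mtg-notes"))      # prints https://agency-b.example.org/notes/2004.html
```

Because each organization controls only the suffixes under its own prefix, two organizations can both use `mtg-notes` without colliding. This is what allows the name space to be distributed rather than managed centrally.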
Although current browsers do not support persistent identifiers, Kahn said he expects they eventually will. For the time being, browser plug-ins would work.
Kahn said the handle system 'was designed for worldwide scaling and to identify an arbitrarily large set of objects or other network resources.'
Erwin's group is still considering how to implement a governmentwide information identification system. They want it to be distributed, with each agency managing its own name space. The addresses should be easy for users to read, he said. And any system that OMB proposes needs to last forever, Erwin added.
'You want a scheme that will transcend your technological advances,' he said. 'So you want something that will go beyond those boundaries.'
Joab Jackson is the senior technology editor for Government Computer News.