GSA: Metadata not essential for search
- By Joab Jackson
- Dec 19, 2005
Metadata and other advanced preparation techniques may not be required to make government information available for public search, according to feedback the General Services Administration received from a request for information it posted earlier this year.
The results show that a majority of the responding search experts discount the necessity of metadata. Approximately 56 percent of respondents saw no need to prepare material for search engines, at least for general material, while the other 44 percent saw a need to prepare documents before they are posted for public consumption, whether by adding tags identifying their context, creating controlled vocabularies for classification or manually cataloging items.
In September, the GSA's Office of Governmentwide Policy, along with the Office of Management and Budget, issued an RFI asking about 'efficient and effective information retrieval and sharing.' The Efficient and Effective Information Retrieval and Sharing RFI asked whether it is necessary to add metadata tags to government documents, or whether commercial search engine software is sufficiently intuitive to provide coherent results without external contextual data.
Of the 47 responses to the survey, about 56 percent 'favored the use of search technologies over other solutions requiring human investment in the advance preparation of content,' according to an analysis of the results that will be posted today on the CIO Council Web site.
'Search technology has progressed far enough so that manual categorization and metadata tagging of textual documents is no longer necessary and any perceived gain in accessibility does not justify the cost of categorization,' commented the Energy Department's Office of Scientific and Technical Information, which hosts the Science.gov search service.
Not everyone agreed with this assessment, however. Approximately 14 percent of respondents stressed the need for performing 'significant advance preparation of content,' while another 30 percent favored 'some advance preparation.'
'For either information or records to be trustworthy, they must have additional information either embedded within the content itself or information associated with the content that can provide some degree of assurance of authenticity, reliability and integrity, now and in the future,' the National Archives responded.
The report conceded that the noninterventionist approach would be most appropriate to Web sites, e-mail, and other forms of unstructured and semistructured content. Other materials could benefit from advanced classification techniques. Highly technical subject matter could be better located with the aid of manually generated controlled vocabularies, as could database material, classified information and multimedia assets such as video and audio files.
The survey also found that only 13 percent of vendors or government agencies support the Government Information Locator Service. GILS is based on the ISO 23950 search interoperability standard. Forty-two of the responses came from industry. Three government agencies and two academics responded as well.
'One can conclude from this study and other available literature including The Search [a book about Google written by John Battelle] that, with respect to disseminating [f]ederal information to the public at large, publishing directly to the Internet all agency information intended for public use and thereby exposing it to freely available or other search functions is the most cost-beneficial way to enable the efficient and effective retrieval and sharing of government information,' the report concluded.
Joab Jackson is the senior technology editor for Government Computer News.