Finding data, wherever it's stowed, is the objective
- By Jason Miller
- Sep 10, 2004
The Library of Congress has dozens of catalogs of public records that are searchable only by the agency's in-house systems.
In fact, many federal databases are invisible to typical Web search engines and crawlers because these technologies do not gain access to specific parts of many databases.
But if Eliot Christian has his way, the Office of Management and Budget will require agencies to buy search engines that use interoperability standards based on ISO 23950 (the kind of search tools used by libraries for passing queries and results between complex information systems) to probe for government data and what is often referred to as 'the deep Web.'
Seek, but shall you find?
Christian, data and information systems manager at the Geological Survey, said Web sites with search engines using the standards would seek out information from many new sources, including state and local government databases.
'Without interoperability, searchers get partial, and even questionable, results because the searches do not feed off one another,' said Christian, who is also chairman of the Categorization of Government Information Working Group. 'Web crawlers can only interact with pages set for display, not dynamic content. With these standards, all servers would understand the searchers' requests the same way, and the results they receive would be more complete.'
He said pages are considered dynamic when their content differs from one browser to another or on personal digital assistants.
During the Internet boom, agencies developed search indexes without standards and stopped relying on libraries for information dissemination, Christian said. This limited how and from where Web search engines gathered results.
Christian's working group will submit its recommendations to the executive steering committee of the Interagency Committee on Government Information, which by Dec. 17 will send suggestions to OMB and the National Archives and Records Administration. OMB and NARA have one year to craft and implement a policy based on the recommendations.
The interoperability standard would create for agencies and search engine vendors what libraries have been requiring for decades. Libraries have listed information in a standard way, including title, author and subject, making it easier to search and find information stowed in library and librarylike databases, Christian said.
Although most agencies have not paid much attention to the library standard, some communities, such as users of geospatial data, have adopted ISO 23950.
The Geospatial One-Stop e-government project uses the interoperability protocol to let users search federal, state and local government and private databases of geospatial information.
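The idea of one request being understood the same way by many databases can be illustrated with a minimal sketch. This is not how Geospatial One-Stop or ISO 23950 is implemented; the `Source` class, the `federated_search` function and the sample titles are all hypothetical, and the "databases" are plain in-memory lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    source: str


class Source:
    """Hypothetical stand-in for one searchable database."""

    def __init__(self, name, titles):
        self.name = name
        self.titles = titles

    def search(self, term):
        # Every source accepts the same request: a plain search term.
        needle = term.lower()
        return [Record(t, self.name) for t in self.titles if needle in t.lower()]


def federated_search(sources, term):
    """Send one request to every source and merge the results."""
    merged = []
    seen = set()
    for source in sources:
        for record in source.search(term):
            key = record.title.lower()
            if key not in seen:  # drop duplicates held by several sources
                seen.add(key)
                merged.append(record)
    return merged


# Made-up sample data, for illustration only.
sources = [
    Source("federal", ["Watershed Boundaries", "National Elevation Data"]),
    Source("state", ["County Elevation Survey", "Watershed Boundaries"]),
    Source("local", ["City Parcel Map"]),
]
results = federated_search(sources, "elevation")
```

Because each source answers the same kind of request, the searcher gets one merged list instead of a separate, partial answer from each site.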
Additionally, the Library of Congress is using the standards for many noncatalog databases.
Larry Dixson, a senior network specialist for the library, said the standard provides searchers with a consistent view of results.
'The library community supports many different types of syntax, but ISO 23950 doesn't care about syntax, it creates a description of the search and maps it to a local search engine and comes up with results,' he said. 'Many institutions worldwide search us, and we search libraries across the world as well.'
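Dixson's description, a syntax-neutral description of the search that is mapped to a local search engine, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `AbstractQuery` type, the two translator functions, and the column names and field prefixes of the two imaginary local engines. None of it is taken from the ISO 23950 text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractQuery:
    """A search described without reference to any engine's syntax."""
    access_point: str  # "title", "author" or "subject"
    term: str


def to_sql_like(query):
    """Map the abstract query to a hypothetical SQL-backed catalog."""
    columns = {"title": "title", "author": "creator", "subject": "topic"}
    column = columns[query.access_point]
    # Return a parameterized statement plus its parameters.
    return f"SELECT * FROM records WHERE {column} LIKE ?", (f"%{query.term}%",)


def to_fielded(query):
    """Map the same query to a hypothetical prefix:"term" engine."""
    prefixes = {"title": "ti", "author": "au", "subject": "su"}
    return f'{prefixes[query.access_point]}:"{query.term}"'


query = AbstractQuery(access_point="author", term="Twain")
sql_form = to_sql_like(query)
fielded_form = to_fielded(query)
```

The searcher only ever writes the abstract form; each server translates it into whatever its own engine expects.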
Dixson said one benefit of the protocol is that librarians do not have to learn the different rules proprietary search engines require. The standard maps the user's request to the database's requirements.
Library language
The Library of Congress also is managing the development of the next-generation ISO 23950, called the Search and Retrieve Web Service protocol.
Loosely based on the original 23950 standard, SRW uses Extensible Markup Language schemas, Simple Object Access Protocol and other Web services technologies to perform searches, said Ray Denenberg, a senior network engineer for the library.
Along with work on the follow-on standard, the library also is spearheading development of the Search and Retrieve Web URL Service. SRU is similar to SRW, but it carries the search in the URL of an ordinary Hypertext Transfer Protocol request instead of wrapping it in a SOAP message.
'With SRW, both the client and the server understand the definition of a schema, so you can specify access points and retrieval elements that are understood by different servers,' Denenberg said.
Dixson said ISO 23950 can be complex for implementers to use in a Web services environment, but SRW and SRU simplify the process.
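To give a sense of how simple an SRU request can be, here is a minimal sketch that builds one as a URL. The base address `http://sru.example.org/catalog` is a placeholder, not a real server, and the `dc.title` index, the version number and the helper name `build_sru_url` are assumptions; a real server advertises which indexes and versions it supports.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sru_url(base_url, cql_query, max_records=10, start_record=1):
    """Build a searchRetrieve request as a plain HTTP GET URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",          # assumed; check what the server supports
        "query": cql_query,        # the search, written in CQL
        "startRecord": start_record,
        "maximumRecords": max_records,
    }
    # urlencode percent-escapes spaces, quotes and '=' inside the query.
    return f"{base_url}?{urlencode(params)}"


# Placeholder server and an assumed Dublin Core title index.
url = build_sru_url("http://sru.example.org/catalog", 'dc.title = "geology"')

# Decode the URL again to see what the server would receive.
received = parse_qs(urlsplit(url).query)
```

The whole search travels in the address itself, so any tool that can fetch a Web page can send it; the results come back as XML.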
'It amazes me that technology companies [who make search engines] have taken this long to see the value in this,' Christian said.
'We haven't yet got to the Internet for searching. We have the Internet for graphic user interface, but not the application of how to find the stuff. I've talked to so many people who after having seen this are excited about it and wonder why it hasn't caught on.'