USDA goes deep with Web searches
System makes 141 years of digitized documents available
- By Rutrell Yasin
- Oct 07, 2007
The Agriculture Department is making digitized, archived publications more accessible to the public through a technology partnership between ZyLAB, a maker of information access software, and Google.
Officials at USDA's National Agricultural Library (NAL) have hired ZyLAB to add functionality that would help users find and access public-archives information in secure ZyIMAGE Extensible Markup Language repositories via the Google Internet search engine.
Last year, NAL launched its Digital Repository (NALDR) using ZyLAB's ZyIMAGE system to provide access to the full text of selected USDA publications (naldr.nal.usda.gov). An Internet search engine such as Google will offer additional access to public archives stored in ZyIMAGE, USDA officials said.
'NALDR contains a wide variety of publications that have been digitized by NAL dating back to 1864,' said Carol Ditzler, head of USDA's collection services branch. 'This is all public information, and we want to ensure that it is easily found by the public but stored in a secure manner.'
With the growing popularity of search engines such as Google, it made sense to incorporate that functionality into ZyIMAGE, she said.
Documents (paper, electronic, e-mail) can be stored in their original format in the ZyLAB XML repository, and the company's search capabilities allow users to navigate quickly from page hit to page hit, said Johannes Scholtes, president at ZyLAB North America.
'There's no question that one of the most significant problems for Web search engines for 10 years has been access to what sometimes is called the deep Web or the invisible Web,' said Whit Andrews, a research vice president at Gartner.
'It's difficult to find information that resides within a dynamic repository or database, or which resides on the other side of a difficult-to-penetrate application such as a content management application,' Andrews said. 'Unless there is a URL that refers to a document, typically most search engines can't find it.'
The ZyLAB/Google partnership will make it easier for search engines to find these hidden documents, Andrews said.
The World Wide Web Consortium developed a special mechanism called Sitemap to overcome the challenge of information stored in repositories such as the secure XML ZyLAB solution, company officials said.
The Sitemap protocol lets webmasters inform search engines about URLs on their sites that are available for crawling.
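As a rough illustration of what such a listing looks like, the following Python sketch builds a minimal Sitemap file with one entry per document URL. The function name build_sitemap and the example.org URLs are assumptions made for illustration; they are not taken from NAL's or ZyLAB's actual setup.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the public Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return Sitemap XML text with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical document URLs; a real repository would generate its own.
doc_urls = [
    "http://example.org/docs?id=1",
    "http://example.org/docs?id=2",
]
sitemap_xml = build_sitemap(doc_urls)
print(sitemap_xml)
```

A crawler that reads this file learns about every listed URL, even if no ordinary Web page links to it.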
The Internet crawlers can index the data in the ZyIMAGE repositories by launching a URL for each document.
The text of a document is then returned to the crawler and added to the Internet search engine index. A Google user can point to the URL that shows the document, even if it is an image-based scanned document, ZyLAB officials said.
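The flow described above (one URL per document, text returned to the crawler, text added to an index that points back to the URL) can be sketched in Python. This is a simplified model under stated assumptions, not ZyLAB's or Google's implementation: REPOSITORY, fetch_text, build_index, the id query parameter and the placeholder texts are all hypothetical.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory stand-in for a document repository.
# The texts are placeholders, not real NALDR content.
REPOSITORY = {
    "a": "placeholder text about soil surveys",
    "b": "placeholder text about crop yields",
}


def fetch_text(url):
    """Return the stored text for the document named in the URL, or None."""
    query = parse_qs(urlparse(url).query)
    doc_id = query.get("id", [None])[0]
    return REPOSITORY.get(doc_id)


def build_index(urls):
    """Map each word to the set of document URLs whose text contains it."""
    index = {}
    for url in urls:
        text = fetch_text(url)
        if text is None:
            continue  # unknown document: nothing to index
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


urls = ["http://example.org/doc?id=a", "http://example.org/doc?id=b"]
index = build_index(urls)
hits = sorted(index.get("soil", set()))
print(hits)
```

A search for a word returns the URLs of the matching documents, which is how a user ends up at the document itself rather than at the repository's front page.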
'Will we see more of this? I would sincerely hope so,' Andrews said.
Ditzler said NAL wants ZyLAB to extend the functionality to the Yahoo and MSN search engines.
'Keep in mind that it has taken a long time for this to take place,' Andrews said. 'But, absolutely, the ability to make information locatable to the public is a central part of the mandate not only for significant government agencies, but for commercial organizations as well.'
Rutrell Yasin is a freelance technology writer for GCN.