Science.gov 4.0 delves deep into the Web
- By Trudy Walsh
- Feb 16, 2007
The latest version of Science.gov, the search portal that trawls the Web for scientific information in 30 federal scientific databases and more than 1,800 Web sites, features a relevancy ranking architecture that can retrieve the full text of documents.
Launched today, Version 4.0 uses DeepRank, a relevancy ranking algorithm that returns more targeted results than previous versions.
DeepRank ranks results using information gathered from the full text of each document. Earlier versions of Science.gov relied on MetaRank, which ranked results based on metadata (bibliographic information such as title, author, date or abstract), and QuickRank, which relied on the document's title and short snippets of information.
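The difference between ranking on a title and ranking on full text can be illustrated with a toy example. The sketch below is not the DeepRank, MetaRank or QuickRank algorithm, whose details the article does not give; it assumes a simple count of query-term occurrences, and the function names, field names and sample documents are all hypothetical.

```python
import re
from collections import Counter


def term_score(query, text):
    """Count occurrences of the query terms in text (case-insensitive)."""
    words = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return sum(words[term] for term in re.findall(r"[a-z0-9]+", query.lower()))


def rank(query, docs, fields):
    """Sort docs by their summed term score over the chosen fields, best first."""
    def score(doc):
        return sum(term_score(query, doc.get(field, "")) for field in fields)
    return sorted(docs, key=score, reverse=True)


# Hypothetical sample documents, invented for illustration only.
docs = [
    {"id": "a",
     "title": "Solar cell efficiency",
     "full_text": "Brief note on solar cell materials."},
    {"id": "b",
     "title": "Annual report",
     "full_text": "Solar cell research: solar cell yields, "
                  "solar cell costs, solar cell lifetimes."},
]

# Title-only ranking (loosely in the spirit of QuickRank as described above).
by_title = [d["id"] for d in rank("solar cell", docs, ["title"])]

# Full-text ranking (loosely in the spirit of DeepRank as described above).
by_full_text = [d["id"] for d in rank("solar cell", docs, ["full_text"])]
```

In this sketch, document "b" has an uninformative title but discusses the query topic at length, so it ranks last on title alone and first once the full text is considered.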
DeepRank actually downloads and indexes documents, said Walter Warnick, director of the Energy Department's Office of Scientific and Technical Information. Commercial search engines such as Google crawl the Web by attempting 'to visit each Web page they can find and make an index of that page. Science.gov does federated searching,' searching pre-identified databases. 'When the hits come back, they have to be sorted,' Warnick said. 'Otherwise patrons will be overwhelmed with hundreds of thousands of hits.'
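The federated approach Warnick describes, sending one query to a fixed set of databases and then sorting the combined hits, might be sketched as follows. This is a minimal illustration under stated assumptions, not Science.gov's implementation: the two source functions are hypothetical stand-ins for real databases, and the word-overlap scorer is a placeholder for an actual relevancy ranking algorithm.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, sources, score):
    """Send the query to every source, merge the hits and sort them by score."""
    with ThreadPoolExecutor() as pool:
        # Query all pre-identified sources in parallel.
        result_lists = list(pool.map(lambda search: search(query), sources))
    merged = [hit for hits in result_lists for hit in hits]
    return sorted(merged, key=lambda hit: score(query, hit), reverse=True)


# Hypothetical stand-ins for two pre-identified databases.
def search_source_a(query):
    return [{"source": "a", "title": "Wind turbine blade design"}]


def search_source_b(query):
    return [
        {"source": "b", "title": "Seasonal flu trends"},
        {"source": "b", "title": "Noise from wind turbine sites"},
    ]


def title_overlap(query, hit):
    """Placeholder scorer: number of query words that appear in the title."""
    query_words = set(query.lower().split())
    return len(query_words & set(hit["title"].lower().split()))


results = federated_search(
    "wind turbine blade", [search_source_a, search_source_b], title_overlap
)
titles = [hit["title"] for hit in results]
```

Without the final sort, hits would come back grouped by source in whatever order each database returned them, which is the problem Warnick says relevancy ranking is meant to solve.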
All three relevancy ranking algorithms (DeepRank, MetaRank and QuickRank) were developed by Deep Web Technologies of Santa Fe, N.M.
Science.gov is free and requires no registration. The portal is hosted by the Energy Department's Office of Scientific and Technical Information. Members of the Science.gov Alliance include the Agriculture, Commerce, Defense, Education, Energy, Health and Human Services and Interior departments, and the Environmental Protection Agency, the Government Printing Office, NASA, and the National Science Foundation. Some support is also provided by the National Archives and Records Administration.
Trudy Walsh is a senior writer for GCN.