New FirstGov engine will search agency databases

The General Services Administration is touting the new search engine for its FirstGov.gov portal as the technology that will make it the Web site it was intended to be: a single place to look for virtually any information regarding the government.

GSA last month flipped the switch to run FirstGov on an engine developed by Fast Search & Transfer of Oslo, Norway, that gives the site capabilities such as searching Adobe Portable Document Format or Microsoft Word documents. Such capabilities are already available at commercial portals such as Google.com and Yahoo.com.

'GSA has made significant improvements,' said Chris Sherman, president of Searchwise, a search engine analysis company in Boulder, Colo. 'It is definitely faster, and the relevancy has improved. GSA has made significant progress toward making this the end-all of search engines that they had always hoped to have.'

GSA in March awarded a five-year, $10.5 million contract to AT&T Corp. to replace the original search engine, which was donated by Federal Search Foundation Inc. of Washington. The switchover was scheduled to take place March 31, but equipment problems delayed the transfer six weeks.

The postponement cost the agency $35,000 for continued search engine services from Fed-Search. Fed-Search, which signed a memorandum of understanding with GSA to provide those services until August 2001, officially will disband, president and chief executive officer David Binetti said.

Many improvements

The new search engine should improve the site in many ways, said Deborah Diaz, deputy associate administrator in GSA's Office of FirstGov. It will filter out duplicate files, spider 180 million pages on a weekly basis, and access documents in PDF, Microsoft Word and Excel as well as HTML.
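
The article does not say how the new engine detects duplicate files. As a rough illustration only, one common approach is to hash each page's content and keep the first URL seen for each hash. The function name and the sample URLs below are hypothetical, and this sketch catches exact duplicates only, not near-duplicates.

```python
import hashlib


def filter_duplicates(pages):
    """Return the URLs of pages whose content has not been seen before.

    `pages` is an iterable of (url, content) pairs. This is an assumed,
    simplified scheme: two pages count as duplicates only if their text
    is byte-for-byte identical.
    """
    seen_digests = set()
    unique_urls = []
    for url, content in pages:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest not in seen_digests:
            seen_digests.add(digest)
            unique_urls.append(url)
    return unique_urls


# Hypothetical sample data: the second page mirrors the first.
sample_pages = [
    ("a.example/report", "Annual report text"),
    ("b.example/mirror/report", "Annual report text"),
    ("a.example/forms", "Forms index"),
]
unique = filter_duplicates(sample_pages)
```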

After GSA ensures these functions are working well, Diaz said, additional capabilities will be added. They will include advanced search features such as rooting or stemming, which tells the engine to search not only for a word such as big, but also for variants such as bigger and biggest.
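
To make the idea of stemming concrete, here is a minimal sketch in which both the indexed words and the query are reduced to a common root before matching. The suffix rules are a deliberately naive assumption that handles only endings such as -er and -est; they are not the algorithm Fast's engine uses, which the article does not describe.

```python
from collections import defaultdict


def naive_stem(word):
    """Strip a comparative or superlative ending (toy rules, for illustration)."""
    word = word.lower()
    for suffix in ("est", "er"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # Undo a doubled final consonant: "bigg" becomes "big".
            if word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]
            break
    return word


def build_index(docs):
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[naive_stem(token.strip(".,;:!?"))].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing any variant of the query word."""
    return sorted(index.get(naive_stem(query), set()))


# Hypothetical sample documents.
docs = {
    1: "A big budget",
    2: "The bigger agency",
    3: "Its biggest portal",
    4: "A small site",
}
index = build_index(docs)
hits = search(index, "big")
```

Because the query is stemmed the same way as the indexed text, a search for any one of the three variants returns the same three documents.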

The site also will use word filters and will search catalogs of pictures and videos, as well as specific domains or IP addresses.

By the middle of July, FirstGov will let users search agency databases on a limited scale, Diaz said.

Extensible Markup Language will let agencies push through only the data they want to be searchable. Diaz said this is the most secure and easiest way of making databases accessible.

Another way to make databases searchable is to link the portal directly to the parts of databases that an agency wants in the public domain, Diaz said. This is more complicated because security protocols remain to be worked out, she said.
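
The XML approach, in which an agency exports only the fields it wants searchable, might look roughly like the sketch below. The field names, element names and record layout are all hypothetical; the article does not specify the schema agencies will use.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist of fields the agency is willing to expose.
PUBLIC_FIELDS = ("title", "summary")


def export_public_xml(records, public_fields=PUBLIC_FIELDS):
    """Serialize only the whitelisted fields of each record to an XML string."""
    root = ET.Element("records")
    for record in records:
        item = ET.SubElement(root, "record")
        for field in public_fields:
            if field in record:
                ET.SubElement(item, field).text = str(record[field])
    return ET.tostring(root, encoding="unicode")


# Hypothetical database row: the internal notes must never leave the agency.
rows = [
    {
        "title": "Grant notice",
        "summary": "Applications open",
        "internal_notes": "Do not publish",
    }
]
public_xml = export_public_xml(rows)
```

Because anything not on the whitelist is never written out, the search engine sees only what the agency chose to publish, which is consistent with Diaz's description of this route as the more secure one.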

Diaz said GSA will work with the 35 agencies using FirstGov.gov as their search engine to get their databases online first and then open up to the Office of Management and Budget's 24 e-government initiatives.

She would not, however, say which agency databases would come online initially.

Sherman said the ability to search databases is a 'nice next step' because it will broaden the awareness of what is available through the site.

GSA also is putting together the requirements for a governmentwide content management system. The request for proposals will be issued later this summer, Diaz said.

'We are developing templates and standards for an automated content management system,' she said. 'This will be used by FirstGov but open to any agency that wants to license it. Everyone is struggling with filtering their sites and keeping them relevant.'

Sherman said the scalability of the new search engine lets GSA make such changes without affecting the speed of the engine.

'I imagine Fast will tune the search engine to get better relevance by analyzing user logs to see how well the engine is working,' Sherman said. 'The overall performance will improve over time.'
