GPO seeks tools to harvest overlooked documents on the Web

The Government Printing Office is looking for IT products and services that will help it find, harvest and review documents from federal Web sites.

GPO wants to use Web crawler and data-mining technologies to retrieve publications from agency Web sites and identify those that agencies have not catalogued for its Federal Depository Library Program and its Cataloging and Indexing Program. The request for proposals appeared last week.

Federal agencies are increasingly publishing information only in electronic formats and frequently fail to inform GPO of new publications that should be included in the depository library and cataloging programs.

Web crawler and data-mining technologies will help GPO identify and collect such documents. GPO plans to launch a pilot with the Environmental Protection Agency to crawl EPA's main Web site and the Web sites of its subagencies.

GPO expects the project to be completed within six months of the award date and to be valued at up to $75,000 under a firm-fixed-price contract. Responses are due Jan. 31.

The Federal Depository Library Program provides government information to 1,250 libraries across the nation. The cataloging program compiles bibliographic records of federal information published by the executive, judicial and legislative branches. GPO prepares machine-readable records for the Online Computer Library Center bibliographic network.

About the Author

Mary Mosquera is a reporter for Federal Computer Week.