Archiving Obama administration websites for digital posterity
- By Amanda Ziadeh
- Sep 06, 2016
With the end of the Obama presidency nearing, web archivists are working to document federal government websites and social media content before they transition to the new administration on Jan. 20, 2017.
The End of Term Web Archive collaboration began in the summer of 2008 to document the state of legislative, executive and judicial branch websites at the end of the Bush administration.
The team gathers all federal .gov sites, federal content on .mil and .com domains and social media content. The team is also asking the public to nominate .gov websites and content with the End of Term nomination tool.
Referred to as the “web harvest,” the archive will document web content and make that information available for public access and long-term preservation, according to a post by Abbie Grotke, lead IT specialist for the Library of Congress Web Archiving Team.
The Library of Congress worked with partner organizations to crawl the websites, develop a front-end interface to the project and support data transfers. According to the End of Term Web Archive, the scale of the project drove development of new web harvesting and access technologies, including:
- The Heritrix web crawler, which was developed by the Internet Archive with support from the International Internet Preservation Consortium.
- The BagIt Library, an open source Java library, solved the challenges presented by transferring and aggregating the End of Term content.
- The Internet Archive reconfigured existing in-house tools to automatically generate metadata records and thumbnail images for the more than 6,000 websites in the archive.
- A modified version of the California Digital Library’s open source eXtensible Text Framework provided a gateway to web-archived materials.
- The Nomination Tool facilitated collaborative collection development for web archiving.
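To give a sense of how BagIt-style packaging helps with transfers like these, here is a minimal sketch in Python. It is not the BagIt Library's API and not the End of Term team's code. It only illustrates the general idea behind the BagIt format: payload files sit in a data/ directory next to a manifest of checksums, so the receiving partner can confirm that nothing was lost or altered in transit. The function names `make_bag` and `verify_bag` and the sample file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, files: dict[str, bytes]) -> None:
    """Write payload files under data/ plus a SHA-256 manifest (BagIt-style layout)."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for name, content in sorted(files.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # One manifest line per payload file: checksum, then path relative to the bag.
        manifest_lines.append(f"{digest}  data/{name}")
    # Bag declaration file, following the layout described in the BagIt format.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )


def verify_bag(bag_dir: Path) -> bool:
    """Return True only if every file listed in the manifest exists and matches its checksum."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            return False
        if hashlib.sha256(target.read_bytes()).hexdigest() != digest:
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "example-bag"
        # Hypothetical payload standing in for a crawled file.
        make_bag(bag, {"crawl-sample.txt": b"archived page content"})
        print("bag intact:", verify_bag(bag))
```

A partner receiving the bag would run the equivalent of `verify_bag` after the transfer. A missing or changed file causes the check to fail, which is what makes it practical to aggregate large collections from several institutions.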
This year’s team consists of the Library of Congress, the California Digital Library, the University of North Texas Libraries, the Internet Archive, George Washington University Libraries, Stanford University Libraries and the U.S. Government Publishing Office.
The data from the 2008 and 2012 End of Term government domain preservation projects can be found at the End of Term Web Archive.
Amanda Ziadeh is a former reporter/producer for GCN.