Gathering web artifacts from Obama administration

Archiving Obama administration websites for digital posterity

With the end of the Obama presidency nearing, web archivists are working to document federal government websites and social media content before they transition to the new administration on Jan. 20, 2017.

The End of Term Web Archive collaboration began in the summer of 2008 to document the state of legislative, executive and judicial branch websites at the end of the Bush administration.

The team gathers all federal .gov sites, federal content on .mil and .com domains and social media content. The team is also asking the public to nominate .gov websites and content with the End of Term nomination tool.

Referred to as the “web harvest,” the archive will document web content and make that information available for public access and long-term preservation, according to a post by Abbie Grotke, lead IT specialist for the Library of Congress Web Archiving Team.

The Library of Congress worked with partner organizations to crawl the websites, develop a front-end interface to the project and support data transfers. According to the End of Term Web Archive, the scale of the project drove development of new web harvesting and access technologies, including:

  • The Heritrix web crawler, which was developed by the Internet Archive with support from the International Internet Preservation Consortium.
  • BagIt Library, an open source Java library, solved the challenges presented by transferring and aggregating the End of Term content.
  • The Internet Archive reconfigured existing in-house tools to automatically generate metadata records and thumbnail images for the over 6,000 websites in the archive.
  • A modified version of the California Digital Library’s open source eXtensible Text Framework provided a gateway to web-archived materials.
  • The Nomination Tool facilitated collaborative collection development for web archiving.

This year’s team consists of the Library of Congress, the California Digital Library, the University of North Texas Libraries, the Internet Archive, George Washington University Libraries, Stanford University Libraries and the U.S. Government Publishing Office.

The data from the 2008 and 2012 End of Term government domain preservation projects can be found at the End of Term Web Archive.

About the Author

Amanda Ziadeh is a former reporter/producer for GCN.

