Gathering web artifacts from Obama administration

Archiving Obama administration websites for digital posterity

With the end of the Obama presidency nearing, web archivists are working to document federal government websites and social media content before they transition to the new administration on Jan. 20, 2017.

The End of Term Web Archive collaboration began in the summer of 2008 to document the state of legislative, executive and judicial branch websites at the end of the Bush administration.

The team gathers all federal .gov sites, federal content on .mil and .com domains and social media content. The team is also asking the public to nominate .gov websites and content with the End of Term nomination tool.

Referred to as the “web harvest,” the archive will document web content and make that information available for public access and long-term preservation, according to a post by Abbie Grotke, lead IT specialist for the Library of Congress Web Archiving Team.

The Library of Congress worked with partner organizations to crawl the websites, develop a front-end interface to the project and support data transfers. According to the End of Term Web Archive, the scale of the project drove development of new web harvesting and access technologies including:

  • The Heritrix web crawler, which was developed by the Internet Archive with support from the International Internet Preservation Consortium.
  • The BagIt Library, an open source Java library, solved the challenges presented by transferring and aggregating the End of Term content.
  • The Internet Archive reconfigured existing in-house tools to automatically generate metadata records and thumbnail images for the more than 6,000 websites in the archive.
  • A modified version of the California Digital Library’s open source eXtensible Text Framework provided a gateway to web-archived materials.
  • The Nomination Tool facilitated collaborative collection development for web archiving.

This year’s team consists of the Library of Congress, the California Digital Library, the University of North Texas Libraries, the Internet Archive, George Washington University Libraries, Stanford University Libraries and the U.S. Government Publishing Office.

The data from the 2008 and 2012 End of Term government domain preservation projects can be found at the End of Term Web Archive.

About the Author

Amanda Ziadeh is a Reporter/Producer for GCN.

Prior to joining 1105 Media, Ziadeh was a contributing journalist for USA Today Travel's Experience Food and Wine site. She's also held a communications assistant position with the University of Maryland Office of the Comptroller, and has reported for the American Journalism Review, Capitol File Magazine and DC Magazine.

Ziadeh is a graduate of the University of Maryland where her emphasis was multimedia journalism and French studies.

Connect with Ziadeh on Twitter: @aziadeh610.
