Scratching the surface of the Obama administration’s social media data
- By Amanda Ziadeh
- Jan 19, 2017
In early January, the White House announced archival projects from a number of organizations, groups and platforms committed to finding new ways of preserving the Obama administration’s digital history. The goal was to make the content more useful and available to the public.
One of the participating organizations is the non-profit online library Internet Archive, which has been conducting web crawls as part of its End of Term Web Archive to preserve U.S. government websites at the end of presidential administrations. The White House asked the organization to help make the administration’s social media data publicly accessible, according to Jefferson Bailey, director of web archiving for Internet Archive.
Bailey said the White House’s digital strategy team was also interested in an event that would encourage creative reuse and scholarly analysis of the data. In response, the Internet Archive held the White House Social Media and Gov Data Hackathon on Jan. 7.
The White House provided the URLs of the Whitehouse.gov Drupal site as well as Twitter, Vine and Facebook datasets from 2009 to the present. (Data from YouTube had not been delivered in time for the event.) The 20-odd hackathon participants also had access to web archive data from current White House websites and press briefings, the End of Term Web Archives from the 2008, 2012 and 2016 projects, 2016 election social media data and text transcripts of President-elect Donald Trump’s candidate speeches.
According to Bailey, participants worked mostly with Twitter and Whitehouse.gov data, extracting text from the homepage, using press briefings or video content and doing data mining projects. They analyzed how often different White House social media accounts were active, and found the most mentioned names and places in press briefings from the past eight years.
Yet time was limited. “It’s kind of hard to work with the datasets because we didn’t have a lot of time to prep it to save people from having to do some of that work,” Bailey said. Though no tools were built, some participants planned to create exploratory layers on top of the data after they had mined it and cleaned it up.
The Internet Archive has made all this data public and downloadable at Archive.org, and it plans to hold more hackathons as additional federal social media data becomes available. Because most of the data files were JSON or CSV, receiving and storing the data was not challenging; the Internet Archive collects tens of terabytes a day. Bailey said the Facebook dataset was the largest at about 5 gigabytes, but he expects the files from YouTube to be much larger.
The organization will continue to archive federal agency social media content and compile the high-volume End of Term Archives. It has already launched its Trump Archive of speeches, interviews and debates, but will have to wait and see what White House social media data the next administration makes available, Bailey said.
In a recent blog post, the White House explained how the next administration can continue to use the social media channels it has created. The National Archives and Records Administration will maintain the Obama administration’s archived content on the platforms where it was originally posted, but under slightly altered account handles.
Amanda Ziadeh is a former reporter/producer for GCN.