Narabot uploads images to Wikimedia Commons
By Stephanie Kanowitz
Aug 18, 2014
Since 2011, the National Archives and Records Administration has uploaded more than 100,000 digitized records. To maintain the effort, the agency is working to develop new technology with the help of Wikipedia and the public.
Specifically, volunteers are working with NARA on Narabot, an upload script to port images to Wikimedia Commons, a sister project to Wikipedia and a repository of free media.
“By uploading digital content there, we make it readily available for Wikipedia editors to embed in Wikipedia articles, making them far more visible than they are in our own catalog,” said Dominic McDevitt-Parks, digital content specialist and Wikipedian in Residence at NARA.
The upload script the agency used for the 100,000 files was developed – “in true Wikipedian fashion – by volunteer programmers in the Wikipedia community,” he said. “We are continuing to develop it in order to flesh out features and add capabilities that will make it work for all types of records we hold.”
The files were uploaded by the U.S. National Archives bot, then categorized, organized and restored or improved as needed, according to Wikimedia Commons.
Now, “we are having to work on a technical solution because we are seeking to upload bulk quantities of files along with archival metadata in a specific format, and that is not a simple task for which a tool already exists without customization,” McDevitt-Parks said.
NARA stores its Narabot working code on GitHub, where developers can reuse and rework it.
Most of the files uploaded so far are popular images by artist Ansel Adams or war posters, McDevitt-Parks said. However, archivists don’t choose and upload images themselves. They are developing a workflow so that digitized records can flow from NARA’s online catalog to the Commons. The agency has billions of analog textual records that have yet to be archived, so this effort will also help bring them online.
“Aside from the uploads, all of the activities being undertaken on our project’s home page on Wikimedia Commons are volunteer-driven, including the effort by editors of the site to digitally restore images to improve their value (for Wikipedia articles especially), the reporting of potential metadata errors and categorizing images,” he said.
Such everyday Wikipedia editing improves articles related to NARA’s holdings even when it is not explicitly part of the agency’s project. NARA is also working on an application programming interface (API) for its online catalog that will enable scalable uploads of large numbers of holdings to the Commons. “It makes our catalog machine-readable so that a script can structure it in the format appropriate for Wikimedia Commons,” McDevitt-Parks added.
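As a rough illustration of what such a script might do, the Python sketch below turns one machine-readable catalog record into a wikitext description block for a Commons file page. It is not Narabot's actual code. The record fields (`id`, `title`, `date`, `creator`) and the function name are hypothetical stand-ins for whatever the catalog API returns, and the general-purpose `{{Information}}` template is used here only as an assumption about the target format.

```python
def to_commons_wikitext(record: dict) -> str:
    """Render a catalog record (hypothetical field names) as an
    {{Information}} block for a Wikimedia Commons file page."""

    def clean(value: object) -> str:
        # A bare "|" would be read as a template parameter separator,
        # so swap it for the {{!}} magic word.
        return str(value).replace("|", "{{!}}").strip()

    identifier = clean(record.get("id", "unknown"))
    fields = {
        "description": clean(record.get("title", "")),
        "date": clean(record.get("date", "")),
        "source": "U.S. National Archives and Records Administration, "
                  "catalog identifier " + identifier,
        "author": clean(record.get("creator", "Unknown")),
    }

    lines = ["{{Information"]
    lines += [f"|{name}={value}" for name, value in fields.items()]
    lines.append("}}")
    return "\n".join(lines)


# Made-up sample record, for illustration only.
sample = {
    "id": "12345",
    "title": "Sample | poster",
    "date": "1942",
    "creator": "Example Agency",
}
wikitext = to_commons_wikitext(sample)
print(wikitext)
```

A bulk upload would repeat this step for every record the API returns and pair each description with its image file. Keeping the metadata mapping in one small function makes it easier to adjust when a different type of record needs different fields.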
The new technology is part of the 2014-16 Open Government Plan set up by NARA, which is responsible for preserving and documenting government and historical records and providing public access to them. The flagship initiative of the plan, “Innovate to Make Access Happen,” has three objectives, said McDevitt-Parks: to digitize more records, to improve the scalability and searchability of records in the agency’s online catalog and to bring data to the public in ways that are meaningful and useful to them.
The program with Wikipedia started by taking requests for new digitization projects. The focus now, however, is developing a citizen-scanning program that McDevitt-Parks hopes will generate greater Wikipedian-initiated digitization.
“The citizen scanning program would allow members of the public to come into our facilities and do volunteer digitization by following our standards and recording metadata,” he said.
NARA has held scanathons in Washington, D.C., and is establishing the Innovation Hub at the National Archives Building to support this effort.