Frank McDonough

COMMENTARY

NARA's digital archive failing on promise to preserve and protect

Guest columnist Frank A. McDonough cites three examples of archive projects that are flourishing

The National Archives and Records Administration’s mission is to look backward and preserve the past records of the federal government. As a result, it operates under the radar because only former government employees and historians care much about the past.

For decades, the agency has promised to bring forward a plan and program to preserve the electronic records of the past. Periodically, Congress will waken from its slumber on archival issues and ask for a status report. Periodically, NARA officials will drag out previous responses and promise that success is on the way, although it never arrives, despite the large sums spent on the initiative.

In addition to preserving the Constitution, Declaration of Independence and Bill of Rights, NARA is responsible by law for preserving the billions of pages of e-mail messages, memos, electronic documents and files created by agencies. NARA’s solution is to set up an electronic archive, which would allow searchers to find and access records online regardless of which computer or software created the records.

Lockheed Martin, a weapons manufacturer, won a contract in 2005 worth $317 million to create a modern archive for electronic records. Six years later, in early 2011, the Government Accountability Office reported that the Electronic Records Archives program was behind schedule and could eventually cost $1.2 billion to $1.4 billion. Furthermore, the agency’s inspector general reported that when the system is implemented, users will only be able to search based on a document's subject line, not content.

GAO delivered its usual obscure impression of the reasons why the system is failing, citing weak oversight and planning by NARA. No one at NARA paid a penalty for the troubled program or for a projected cost of roughly four times the original contract value, nor did Lockheed Martin. In responding to the GAO report, the archivist of the United States agreed with the findings but disagreed with the future cost estimates.

While NARA plods ahead with Lockheed Martin, several related projects have overtaken what the agency and its contractor have been trying to do for six years. Here are three of them.

  • The U.S. Patent and Trademark Office, recognizing that it did not have the technology to make patent information available to the public, decided to turn over 7 million patents to Google to put online and thereby meet two goals: give the public access and allow the agency to meet the president’s mandate for more transparency in government.
  • Then there is the Google Books Library Project, an enhanced card catalog of the world's books. For this project, Google is working with publishers and libraries to create a comprehensive, searchable, virtual card catalog of all books in all languages to help users discover new and out-of-print books and help publishers discover new readers.
  • Moreover, there is IBM’s Watson project, in which a team of about 20 core researchers fed libraries containing books, encyclopedias, dictionaries, thesauruses, databases, taxonomies, and even movie scripts, novels and plays into a supercomputer. Then they added more than 100 expert analyzers running concurrently to appraise millions, perhaps billions, of possibilities with the goal of responding to a question with the correct answer in two seconds or less.

One wonders why NARA continues doing business with a weapons manufacturer when companies with the resources and skills to tackle similar challenges have demonstrated exceptional progress in taming the world’s inventory of information.

Reader Comments

Thu, Feb 24, 2011 Steve Ellicott City

The ERA project expects the size of the digital archive to grow to over 168 PB in the next few years. Today no commercially available technology exists to manage this volume of data, so Lockheed is creating a custom application. When this contract was in the formation stages, NARA asserted very aggressively that they would only accept "text" documents, but Congress mandated that they accept everything. Agencies use NARA as a dumping ground for their electronic documents, and NARA/Lockheed need to adapt to this ever-changing landscape. There are so many issues with creating an archive of this magnitude: document types, versions of software, versions of media, and yeah, did we mention they need to accept digitally signed documents? No word yet on how they'll determine validity of those documents. On the NARA side, it appears that they have given up on controlling the process and are allowing Lockheed to dictate everything on how it will be built and what it will do. This is an incredibly complex task; Lockheed is probably not the most competent to perform it, but they own it for the foreseeable future.

Thu, Feb 24, 2011

How does a product segment singularly define a $40B company? Lockheed Martin's IT group is the heritage IBM Federal Systems Group (after being bought first by Loral, then by Lockheed), the very company you place upon a pedestal! If ownership solely defines an organization's essence, then NBC has become merely a cable company, AOL was just a magazine publisher, and Yamaha's motorcycles, watercraft, and golf clubs are merely musical instruments (http://en.wikipedia.org/wiki/Yamaha#History).

Thu, Feb 24, 2011

NARA's challenge (regardless of medium) includes limiting access to records to those with the right and the authority for such access. As such, in the IT realm it inherits concurrently every access control model of EVERY records provider - none of your examples have that constraint. Watson's three-year price tag has been estimated at roughly $1 billion to $2 billion (per http://money.cnn.com). Google's "enterprise value" is $167B, yet you think search is easy and cheap, because they give it away free to you. Your logic suggests that maybe NARA could save money by just leaking all the e-records it receives to WikiLeaks.
And finally, if you read and understood the ERA system requirements (within http://www.archives.gov/era), you would know that NARA never expected Google-like search on every record for $317M; they specified topic/subject search to narrow down to sets of records as the primary access model.

Thu, Feb 24, 2011

GPO and the FDsys program are an example of a successful government-led archive project. You can search the full text of many key government documents (Federal Register, US Code, etc.) updated on a daily basis.

Thu, Feb 24, 2011

RE:..."because only former government employees and historians care much about the past"...? Inquiring minds other than historians and former government employees need to know - what was on those documents that Sandy 'the burglar' Berger stole from the Archives? Also need to find out about the travels of a certain 'student' during the Cold War/Vietnam to Moscow to a 'Peace Conference' in January 1970, who then made a 40-day train trip behind the 'Iron Curtain' and later became CINC. Also we would like to know more about the current CINC and his radical associates' past. “You have to know the past to understand the present.” Dr. Carl Sagan
