NARA launches search and archiving system
- By Henry Kenyon
- Oct 04, 2011
A new electronic records system is making the storage and search for federal documents easier for both archivists and the general public. The technology allows data in a variety of formats, from scanned paper documents to electronic files, to be collected, sorted, stored and then referenced and searched.
The goal of the National Archives and Records Administration’s Electronic Record Archives (ERA) project is to make the archive’s massive trove of materials more accessible and searchable.
With facilities in 17 states and an informative website, NARA is among the most accessible archives in the world. But prior to the ERA project, there was no single system at NARA for electronic records. Most of the archive's electronic records were stored on tape, which was processed and verified on legacy systems, said David Lake, an archivist with the ERA systems engineering team at NARA.
Some of these electronic records databases are available through the Access to Archival Databases (AAD) system.
Despite the recent deployment, the ERA project has had its difficulties. In fact, the Office of Management and Budget has mandated that NARA end ERA’s development phase in September 2012, a year earlier than scheduled.
Lockheed Martin was awarded the $308 million ERA contract in 2005. The Government Accountability Office found that, six years on, the program was behind schedule and over budget, with costs potentially rising to between $1.2 billion and $1.4 billion.
Lake and Lockheed Martin spokespersons both told Federal News Radio that, despite the shortened development schedule, NARA will focus on deploying ERA to federal agencies, with the goal of adding more functionality around 2014, once all major government organizations are using the system.
Prior to ERA, loading data into the archive’s legacy systems required manually copying records from original media onto tapes for electronic archiving. The old systems also had significant limitations in their ability to manage the growing volumes of electronic data. Another challenge was accessing electronic records on shelved tapes.
“In terms of public access, there was limited ability to conduct a search of multiple NARA sources of information about records. One would have to search each system, such as AAD and the Archival Research Catalog, separately,” Lake said.
ERA, which began in 2005 and has been deployed in increments, allows NARA to collect and process many more electronic documents than before. The system provides workflow support for transferring data to NARA and uses tools to scan for viruses and identify the format of each electronic record. Records stored in ERA also undergo integrity checks based on secure hash algorithms, which can reveal whether a file has been altered. Information about important events relating to stored records is also maintained to help establish their authenticity.
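The article does not describe how ERA implements these steps, but the general pattern can be sketched. The Python below is a minimal illustration under stated assumptions: it identifies a file's format from its leading "magic" bytes, records a SHA-256 digest (one member of the SHA-2 family of secure hash algorithms, chosen here for illustration) at ingest, re-checks that digest later, and logs each step as a timestamped event. The signature table, the `Record` class and the function names are hypothetical, and virus scanning is left out because it would require an external scanner.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical signature table: leading bytes that mark a few common formats.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),
]


def identify_format(data: bytes) -> str:
    """Guess a record's format from its leading bytes."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return "application/octet-stream"  # unknown; would need manual review


@dataclass
class Record:
    name: str
    data: bytes
    fmt: str = ""
    sha256: str = ""
    events: list = field(default_factory=list)

    def log(self, event: str, outcome: str) -> None:
        # Keep a timestamped history of what happened to the record.
        self.events.append((datetime.now(timezone.utc).isoformat(), event, outcome))


def ingest(record: Record) -> Record:
    """Identify the format and store a digest for later integrity checks."""
    record.fmt = identify_format(record.data)
    record.log("format identification", record.fmt)
    record.sha256 = hashlib.sha256(record.data).hexdigest()
    record.log("digest calculation", record.sha256)
    return record


def verify_fixity(record: Record) -> bool:
    """Recompute the digest and compare it with the one stored at ingest."""
    ok = hashlib.sha256(record.data).hexdigest() == record.sha256
    record.log("fixity check", "pass" if ok else "fail")
    return ok


rec = ingest(Record("memo.pdf", b"%PDF-1.4 example"))
print(rec.fmt)             # application/pdf
print(verify_fixity(rec))  # True
rec.data += b" altered"
print(verify_fixity(rec))  # False
```

Any change to the stored bytes, however small, produces a different digest, which is why a hash comparison can flag silent corruption or tampering; the event list plays the role of the record history the article mentions.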
To ensure the long-term preservation of data, the ERA program includes a capability to move electronic records from one format to another, said Lake.
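Lake does not say which formats ERA migrates between or how. As a deliberately small stand-in, the sketch below migrates a text record from a legacy single-byte encoding (Latin-1, an assumption) to UTF-8. It keeps the original bytes, records a SHA-256 digest of both versions, and accepts the migration only if converting back reproduces the original exactly. The function name and the returned fields are hypothetical.

```python
import hashlib


def migrate_text(original: bytes, source_encoding: str = "latin-1") -> dict:
    """Migrate a legacy-encoded text record to UTF-8, keeping the original bytes."""
    text = original.decode(source_encoding)
    migrated = text.encode("utf-8")
    # Round-trip check: converting back must reproduce the original exactly.
    if migrated.decode("utf-8").encode(source_encoding) != original:
        raise ValueError("migration did not preserve the record's content")
    return {
        "migration": f"{source_encoding} -> utf-8",
        "original": original,
        "original_sha256": hashlib.sha256(original).hexdigest(),
        "migrated": migrated,
        "migrated_sha256": hashlib.sha256(migrated).hexdigest(),
    }


result = migrate_text("Café".encode("latin-1"))
print(result["migrated"])  # b'Caf\xc3\xa9'
```

Keeping the original alongside the migrated copy, with a digest for each, means the archive can always show what it received and what it produced from it.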
To access data, ERA uses a single search mechanism capable of referencing many different sources of information. Previously, researchers had to run separate searches in several legacy systems to conduct a thorough review of NARA's online resources.
“The new Online Public Access (OPA) piece brings all that information together for search, and gives the researcher results with relevant records displayed front and center. Previously, a user usually had to click through multiple layers to view an electronic record,” said Lake.
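The idea of one query fanning out to several sources can be illustrated with a toy federated search. In the sketch below, two in-memory dictionaries labeled "AAD" and "ARC" stand in for separate legacy systems; their contents and the term-count relevance score are hypothetical and far simpler than anything a production engine such as Velocity would use. The point is the shape: send the same query to every source, then merge all hits into one ranked list.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for separately searchable sources.
SOURCES: Dict[str, Dict[str, str]] = {
    "AAD": {"aad-1": "WWII army enlistment records", "aad-2": "Vietnam casualty records"},
    "ARC": {"arc-7": "Photographs of WWII army training camps", "arc-9": "Census maps"},
}


def score(query: str, text: str) -> int:
    """Count how many query terms appear in the text (a simple relevance score)."""
    words = set(text.lower().split())
    return sum(term in words for term in query.lower().split())


def federated_search(query: str) -> List[Tuple[int, str, str, str]]:
    """Send one query to every source and merge the hits into one ranked list."""
    hits = []
    for source, docs in SOURCES.items():
        for doc_id, text in docs.items():
            s = score(query, text)
            if s > 0:
                hits.append((s, source, doc_id, text))
    # Highest score first; ties broken by source and id so output is stable.
    hits.sort(key=lambda h: (-h[0], h[1], h[2]))
    return hits


for s, source, doc_id, text in federated_search("WWII army"):
    print(source, doc_id, s)
# AAD aad-1 2
# ARC arc-7 2
```

A researcher sees matches from both sources in a single list, instead of repeating the query in each system.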
The prototype search feature is provided by the Velocity system, developed by Vivisimo, part of the ERA project contracting team led by Lockheed Martin.
This new search system is specifically designed to make the federal government’s permanent records easier to find online, said Bob Carter, Vivisimo’s vice president for federal markets. The OPA lets users search across all collections with a single query and receive a list of highly relevant results. To dig deeper into any one result, users can consult a list of other relevant cross-matches that Velocity provides.
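The article does not explain how Velocity finds cross-matches. One common, simple way to relate documents is Jaccard similarity over their word sets, used here as a swapped-in illustration rather than Vivisimo's method. The catalog entries and the `related` function are hypothetical.

```python
# Hypothetical catalog of record descriptions.
CATALOG = {
    "r1": "civil war pension files",
    "r2": "civil war service records",
    "r3": "civil war maps",
    "r4": "moon landing photographs",
}


def terms(text: str) -> set:
    return set(text.lower().split())


def jaccard(a: set, b: set) -> float:
    """Shared terms divided by all distinct terms across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related(record_id: str, catalog: dict, k: int = 2) -> list:
    """Return up to k other records most similar to record_id."""
    base = terms(catalog[record_id])
    scored = [
        (jaccard(base, terms(text)), rid)
        for rid, text in catalog.items()
        if rid != record_id
    ]
    scored = [s for s in scored if s[0] > 0]  # drop records with nothing in common
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [rid for _, rid in scored[:k]]


print(related("r1", CATALOG))  # ['r3', 'r2']
```

Here "civil war maps" ranks above "civil war service records" because it shares the same two terms with the starting record while adding fewer unrelated ones (2/5 versus 2/6).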
Topic clusters, generated by Vivisimo’s clustering technology together with its federation and indexing engines, provide convenient subject navigation of the top results. Users can narrow results by type of archival materials, date, file format, archive location and other properties, said Carter.
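Narrowing results by properties like these is usually called faceted search. The sketch below, with hypothetical result records and location names, shows the two basic operations: counting how many results fall under each value of a facet (what a sidebar would display) and filtering the list by chosen facet values and a date range. It does not attempt the topic clustering Carter describes.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical search results with facet properties.
RESULTS: List[Dict] = [
    {"id": "a", "type": "photograph", "year": 1944, "format": "TIFF", "location": "Site A"},
    {"id": "b", "type": "textual", "year": 1962, "format": "PDF", "location": "Site B"},
    {"id": "c", "type": "photograph", "year": 1969, "format": "JPEG", "location": "Site A"},
    {"id": "d", "type": "textual", "year": 1944, "format": "PDF", "location": "Site A"},
]


def facet_counts(results: List[Dict], facet: str) -> Counter:
    """How many results carry each value of a facet."""
    return Counter(r[facet] for r in results)


def narrow(results: List[Dict], year_range: Optional[Tuple[int, int]] = None, **facets) -> List[Dict]:
    """Keep results matching every given facet value and the optional year range."""
    out = []
    for r in results:
        if any(r[key] != value for key, value in facets.items()):
            continue
        if year_range and not (year_range[0] <= r["year"] <= year_range[1]):
            continue
        out.append(r)
    return out


print(facet_counts(RESULTS, "location"))
print([r["id"] for r in narrow(RESULTS, type="photograph", location="Site A")])  # ['a', 'c']
print([r["id"] for r in narrow(RESULTS, year_range=(1940, 1950), format="PDF")])  # ['d']
```

Because each filter only removes results, facets can be applied one after another in any order and still yield the same narrowed list.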
Additionally, an image viewer integrated by Lockheed Martin provides fast access and viewing of archival images retrieved by the search tool.
ERA is designed to be consistent with the Open Archival Information System (OAIS) Reference Model, with components to manage the packaging and intake of electronic records, to provide archival storage and data management, and ultimately to provide access to those records that are free of restrictions, Lake said.
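The OAIS model describes records moving through three kinds of information package: a Submission Information Package (SIP) sent by the producer, an Archival Information Package (AIP) that the archive preserves, and a Dissemination Information Package (DIP) delivered to users. The sketch below walks a record through that flow in plain Python. The package fields, the single `restricted` flag and the `Archive` class are simplifying assumptions, not ERA's actual design; real restriction review is far more involved.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SIP:  # Submission Information Package: what a producer transfers
    record_id: str
    content: bytes
    producer: str
    restricted: bool


@dataclass
class AIP:  # Archival Information Package: what the archive preserves
    record_id: str
    content: bytes
    sha256: str
    producer: str
    restricted: bool


@dataclass
class DIP:  # Dissemination Information Package: what a user receives
    record_id: str
    content: bytes


class Archive:
    def __init__(self) -> None:
        self.storage: Dict[str, AIP] = {}  # archival storage
        self.catalog: Dict[str, str] = {}  # data management: id -> producer

    def ingest(self, sip: SIP) -> str:
        """Turn a submission into a preserved package with a fixity digest."""
        digest = hashlib.sha256(sip.content).hexdigest()
        aip = AIP(sip.record_id, sip.content, digest, sip.producer, sip.restricted)
        self.storage[aip.record_id] = aip
        self.catalog[aip.record_id] = aip.producer
        return aip.record_id

    def access(self, record_id: str) -> Optional[DIP]:
        """Release a record only if it exists and is free of restrictions."""
        aip = self.storage.get(record_id)
        if aip is None or aip.restricted:
            return None
        return DIP(aip.record_id, aip.content)


archive = Archive()
archive.ingest(SIP("r1", b"open report", "Agency X", restricted=False))
archive.ingest(SIP("r2", b"closed report", "Agency Y", restricted=True))
print(archive.access("r1"))  # DIP(record_id='r1', content=b'open report')
print(archive.access("r2"))  # None
```

Separating the three package types keeps what was received, what is preserved and what is released as distinct objects, which is the core idea of the reference model.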
NARA receives electronic records from federal agencies, presidential administrations and Congress. Because each type of document is governed by unique laws and regulations, ERA consists of multiple instances to handle their separate needs, he said.
The ERA system uses a service-oriented architecture that combines commercial products with some custom code to provide different services according to configurable orchestrations, Lake said. The public access component is a separate part of the system designed to manage and provide access to publicly available electronic records and descriptions of all records.
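A "configurable orchestration" generally means the sequence of services is defined as data rather than hard-coded, so separate instances can run different workflows over the same building blocks. The sketch below illustrates that idea: services register under a name, and each hypothetical instance ("federal", "presidential") lists the steps it runs. The service names, the extra review step for presidential records and the placeholder logic are all assumptions for illustration; the article does not describe ERA's actual workflows.

```python
from typing import Callable, Dict, List

Service = Callable[[dict], dict]
SERVICES: Dict[str, Service] = {}


def service(name: str):
    """Register a function as a named service."""
    def register(fn: Service) -> Service:
        SERVICES[name] = fn
        return fn
    return register


@service("virus_scan")
def virus_scan(rec: dict) -> dict:
    # Placeholder: a real deployment would call a commercial scanner here.
    return {**rec, "scanned": True}


@service("identify_format")
def identify_format(rec: dict) -> dict:
    # Placeholder: guesses from the file extension only.
    return {**rec, "format": rec["name"].rsplit(".", 1)[-1].upper()}


@service("restriction_review")
def restriction_review(rec: dict) -> dict:
    return {**rec, "review": "pending"}


@service("store")
def store(rec: dict) -> dict:
    return {**rec, "stored": True}


# Hypothetical orchestrations: each instance runs its own configured sequence.
ORCHESTRATIONS: Dict[str, List[str]] = {
    "federal": ["virus_scan", "identify_format", "store"],
    "presidential": ["virus_scan", "identify_format", "restriction_review", "store"],
}


def run(instance: str, rec: dict) -> dict:
    """Pass a record through every service configured for the instance."""
    for step in ORCHESTRATIONS[instance]:
        rec = SERVICES[step](rec)
    return rec


print(run("presidential", {"name": "memo.pdf"}))
```

Changing a workflow then means editing the `ORCHESTRATIONS` table rather than the services themselves, which is one reason service-oriented designs suit an archive that must follow different rules for different kinds of records.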
NARA is seeing a significant rise in the volume of electronic records being transferred to the National Archives, Lake said. The archive’s current legacy systems and processes for managing electronic records have a limited ability to scale to the increased demand. ERA is already helping NARA to collect and store more electronic records than before, he said.