Formatting the future

Five years ago, U.S. Courts started putting in place an electronic docket-filing system. It would contain records to be kept, and accessed, for decades, if not indefinitely, and that forced project managers to make some tough decisions on electronic formats.

The courts decided on the Adobe Portable Document Format, for two reasons, according to John Brinkema, a senior research computer scientist at the Administrative Office of the U.S. Courts.

First, PDFs preserve the look and feel of the original paper document, an important quality because legal documents frequently make references to other pages within that document or to pages in other documents, even if those records are in electronic form only.

Second, and more important, Adobe Systems Inc. has published the specifications for reading a PDF document. Should the company ever go out of business, the records could be accessed using other software written to Adobe's specifications.

These days, U.S. Courts has more than 2 billion electronic records in PDF format, spread across almost 200 locations around the country. The federal judiciary is ahead of many agencies in establishing an electronic-records management process.

"Rather than waiting around for the rest of the government to do things, we just did it," Brinkema said.

When it comes to saving electronic information for the ages, the challenge of choosing the appropriate format is formidable.

A format specifies how to encode a set of data so that it can be accessed by people or other machines. Every software vendor uses formats to encapsulate the data that is generated in its programs.

But given the volatility and constant changes in the IT industry, the formats an agency chooses today might not be around in 10 or 100 years.
Horror stories abound of suddenly vital documents locked away on some early, now unreadable, version of WordStar.

"It is hard to preserve digital information without a clear guide to how the information is encoded within a format," said William LeFurgy, a project manager for the Library of Congress' Digital Initiatives program.

Compounding this problem is the fact that hardware used to read these formats may also disappear. Who still has equipment to read 5.25-inch floppy disks or punch cards?

LeFurgy said the library's Digital Initiatives program weighs a number of factors when deciding whether to hold on to a format for long-term use.

One is the proprietary nature of the format. "This is a major problem for many commercial software products: the specification is hidden as a business secret, which results in a format whose information content cannot be decoded without using the original proprietary software," LeFurgy said. The library will not rule out proprietary formats altogether. But any used must have publicly disclosed specifications, such as Adobe's.

Still, even supposedly open file formats can contain traps. The PDF format, for instance, can be extended to include JavaScript, audio files, images, special fonts and even videos, all of which may or may not be encoded in an open format. For this reason, in 2002 the U.S. Courts started work on a subset of the PDF specification, called PDF archiving, or PDF/A.

The idea, according to Melonie Warfel, director of worldwide standards for Adobe, is to have a subset of the PDF specifications that is restricted only to completely open standards.

Aside from functionality, agencies should also consider a format's popularity, LeFurgy said.
There's a good chance that documents written in Microsoft Word will be accessible for quite some time, simply because it is so widely used today, and readers will be in demand for some time to come.

One piece of good news for agencies is that the National Archives and Records Administration has started specifying which formats other agencies should use to submit their records to NARA.

Yet another factor to consider is the complexity of a format's encoding process, LeFurgy said. Compression schemes used to reduce the size of a record, or encryption schemes to secure a document, could be particularly problematic for future archivists, who might not have access to those algorithms.

Though it requires greater amounts of storage space, keeping records uncompressed is also a smart move in preserving the fidelity of images and audio files, said Charles Fenimore, Motion Image Quality project leader for the Digital Media Group of the National Institute of Standards and Technology.

Fenimore's research team is finding that converting imagery or sound from one compressed format to another always results in additional loss of quality, which can be problematic as older data gets moved to newer formats, he said.

In addition to file formats, agencies must also worry about the formats of the physical media itself: the tapes, disk drives and optical disks that contain records. These, too, are vulnerable to rapid obsolescence.

While tape is considered the electronic medium that lasts the longest, it is not immune to failure. Kenneth Thibodeau, director of NARA's Electronic Records Archive program, has heard of rare cases where an entire library of aging tapes suddenly started failing en masse.

"The chemical processes of the manufacturing processes are such that a batch of tapes could self-destruct in a matter of months," Thibodeau said.

NARA keeps its permanent records on two copies of tapes, each in a different location.
To guard against failures, the agency each year tests a sample of tapes to ensure they are still stable.

"Tape is a devil, but it is a devil we know. We know the vulnerabilities, and most of them can be managed," Thibodeau said.

Agencies are increasingly using optical disks for archiving, though the jury is out on how long the media can last, given that optical disks have only been in use for the past 25 years or so. It's an area that Fred Byers, an IT specialist at NIST, is investigating.

Byers said he thinks that disks could last for over a century, if kept under suitable environmental conditions. What concerns him, though, is the fluctuating rates of quality control in the manufacturing processes, which lead to variances in how long disks can last. He has started working with the Optical Storage Technology Association to develop an industry archiving standard.

If manufacturers adhere to quality control specifications that the OSTA working group is developing, they will be able to put a seal of approval on their products, indicating that the disks should last for a set number of years.

In the end, though, agencies must assume that whatever media they use will be obsolete sooner or later. So they should develop a long-term strategy of periodically updating their files to whatever media is current, officials said. In other words: think of archiving not as a process of putting records on storage media, but rather as a process to preserve records independent of whatever physical media is used. This is the strategy both NARA and the Library of Congress are taking.

"It is a given that we will be moving digital content to and from many kinds of media as part of our ongoing management and preservation function," LeFurgy said.

NARA has had a storage migration plan in place since 1971, Thibodeau said.
The agency will pick a storage medium that it can trust to last a specific stretch of time, and develop a process of moving the records off that medium when that time period ends.

Thibodeau likens the archiving process to a funnel, one that takes in many formats and converts them all into a standard output format.

"The first thing we do when a record comes in is that we copy it from whatever media it is onto the standard media, and to a standard physical format," Thibodeau said.

At the end of that lifecycle, the agency can easily automate the transfer of those records to the new media.

"It becomes a production process," Thibodeau said.

The Archival Preservation System now handles those duties. But it will be replaced by NARA's Electronic Records Archive, which will be more suited for handling submissions through the Internet.

An important aspect in migration is that agencies must maintain a record's authenticity. Electronic records could be modified less conspicuously than paper records.

"What is important is the ability to preserve those records in an authentic manner, so that it is incontestable if they were to go to court," said Tom Kelley, a customer engagement manager for Lockheed Martin Corp.

Lockheed Martin is one of two companies NARA chose (the other is Harris Corp. of Melbourne, Fla.) to build ERA prototypes. In any system, agencies must be able to establish a chain of custody leading back to the original to prove the record in question remains authentic, despite any number of transformations.

"We would keep traceability back to the original submittal, and any chain of transformations that would happen," Kelley said.
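One way to picture the traceability Kelley describes is a hash-linked log, in which every migration or conversion step records a digest of the record and a digest of the previous log entry. The sketch below is only an illustration of that general idea, not a description of how NARA, Lockheed Martin or Harris built anything. The names `CustodyEntry`, `CustodyLog` and `sha256_hex`, the choice of SHA-256, and the sample record are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class CustodyEntry:
    action: str           # what happened, e.g. "submitted" or "converted"
    content_hash: str     # digest of the record bytes after this step
    prev_entry_hash: str  # digest of the previous entry ("" for the first)

    def entry_hash(self) -> str:
        payload = f"{self.action}|{self.content_hash}|{self.prev_entry_hash}"
        return sha256_hex(payload.encode("utf-8"))


@dataclass
class CustodyLog:
    entries: List[CustodyEntry] = field(default_factory=list)

    def record(self, action: str, content: bytes) -> None:
        """Append a step, linking it to the entry that came before."""
        prev = self.entries[-1].entry_hash() if self.entries else ""
        self.entries.append(CustodyEntry(action, sha256_hex(content), prev))

    def verify(self, current_content: bytes) -> bool:
        """Check every link, then check the record against the last entry."""
        prev = ""
        for entry in self.entries:
            if entry.prev_entry_hash != prev:
                return False
            prev = entry.entry_hash()
        return bool(self.entries) and (
            self.entries[-1].content_hash == sha256_hex(current_content)
        )


# Hypothetical record moving through the "funnel".
log = CustodyLog()
original = b"Docket entry: motion filed"
log.record("submitted", original)
log.record("copied to standard media", original)  # bit-for-bit copy
converted = original.decode("ascii").upper().encode("ascii")  # stand-in conversion
log.record("converted to standard format", converted)

print(log.verify(converted))
```

Because each entry includes the digest of the one before it, altering an early entry after the fact breaks every later link, which is what makes the history back to the original submittal checkable.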

Agencies are OK with these formats

The Electronic Records Management E-Gov Initiative, overseen by the National Archives and Records Administration, has started specifying formats that are acceptable for submitting records to NARA for long-term archiving. In addition to these formats, NARA plans in the near future to designate additional acceptable transfer formats.


  • ASCII (for text): NARA accepts text documents, Web pages and e-mails in the American Standard Code for Information Interchange. The ASCII standard, which has been around for more than 40 years, was created to represent the English alphabet, numerals and selected special characters. ASCII contains no methods of representing how to display characters (such as which fonts or typesets to use) and can be read by pretty much all word processors and browsers, as well as many other applications.


  • Geography Markup Language (for GIS records): GML is an Extensible Markup Language-based format for geospatial data records used in geographic information systems, overseen by the Open Geospatial Consortium. NARA accepts records in versions 2 and 3.


  • JPEG (for images): The Joint Photographic Experts Group's File Interchange Format is used for capturing images. The International Organization for Standardization recognizes JPEG as a still-image standard. Not yet recognized by NARA but generating interest in the archiving community is the JPEG successor, JPEG 2000, which reportedly offers better compression.


  • Portable Document Format (for documents and forms): Adobe PDF is used to capture paper documents in an electronic format, although electronic-only documents can be created with PDF as well. PDF maintains the original look and feel of a document, regardless of what computer platform it is opened on. NARA accepts PDF documents in versions 1.0 through 1.4, and asks agencies to turn off all security settings before submitting documents.


  • TIFF (for images): The Tagged Image File Format encodes bit-mapped images, although extensions exist for character recognition as well. Adobe holds the copyright to the TIFF specification. NARA accepts images in TIFF formats 4 through 6.


  • Spatial Data Transfer Standard (for GIS data): A format for representing Earth-referenced data, SDTS is recognized by both the Federal Information Processing Standards and the Federal Geographic Data Committee, a 19-member interagency committee developing policies for geographic data use. The Geological Survey makes heavy use of SDTS.
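Two of the rules in this list lend themselves to a quick automated pre-check before a transfer: whether a text file really is plain ASCII, and whether a PDF declares a version from 1.0 through 1.4. The following is a minimal sketch, not a NARA tool. The function names are invented for illustration, and it assumes the `%PDF-M.m` header sits at the very start of the file. It reads only that header, so it does not detect security settings or a version overridden elsewhere in the file.

```python
import re
from typing import Optional, Tuple

# A PDF file begins with a header of the form "%PDF-1.4".
PDF_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the header, or None if it is missing."""
    match = PDF_HEADER.match(data)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_accepted_pdf(data: bytes) -> bool:
    """True if the declared version falls in the 1.0 through 1.4 range."""
    version = pdf_version(data)
    return version is not None and (1, 0) <= version <= (1, 4)


def is_plain_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range (0 to 127)."""
    return all(byte < 0x80 for byte in data)


# Hypothetical inputs: the first bytes of two PDFs and two text samples.
print(is_accepted_pdf(b"%PDF-1.3\n"))
print(is_accepted_pdf(b"%PDF-1.6\n"))
print(is_plain_ascii(b"Plain text record"))
print(is_plain_ascii("caf\u00e9".encode("utf-8")))
```

A check like this only flags obvious mismatches early; it is no substitute for validating the full file against the format specification.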
    Rachael Golden

    To keep documents accessible, agencies face critical choices on software and hardware