Archivists work on a PDF for the long term

Even documents based on open standards can become unreadable because of intellectual property disputes.

Stephen Levenson, chief technology officer in the records office of the Administrative Office of the U.S. Courts, learned that harsh lesson a few years ago.

Now Levenson and other agency representatives are helping vendors develop a permanent document archiving standard called PDF/A, free from licensing concerns.

For years the courts, like many agencies, have kept documents in read-only Adobe Portable Document Format. Adobe Systems Inc. of San Jose, Calif., has published its PDF specifications, so agencies have felt reasonably sure their documents would remain accessible even if Adobe should go out of business or start charging prohibitive prices for its PDF reader.

The public specs would allow them to build their own readers, or so most archivists assumed.

It turns out, however, that many third-party PDF readers can no longer read older PDF documents.

Early PDF versions compressed document images with the LZW algorithm, which readers unpack through the LZWDecode filter. In 1999 the patent holder, Unisys Corp., began charging licensing fees for LZW. Although Adobe itself paid the licensing fee so that its readers can view older PDFs, third-party readers without licenses can no longer read the early documents.
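An archive can get a rough sense of its exposure by checking which compression filters its files name. The Python sketch below is illustrative only and is not a tool the courts are described as using. It assumes the filter names appear as plain text in the file, which holds for simple PDFs of this era but not for every file; the function name count_filters and the sample bytes are made up for the example.

```python
import re

# Filter names as written in PDF stream dictionaries.
FILTER_PATTERN = re.compile(rb"/(LZWDecode|FlateDecode)\b")


def count_filters(pdf_bytes: bytes) -> dict:
    """Count how often each compression filter name appears in raw PDF bytes.

    Assumption: names are spelled out literally. Names written with
    #-escapes or hidden inside compressed object streams are missed.
    """
    counts = {"LZWDecode": 0, "FlateDecode": 0}
    for match in FILTER_PATTERN.finditer(pdf_bytes):
        counts[match.group(1).decode("ascii")] += 1
    return counts


# Hypothetical fragment standing in for a real file's contents.
sample = (
    b"<< /Length 42 /Filter /LZWDecode >>\nstream\n...\nendstream\n"
    b"<< /Filter [/ASCII85Decode /FlateDecode] >>"
)
print(count_filters(sample))
```

A file that reports any LZWDecode streams is one that an unlicensed reader of the period might fail to open.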

Intellectual frustrations

The intellectual property issue kept archivists wondering whether they could maintain readable PDFs if LZWDecode were ever locked up. Levenson, like many others, felt frustrated.

'We vowed not to let something like that happen again,' he said. 'What we thought we owned, we didn't.'

The U.S. Courts are now converting to PDF/A and an electronic filing system to ensure that archived files will remain readable for the ages.

Representatives of the Agriculture and Defense departments, IRS, Library of Congress, National Archives and Records Administration, and National Library of Medicine are working on the standard with Adobe and a number of other companies.

PDF/A, based on Version 1.4 of Adobe's publicly available specification, is a stripped-down, platform-agnostic version using no proprietary technologies.

'These will be static documents,' Levenson said.

Once the standard is agreed on, it will be submitted to the International Organization for Standardization (ISO). Levenson said he hopes to see a draft standard early next year and a final, international standard by the end of 2005.

The agency participants are encouraging software vendors to incorporate the standard into their PDF readers and generators.

For its part, Adobe switched to a public-domain compression technology called Flate soon after Unisys started charging licensing fees for LZW.

'We can use it forever,' Levenson said.
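Flate streams in PDF use the same deflate-based format as the freely available zlib library, so any platform with zlib can unpack them without a license. The Python sketch below shows that round trip; the sample content is invented for illustration and does not come from a real court filing.

```python
import zlib

# Hypothetical page content, repeated so there is something to compress.
page_text = b"BT /F1 12 Tf (Archived court filing) Tj ET\n" * 50

# Compress as a PDF writer would for a FlateDecode stream.
compressed = zlib.compress(page_text)

# Decompress as any reader with zlib could, now or later.
restored = zlib.decompress(compressed)

print(len(page_text), "bytes ->", len(compressed), "bytes")
print("round trip intact:", restored == page_text)
```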

About the Author

Joab Jackson is the senior technology editor for Government Computer News.
