Readers defend PDF format against critics

Documents in the Portable Document Format are about as common on the Web as celebrity photos, but does the format help or hinder the dissemination of information? Readers were of two minds in responding to our report on the Sunlight Foundation’s criticisms of PDF. Several stressed the need for better-educated users. Others contended that data shouldn't always be easy to get at.

Sunlight Labs Director Clay Johnson argued that PDF works against government transparency because the format makes it difficult for computers to parse information. An architect for Adobe – although PDF is an open standard, Adobe has built a substantial business around PDF – responded that it is fairly easy to incorporate Extensible Markup Language into a PDF, although most people don’t know how to do it.

“So the real problem is not the feature set of the Adobe products, but how government officials use them,” wrote a reader named Mike. “Any shift in applications will result in even less transparency for a while until users become familiar with the new applications. I know our organization specially prohibits the use of advanced Adobe features and scans in the documents. This is because our document control gatekeepers are stuck in the 1970s. Even more to the point for transparency is access to the documents to begin with. Look at the health care debate: Every discussion on the current bill is confronted by advocates on both sides with ‘but there is no final bill yet.’ Transparency and access extend beyond the feature set of Adobe products.”

“I agree with Mike,” wrote Buddy of Somewhere in the USA, who suggested that to “most users who use Acrobat, PDF means let's scan this in ‘image’ format and make it a PDF when they need to learn how to use the software! Where I work at there was never any training on Adobe, let alone work for a place where software features are not disabled by administrative personnel under the guise of security. As for a worker who has to get items out, it’s just as fast to make a SCAN PDF and go from there.... Job done.”

“What's important is how you create/manipulate the documents,” added Kelvyn in Philadelphia. “I work in a municipal agency with a hybrid paper/electronic record system, and, for the moment at least, PDF remains the best bridge between those worlds. I can scan a document, add fields that push info into our database and publish a fully-searchable PDF on our Web site. The problem is that many PDFs on government sites are simple scanned images of text, and not easily searchable. I spend a lot of time trying to educate my staff on the need to print directly to PDF from other apps in order to preserve full-text searching.”

At least one reader, signed Anonymous, offered some practical advice: “All you have to do to get a working copy of the PDF doc is to right-click on it, and hit, ‘Select all.’ Then do a ‘Ctrl-C’ to copy it to either a Word or WP doc. Then hit ‘Ctrl-V’ to paste it on the page. Once it's in WP or Word, you can work with it.”

A reader named Ed, however, said that doesn’t always work. “I tried to comment on an environmental impact statement that the Maryland State Highway Administration put out in PDF and it was a nightmare,” he writes. “They locked the document so you could not cut and paste into another document. So the practice of putting their statement into another document and then questioning or refuting the statement was almost impossible. And this is exactly what they wanted because they did not want dissent.”

But is making it difficult to extract data from a document always a bad thing? Stanley Baranowski wrote that transparency might not be the real, or even the only, issue. “It seems to me one issue is ‘data extraction’ and the difficulty of ‘others’ extracting certain information but not necessarily all of the data, only what ‘they’ want you to see, not the entire document -- that ‘taken out of context’ thing. I think that the difficulty of extracting pieces of the entire document actually reinforces the idea of transparency. Someone cannot cut and paste just what they want to show you, but there's the entire document -- no manipulation -- read it all and decide for yourselves.”

“My initial reaction is against this idea [of easily parsed data],” wrote Charles, of Hollywood, Fla. “Our goal is making the information available to the largest number of *people* and PDF is an excellent way to do that. I say ‘our’ because I work for the city in which I live and one part of my job is making the information available. The tone of the article seems, to me at least, to be that I need to spend more time making the information we provide in such a way as to allow *them* to just cut and paste into whatever they are doing with said information. I use the 'old lady' test (my apologies to the old ladies) -- can the oldest and most technologically inept citizen in my city find, and then read, the information? If the answer is yes, then I have done a good job. If you want to do something else with that information, then you are probably tech-savvy enough to figure out how to get the information out of my PDF files.”

“I believe transparency must also include the ability to ensure accuracy of presentation,” added another reader. “Too many individuals I work with do not verify what they read but take it as fact [that] it is accurate. If someone can extract my data and manipulate it and then reproduce it as mine, that is worse than it being easy to parse. One of the biggest reasons I use PDF is because it is protected from the average users’ exploits.”

Nov 05, 2009

