Is PDF hurting transparency?
- By Joab Jackson
- Nov 04, 2009
Computers cannot easily parse government documents published in the Portable Document Format (PDF), according to the Sunlight Foundation, a nonprofit organization dedicated to government transparency. The group argues that, as a result, the widely used document standard is actually detrimental to government transparency efforts.
Because PDFs are hard to parse, people must work harder to reuse government data, the organization asserts.
Although PDF is an open standard, it is closely associated with Adobe, which makes popular free software for reading PDF documents and more sophisticated software for creating them. Adobe representatives dispute the Foundation's claim, saying a PDF can contain parsable data in the form of XML datasets, but they acknowledged that not enough Adobe customers know how to use the feature.
In a blog entry provocatively entitled "Adobe is Bad for Government," posted last week, Sunlight Labs head Clay Johnson bemoaned the difficulties of extracting data from PDFs.
Johnson (not the same person as the former deputy director for management at the Office of Management and Budget) points to a number of specific examples in which the government's use of PDFs has made data hard to extract. They include House of Representatives bills, the Internal Revenue Service's Political Action Committee filings, and earmark requests from members of Congress.
"It is a misunderstanding about the capabilities of PDF," said Bobby Caudill, government solutions architect for Adobe Systems. Caudill pointed out that the source documents used to create a PDF can be embedded directly in the PDF file. An XML document, for instance, could be included this way, so an end user would only need to extract the XML document from the PDF and parse it as usual.
However, Caudill admitted, most users of Adobe Acrobat don't know they can do this. "It is quite easy to do, but most people aren't aware of this capability," he said.
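To illustrate the workflow Caudill describes, here is a minimal Python sketch, using only the standard library, that scans a PDF for embedded file streams and parses any XML it finds. It is a simplification under stated assumptions, not a real PDF parser: it expects each attachment's stream dictionary to declare `/Type /EmbeddedFile` (which the PDF specification makes optional), to give a direct `/Length`, and to use either no filter or `/FlateDecode`; it ignores encryption and other filters. The function names `extract_embedded_files` and `parse_embedded_xml` are hypothetical, chosen for this example.

```python
import re
import zlib
import xml.etree.ElementTree as ET

# Sketch only: a regex scan, not a real PDF parser. Assumes each attachment
# stream declares /Type /EmbeddedFile, has a direct /Length, is unencrypted,
# and uses no filter or /FlateDecode only. Nested dictionaries (such as the
# optional /Params entry) are handled one level deep.
_DICT_BODY = rb"(?:[^<>]|<<[^<>]*>>|<[0-9A-Fa-f\s]*>)*?"
_EMBEDDED_RE = re.compile(
    rb"<<(?P<dict>" + _DICT_BODY + rb"/Type\s*/EmbeddedFile" + _DICT_BODY + rb")>>"
    rb"\s*stream\r?\n"
)
# Group 2 is set when /Length is an indirect reference such as "12 0 R".
_LENGTH_RE = re.compile(rb"/Length\s+(\d+)(\s+\d+\s+R)?")


def extract_embedded_files(pdf_bytes: bytes) -> list[bytes]:
    """Return the decoded contents of each embedded file stream found (hypothetical helper)."""
    files = []
    for match in _EMBEDDED_RE.finditer(pdf_bytes):
        header = match.group("dict")
        length = _LENGTH_RE.search(header)
        if length is None or length.group(2):
            continue  # indirect /Length values are not resolved in this sketch
        start = match.end()
        data = pdf_bytes[start:start + int(length.group(1))]
        if b"/Filter" in header:
            if b"/FlateDecode" not in header:
                continue  # other filters (ASCII85, LZW, ...) are not handled here
            try:
                data = zlib.decompress(data)  # FlateDecode streams use the zlib format
            except zlib.error:
                continue  # e.g. a filter chain this sketch does not decode
        files.append(data)
    return files


def parse_embedded_xml(pdf_bytes: bytes) -> list[ET.Element]:
    """Parse each embedded file that is well-formed XML and skip the rest (hypothetical helper)."""
    roots = []
    for data in extract_embedded_files(pdf_bytes):
        try:
            roots.append(ET.fromstring(data))
        except ET.ParseError:
            pass  # not XML (for example, a spreadsheet or an image)
    return roots
```

Given the raw bytes of a downloaded PDF, `parse_embedded_xml` returns the root element of each embedded XML attachment, ready to be walked with the usual ElementTree methods. For real-world files, a full PDF library that resolves the document's name tree and handles every filter would be the more robust choice; the regex approach is only meant to show that the data Caudill describes is reachable by ordinary tools.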
He also noted that, contrary to Johnson's assertion that PDF is a proprietary format, it is actually a standard controlled not by Adobe but by the International Organization for Standardization (ISO). "The process of development now belongs to them," Caudill said.
"It is easy to oversimplify the technology choices. There's this perception that there is this false choice between providing openness for people and openness for machines," added Robert Pinkerton, director of government solutions for Adobe. "You need to do both."
Pinkerton pointed to Adobe's one-day conference on open government, scheduled for Nov. 5, at which agencies could explore these issues further.
Johnson also took aim at Adobe Flash, or rather at its users, noting that government agencies often assume that presenting data in a visually appealing format is the best way to achieve transparency. Instead of pie charts and dashboards, agencies should concentrate on providing the data in easily parsable formats, he said.