GCN Tech Blog

By GCN Staff

Adobe plunges PDF into XML

Today at the XML 2006 Conference, being held this week in Boston, Adobe Systems Inc. will reveal a radical beta of what could be the next version of its venerable Portable Document Format: one made up entirely of Extensible Markup Language.

Although the current version of PDF allows a document creator to bundle an XML-encased transcript of a document's text, documents rendered in the new layout format, code-named Mars, will be composed entirely of XML, explained Joel Geraci, Adobe's PDF developer evangelist.

The company's research lab has released the software for public review. Should the feedback prove helpful and the Adobe corporate Powers-That-Be bless the new format, Mars could become the next-generation PDF and be rolled into the company's offerings as early as the next major release of Acrobat.

"PDF is over 15 years old. It predates XML. The technology, it's not at the same level [of] compatibility as XML, where there are a lot of tools and knowledge about how to work with XML," admitted Phillip Levy, the PDF and XML architect for Adobe who helped develop Mars. "So moving the PDF technology onto an XML base gives us a lot better integration with the rest of the world."

Like the documents rendered in Microsoft Office's new XML formats, documents rendered in Mars will be a zipped collection of individual files. A plain-text Scalable Vector Graphics (SVG) file will hold not only the document text but also explicit instructions for rendering the precise look and feel of the document. The zipped collection will also include any images incorporated into the document.
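The packaging idea can be sketched with nothing more than a standard zip library. The Python example below is a minimal illustration, not Adobe's implementation: it builds a tiny in-memory zip and sorts its entries into SVG files and everything else. The entry names (`page/1/pg.svg`, `page/1/img/logo.png`) and both function names are hypothetical, since the article does not describe the actual layout of a Mars file.

```python
import io
import zipfile


def build_sample_package() -> bytes:
    """Build a tiny in-memory zip that mimics a zipped document package.

    The entry names are made up for illustration; the real Mars layout
    is not described in the article and may differ.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr(
            "page/1/pg.svg",
            '<svg xmlns="http://www.w3.org/2000/svg">'
            '<text x="72" y="72">Hello World</text></svg>',
        )
        # Stand-in bytes, not a valid image.
        package.writestr("page/1/img/logo.png", b"\x89PNG placeholder bytes")
    return buffer.getvalue()


def list_entries_by_kind(package_bytes: bytes) -> dict:
    """Group the entries of a zipped package into SVG files and other files."""
    groups = {"svg": [], "other": []}
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for name in package.namelist():
            kind = "svg" if name.lower().endswith(".svg") else "other"
            groups[kind].append(name)
    return groups
```

Calling `list_entries_by_kind(build_sample_package())` returns the SVG entries under `"svg"` and the image under `"other"`; the same function would work on the bytes of any zip archive.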

Adobe's use of SVG could represent a major step forward for that XML-based format, Levy said. Still going through developmental growing pains, this XML-based language describes how to depict visual elements in a presentation, with precise controls on where each element appears on the layout. Adobe will also add its own XML-based extensions to cover visual elements not handled by standardized SVG tags, Levy said.

Performance-wise, the new format should be on par with the current PDF, Levy said. Although XML encoding can be particularly verbose, zip compression should keep file sizes manageable. The processing power needed for rendering should also be roughly equal to that of current PDFs, Levy noted. Eventually, the new format should offer all the advanced features, such as security, that the current PDF does.

Levy said that with an all-XML PDF, organizations will be better able to incorporate functions such as PDF generation and information extraction into their workflows.
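As a rough idea of what information extraction could look like once page content is plain SVG, the Python sketch below pulls the text out of every SVG `text` element using only the standard library. The sample markup is generic SVG written for this example, not actual Mars output, and `extract_text_lines` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Generic SVG written for this example; not actual Mars output.
SAMPLE_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<text x="72" y="72">Quarterly report</text>'
    '<text x="72" y="90">Prepared by <tspan>the records office</tspan></text>'
    "</svg>"
)


def extract_text_lines(svg_source: str) -> list:
    """Return the text content of each SVG <text> element, in document order."""
    root = ET.fromstring(svg_source)
    lines = []
    for text_element in root.iter(f"{{{SVG_NS}}}text"):
        # itertext() also collects text nested in child elements such as <tspan>.
        lines.append("".join(text_element.itertext()).strip())
    return lines
```

Because the parser is generic, the same approach applies in Java or any other language with an XML library, which is the integration benefit Levy describes.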

Geraci demonstrated this potential ease with a proverbial "Hello World" file. He displayed a Mars file for a document with only one line of text, "Hello World." He then opened that document's SVG file and copied the "Hello World" line, with its SVG encasements, to another line below the original, changing the offset value so it would appear just below the original line. After saving the SVG file, Geraci reopened the document in a viewer to show that the second line had been added.
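The hand edit in the demonstration can be approximated programmatically. The Python sketch below copies the first `text` element of an SVG document and places the copy a fixed distance lower. It assumes the offset is carried in the element's `y` attribute, as in standard SVG; the markup Mars actually produces is not shown in the article, and both `duplicate_text_line` and the `line_spacing` parameter are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements without a namespace prefix.
ET.register_namespace("", SVG_NS)

# Generic one-line SVG standing in for the demo file; not actual Mars output.
HELLO_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<text x="72" y="72">Hello World</text>'
    "</svg>"
)


def duplicate_text_line(svg_source: str, line_spacing: float = 18.0) -> str:
    """Copy the first top-level <text> element and place the copy below it."""
    root = ET.fromstring(svg_source)
    original = root.find(f"{{{SVG_NS}}}text")
    if original is None:
        raise ValueError("no <text> element found")

    duplicate = copy.deepcopy(original)
    # Assumption: vertical position is stored in the 'y' attribute.
    new_y = float(original.get("y", "0")) + line_spacing
    duplicate.set("y", f"{new_y:g}")

    # Insert the copy immediately after the original element.
    index = list(root).index(original)
    root.insert(index + 1, duplicate)
    return ET.tostring(root, encoding="unicode")
```

Running `duplicate_text_line(HELLO_SVG)` yields a document with two "Hello World" lines, the second 18 units below the first.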

Although most PDF SVG files will be too complex to change by hand, the demonstration showed how easily an XML-parsing application could manipulate a PDF file, Geraci stated.

Today, external actions on PDFs can be performed with Adobe's PDF libraries, but developers can find these libraries difficult to work with, Levy said. Working with XML should be far easier, because the syntax is familiar and can be readily incorporated into Java and other programming languages, he said.

Posted by Joab Jackson on Dec 07, 2006 at 9:39 AM


Reader Comments

Sun, Dec 10, 2006 Joab Jackson

When I spoke with Geraci, I got the sense some of the spec licensing options were still up in the air. As it stands now, a *lot* of this new PDF is based on the SVG spec anyway. So in theory you could create PDF in a simple text editor (if you had a lot of time on your hands anyway). It is the extensions that have yet to be decided. Joab

Sat, Dec 9, 2006 Mark Frautschi MD

It's my understanding that Adobe has been a good community citizen by opening its PDF format to other communities, governments and other organizations and that this was one reason why the State of Massachusetts selected PDF (along with the Open Document Format) as one candidate for its official records in the last year. Hopefully the prototype Mars XMLized PDF will remain open enough to serve the needs of the same diverse user set. If the authors happen to know whether Adobe has made any pronouncements about the openness of Mars or future PDFs, I would welcome hearing this.
