Streamlining XML

Extensible Markup Language has helped
immeasurably in standardizing datasets
across federal agencies, but one of its
downsides is that it is rather verbose.
Bracketing each bit of information with
a set of tags can balloon the size of the
resulting dataset. Sending all that data
back and forth becomes a comparatively
expensive exercise, especially for low-bandwidth
and low-power platforms,
such as mobile phones. Many companies
offer network appliances that cut the
verbosity, usually by using smaller
tokens in place of tags, but no open
standard has addressed this issue.

Until recently, that is. A World Wide
Web Consortium working group has
published the Efficient XML Interchange
(EXI) framework. It uses a relatively simple
algorithm to encode XML event
streams, one that examines the data and
replaces tags with more compact identifiers.
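
The general idea of swapping repeated tags for compact identifiers can be
sketched in a few lines of Python. This is only a toy illustration, not the
EXI algorithm itself, which is considerably more involved; the function name
encode_events, the event labels and the sample document are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def encode_events(xml_text):
    """Toy encoder (not EXI): walk the XML event stream and replace
    repeated tag names with small integer identifiers."""
    table = {}   # tag name -> compact integer identifier
    events = []  # the encoded event stream
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for kind, elem in ET.iterparse(stream, events=("start", "end")):
        if kind == "start":
            if elem.tag not in table:
                # First sighting: spell the name out once and assign an id.
                table[elem.tag] = len(table)
                events.append(("start_new", elem.tag))
            else:
                # Later sightings: emit only the compact id.
                events.append(("start", table[elem.tag]))
        else:
            # Leaf elements carry the text; emit it just before the end event.
            text = (elem.text or "").strip()
            if len(elem) == 0 and text:
                events.append(("text", text))
            events.append(("end",))
    return table, events


sample = (
    "<readings>"
    "<reading><value>1</value></reading>"
    "<reading><value>2</value></reading>"
    "</readings>"
)
table, events = encode_events(sample)
```

In this sketch each tag name is written out only the first time it appears;
every later occurrence costs a single small integer, and closing tags need no
name at all.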

"It is a compression format targeted
for XML," Ed Day, an engineer at
Objective Systems and member of the
working group, said at the recent XML
2007 conference in Boston.

Day showed that an EXI-compressed
file could be 100 times smaller than the
plain, unencoded version of the same
material, and 14 times smaller than one
compressed in the GNU zip compression
format. He said EXI also shrinks files better
than the International Organization
for Standardization's Fast Infoset, which
encodes XML into binary form. And
although the recently announced Version
1 is still in draft form, Sun Microsystems
already is working on a set of Java application
programming interfaces for EXI.

"We're trying to get XML into places
where it could not be used before due
to performance constraints," Day said.

For greater detail on the specification,
see GCN.com/991.

About the Author

Joab Jackson is the senior technology editor for Government Computer News.
