Energy puts SGML to work
- By Florence Olsen
- May 26, 1997
Energy's Office of Scientific and Technical Information (OSTI) recognized the value of
SGML several years ago when it adopted the language as its standard for document exchange.
Now Energy is laying the groundwork for a distributed multimedia archive that agency
scientists and academic researchers can access from any desktop computer.
With the new architecture, OSTI officials want to give government scientists desktop
access to the full text and multimedia output of Energy's more than 60 research sites and
program offices that conduct basic materials research and other high-interest
OSTI has had modest success so far with electronic document exchange: only 17 research
sites now use an electronic format to transfer their lab output to OSTI.
"We're trying to stay in an environment where we're not locked into a proprietary
representation," said Earl Smith, computer specialist in OSTI's Information Systems
Development branch at Oak Ridge National Laboratory in Tennessee.
Smith said unlocking the capabilities of SGML can benefit all organizations that manage
information. "SGML gives you room to do lots of interesting things," he said.
Under the new plan, the labs themselves would be relieved of transferring copies of
their research to OSTI. Instead, OSTI would maintain an entity registry at each research
site by exploiting a little-used entity naming mechanism in the SGML standard.
The Intelligent Document Exchange Architecture (IDEA) under development incorporates
commercial products that transform non-SGML documents into SGML.
Despite its preference for the SGML format, OSTI has also been accepting documents in
Adobe Systems Inc.'s PostScript and Portable Document Format, TIFF Group 4 and HyperText
Markup Language.
Older documents likely still will be stored as TIFF, PostScript or PDF image files with
links to an SGML bibliographic header or metadata file identifying author, title,
chapters, figures and mathematical equations, for example.
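As a rough illustration of that arrangement, the Python sketch below models a bibliographic header that points at a document's image files. The class, field and element names are all hypothetical and are not taken from OSTI's document types; the XML-style escaping is a stand-in for whatever a real document type would require.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape, quoteattr


@dataclass
class BibliographicHeader:
    """Metadata for one archived document (all field names are hypothetical)."""
    title: str
    authors: list
    image_files: list = field(default_factory=list)  # e.g. TIFF, PostScript or PDF files

    def to_markup(self):
        """Render the header as simple tagged text.

        The element names are invented for illustration, and XML-style
        escaping from the standard library stands in for whatever the
        real document type would require.
        """
        lines = ["<header>", f"  <title>{escape(self.title)}</title>"]
        for author in self.authors:
            lines.append(f"  <author>{escape(author)}</author>")
        for path in self.image_files:
            # Each image file is linked from the header rather than converted.
            lines.append(f"  <pageimage file={quoteattr(path)}>")
        lines.append("</header>")
        return "\n".join(lines)


header = BibliographicHeader(
    title="Sample Materials Report",
    authors=["A. Researcher"],
    image_files=["report/page001.tif", "report/page002.tif"],
)
markup = header.to_markup()
```

The point of the sketch is only that the searchable metadata lives in tagged text while the page images stay in their original format.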
In an SGML-based archive, applications referencing SGML documents don't have to know
whether those documents are stored in a file system or a database or whether they are out
somewhere on the Internet, said Charles F. Goldfarb, principal architectural consultant on
the DOE project and chief inventor of the SGML standard.
System developers using SGML can, for example, avoid direct, hard-wired addressing with
Uniform Resource Locators, or URLs.
"This is a part of SGML that only recently is being explored in any depth,"
The SGML entities, or components, will be stored in a relational database and managed
by commercial document management systems.
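The location independence described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the IDEA design: the `EntityRegistry` and `StorageLocation` classes, the entity names and the addresses are all hypothetical. An application asks for an entity by its logical name, and only the registry knows whether the entity sits in a file, a database or on the network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageLocation:
    """Where an entity physically lives (hypothetical structure)."""
    kind: str      # "file", "database" or "network" (illustrative categories)
    address: str   # path, record key or network address


class EntityRegistry:
    """Maps logical entity names to storage locations (hypothetical design)."""

    def __init__(self):
        self._entries = {}

    def register(self, name, location):
        # Registering an existing name replaces its location.
        self._entries[name] = location

    def resolve(self, name):
        try:
            return self._entries[name]
        except KeyError:
            raise LookupError(f"no registry entry for entity {name!r}") from None


registry = EntityRegistry()
registry.register("report-1997-001",
                  StorageLocation("file", "/archive/report-1997-001.sgm"))
registry.register("figure-12", StorageLocation("database", "entities:4711"))

# Moving an entity changes only the registry entry; documents that
# reference "figure-12" by name do not need to be edited.
registry.register("figure-12",
                  StorageLocation("network", "host.example/figures/12"))
```

Because references carry only the logical name, relocating an entity is a registry update rather than an edit to every document that cites it.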
The software that will complete OSTI's distributed document archive also derives from
That software is a scalable SGML messaging-oriented middleware and data collection
Until now, no vendor has used the open SGML standard to build tools that do this, said
Ron Turner, co-owner of Soph-Ware Associates in Spokane, Wash., and principal developer
for OSTI's distributed archive project. "Until the tools are built, the open standard
really isn't doing much," he said.
OSTI officials said they hope to use the new SGML tools and the discipline that goes
into analyzing and creating SGML document types to manage the interactions between all the
components in the distributed archive.
Soph-Ware Associates will publish its distributed archive specification and offer a
commercial product based on that specification.
Goldfarb said the OSTI architecture is interesting because it doesn't rely on elaborate
application programming interfaces for applications to communicate with one another.
"The idea of replacing a software way of thinking with a document way of thinking
I find very innovative," Goldfarb said.
Although the architecture as envisioned does not use the mandatory Government
Information Locator Service protocol, the American National Standards Institute's Z39.50,
it performs a similar type of transaction, Turner said, and could have links to GILS.
OSTI's distributed archive initially would support Energy's bench scientists, most of
whom are connected directly to the high-speed Energy Sciences Network (ESnet) backbone.
The multimedia delivery infrastructure will depend on the increased bandwidth from the
National Science Foundation's Internet2 initiative to upgrade ESnet, Smith said.
In addition to the government's $825,000 Small Business Innovation Research grant, the
SGML project has received in-kind support from several sources.
They include SoftQuad Inc.; Saros Corp., a business unit of FileNet Corp.; and One Room