Markup language update

What do you bet that the 1990s will be known in the next millennium as the
Internet decade?


Among the long list of things the Internet has changed forever is the role of publishing
in the federal government, especially publishing on the Web. And to make documents
Web-ready, you need markup languages.


The oldest of these, Standard Generalized Markup Language, is not a markup language by
itself—it is a metalanguage used for defining markup languages, primarily for
electronic information encoding and interchange. The International Organization for
Standardization (ISO) in Geneva defines the SGML standard in its document ISO 8879:1986.


In the federal government, SGML is best known as part of the Defense Department’s
Continuous Acquisition and Lifecycle Support initiative. The SGML portion of CALS is
Mil-M-28001. CALS was intended to reduce the cost of supporting and maintaining military
equipment by standardizing information storage formats for weapons systems, which often
have 20-year lifecycles.


SGML specifies a standard method for describing a document’s structure as a
hierarchical, or nested, model with logical, predictable elements, which are marked in the
document with tags.


A Document Type Definition (DTD) defines the document’s structure, along with
rules for the relationships between document elements. Element tags are enclosed in angle
brackets and typically appear in pairs at the beginning and end of elements, for example:


<par>This is a paragraph in SGML.</par>


SGML authoring tools help generate valid tags by putting the tags from the current DTD
into a menu.


An interactive parser, often included in an authoring tool, can both verify the
correctness of the overall document and restrict the tags entered to only those that are
valid according to the rules of the current DTD. A batch parser can only check the
correctness of the overall document.
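To make the parser’s job concrete, here is a minimal Python sketch of the kind of rule check a parser performs. It uses XML rather than SGML, since Python’s standard library has no SGML parser, and a hypothetical, much-simplified content model in place of a real DTD: each element lists only which children it may contain, ignoring order, repetition and attributes. `allowed_children` mimics an authoring tool’s tag menu; `validate` mimics a batch check of the whole document. The element names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, highly simplified content model standing in for a DTD:
# each element name maps to the set of child elements it may contain.
# A real DTD also constrains order, repetition and attributes.
CONTENT_MODEL = {
    "manual": {"title", "section"},
    "section": {"title", "par"},
    "title": set(),
    "par": set(),
}


def allowed_children(tag, model=CONTENT_MODEL):
    """Tags an authoring tool could offer in its menu inside a `tag` element."""
    return sorted(model.get(tag, set()))


def validate(root, model=CONTENT_MODEL):
    """Batch-style check: list (parent, child) pairs the content model forbids."""
    errors = []
    if root.tag not in model:
        errors.append(("<document>", root.tag))
    for parent in root.iter():  # iter() visits the root and every descendant
        allowed = model.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                errors.append((parent.tag, child.tag))
    return errors


good = ET.fromstring("<manual><title>T</title><section><par>Text</par></section></manual>")
bad = ET.fromstring("<manual><par>Stray paragraph</par></manual>")
print(validate(good))               # []
print(validate(bad))                # [('manual', 'par')]
print(allowed_children("section"))  # ['par', 'title']
```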


A formatter reads an SGML document as well as its DTD and style, or Formatting Output
Specification Instance (FOSI), and produces pages or other formatted output; doing this
correctly is usually a multipass process because of the complexity and sequential nature
of SGML.


Some SGML authoring tools can format single pages for document preview and do visual
editing, even though they don’t generate documents for printing. SGML publishing
tools generally can lay out, compose, paginate and print entire documents, including
generated text such as tables of contents and indexes. SGML publishing is appropriate for
large, structured documents such as manuals and parts catalogs that can be formatted
automatically; it is usually inappropriate for small custom-designed documents such as
brochures.
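One reason formatting generated text can take more than one pass: a table of contents appears first in the output, but its page numbers are not known until the body has been laid out. The toy Python sketch below illustrates the idea, again using XML as a stand-in for SGML. The element names, the helper functions and the three-line page size are all hypothetical, chosen only so that the example paginates.

```python
import xml.etree.ElementTree as ET

LINES_PER_PAGE = 3  # hypothetical, tiny page size so the example paginates

DOC = """<manual>
  <section><title>Installation</title><par>Unpack the kit.</par><par>Mount the unit.</par></section>
  <section><title>Maintenance</title><par>Check the seals.</par></section>
</manual>"""


def layout_body(root):
    """Pass 1: lay out the body and record the page on which each section starts."""
    body, starts = [], []
    for sec in root.findall("section"):
        title = sec.findtext("title")
        starts.append((title, len(body) // LINES_PER_PAGE + 1))
        body.append(title.upper())
        body.extend(p.text for p in sec.findall("par"))
    return body, starts


def build_toc(starts):
    """Pass 2: with page numbers now known, generate the table of contents."""
    return [f"{title} ... {page}" for title, page in starts]


body, starts = layout_body(ET.fromstring(DOC))
print("\n".join(build_toc(starts) + [""] + body))
```

A real formatter paginates far more elaborately, but the dependency is the same: the front matter can only be finished after the rest of the document has been composed.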


The CALS initiative goes beyond requiring SGML tags—it requires a particular
Mil-Spec DTD, and a particular style of formatted output, called the output specification.
An individual document style created from the output specification determines how each
element in a document will be rendered on the screen and to a printer.


Another formatting style standard, also blessed by the ISO, is the Document Style
Semantics and Specification Language. DSSSL has not caught on commercially because of its
complexity. A new formatting style standard, Extensible Stylesheet Language (XSL), a style
sheet language for Extensible Markup Language (XML), is still under development.


Several trade organizations have developed their own, industry-specific DTDs, with
mixed success. The auto industry has managed to standardize, for example, but the
publishing industry’s standard DTD has not been widely adopted, largely because it
lacks several essential features.


Developing a custom DTD is a technical activity resembling programming that is usually
done by specialists or consultants.


XML, a subset of SGML, has a simplified grammar and does not require a DTD. It was
designed for large-scale electronic publishing and is useful for exchanging structured
documents.
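Because XML needs no DTD, a document only has to be well formed (properly nested, with every element closed and a single root element) to be usable. Python’s standard `xml.etree.ElementTree` parser checks well-formedness but does not validate against a DTD, so it illustrates the point. The helper name `is_well_formed` is our own, for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if `text` parses as XML; no DTD is consulted or required."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<note><to>Agency</to></note>"))  # True
print(is_well_formed("<note><to>Agency</note>"))       # False: <to> is never closed
```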


XML proponents hope that it will become the new standard for the exchange of a variety
of data on the Web as well as within and between companies and agencies. XML tools, XML
parsers for browsers and XML systems are starting to appear.


You can use XML for the sort of information usually kept in a database. For example,
the following XML describes a customer record:


<customer-details id="AcPharm39156">
  <name>Acme Pharmaceuticals Co.</name>
  <address country="US">
    <street>7301 Smokey Boulevard</street>
    <city>Smallville</city>
    <state>Indiana</state>
    <postal>94571</postal>
  </address>
</customer-details>
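Because the record is regular, a program can read it much like a database row. Here is a minimal sketch using Python’s standard `xml.etree.ElementTree` module. The record is repeated inside the code with straight quotes so that it parses, and the hypothetical `to_row` helper and its dictionary keys are our own choice, not part of any standard.

```python
import xml.etree.ElementTree as ET

RECORD = """<customer-details id="AcPharm39156">
  <name>Acme Pharmaceuticals Co.</name>
  <address country="US">
    <street>7301 Smokey Boulevard</street>
    <city>Smallville</city>
    <state>Indiana</state>
    <postal>94571</postal>
  </address>
</customer-details>"""


def to_row(xml_text):
    """Flatten one customer record into a dict, as a database row might look."""
    root = ET.fromstring(xml_text)
    addr = root.find("address")
    return {
        "id": root.get("id"),  # attribute on the root element
        "name": root.findtext("name"),
        "country": addr.get("country"),
        "street": addr.findtext("street"),
        "city": addr.findtext("city"),
        "state": addr.findtext("state"),
        "postal": addr.findtext("postal"),  # kept as text, not a number
    }


print(to_row(RECORD)["city"])  # Smallville
```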


Hypertext Markup Language (HTML) is a markup language defined in SGML for use on the Web. Many
agencies and companies publish their electronic information in HTML on the Web. Web-based
electronic commerce, quickly becoming important in the private sector, is of increasing
interest to agencies.


The current World Wide Web Consortium standard is HTML 4.0, but many browsers can only
read older versions of HTML, and other browsers support proprietary tags that have not
been accepted into the standard. The next generation of HTML proposed for consideration by
the consortium will implement HTML as a set of XML tags, allowing for a graceful
transition to the more powerful XML language.


HTML authoring tools range from simple text editors to sophisticated visual Web design
systems.


In addition to authoring pages, Web site management tools such as Microsoft FrontPage can
automatically generate navigation links between pages in a site, standardize styles across
pages and maintain links when page names change.


Many authoring tools include limited support for client-side scripts in JavaScript or
VBScript. Specialized tools, which are not discussed in this guide, help Web programmers
develop server-side Web applications and dynamically incorporate database records in Web
pages.


Because HTML is an application of SGML, an SGML authoring tool can be used to edit
HTML, given an HTML DTD. Because most HTML-specific authoring tools support scripting,
uploading and link maintenance as well as tag and content creation, they are better suited
than SGML authoring tools to creating Web sites.


This Buyers Guide lists SGML and XML authoring and formatting tools available in the
United States, plus a small sample of HTML authoring and conversion tools.


Agencies use HTML primarily for Web sites—both internal intranet sites and public
sites on the Internet.


XML has not yet really arrived in government, although it is likely to be in use within
the next year as, for instance, a way to automatically connect Web sites to SGML documents
and databases.


SGML’s highly structured nature makes it suitable for creating searchable CD-ROMs.


Government, especially DOD, uses SGML for document storage and publishing. SGML
publishing works well within the CALS framework, where large documents with relatively
simple standard formats are the rule. Several limits crop up in document formatting,
however.


Some SGML formatting engines have trouble generating acceptable multicolumn layouts,
wrapping text around illustrations and correctly formatting CALS tables and equations.
None of these is a problem in conventional desktop publishing environments, but such
environments don’t address the long-term stability issues that prompted creation of
SGML.


Even editing tables and equations can be problematic. Compared with table and equation
edit functions in desktop publishing programs, those in SGML authoring programs often seem
primitive. Given the large number of tags and attributes generated by the editors and the
number of SGML table formatting standards, however, one can easily understand the problems
developers have faced and overcome.


SGML documents are often put to multiple purposes: The same document may be destined
for a print publication, a CD-ROM and a Web site.


This can present a problem. A good searchable CD-ROM requires many more SGML tags than a
printed document does, and those extra tags can add significantly to the document’s
development cost and cause maintenance headaches later.


Some systems can automatically convert SGML documents to Web pages and other viewable
documents. The quality may vary, however, and it may be necessary to hand-tune HTML pages
each time they are generated to attain the highest quality Web pages—not a viable
option when thousands of pages are involved.
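A sketch of why automatic conversion is mechanical: each source element is mapped to an HTML tag and the tree is walked recursively. The mapping table and element names below are hypothetical, and XML stands in for SGML. A production converter would also have to handle attributes, links, tables and graphics, which is where output quality tends to vary.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from source element names to HTML tags.
TAG_MAP = {"manual": "body", "title": "h1", "par": "p", "emphasis": "em"}


def to_html(elem):
    """Recursively convert an element tree to an HTML string using TAG_MAP."""
    tag = TAG_MAP.get(elem.tag, "div")  # unmapped elements fall back to <div>
    inner = escape(elem.text or "")
    for child in elem:
        # Convert the child, then keep any text that follows it inside this element.
        inner += to_html(child) + escape(child.tail or "")
    return f"<{tag}>{inner}</{tag}>"


src = ET.fromstring("<manual><title>Seals</title><par>Check <emphasis>daily</emphasis>.</par></manual>")
print(to_html(src))
# <body><h1>Seals</h1><p>Check <em>daily</em>.</p></body>
```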


Some systems offer a way to view SGML documents directly from a Web browser, using a
plug-in viewer.


One common problem is that external contributors and editors don’t have SGML
authoring systems and cannot deal with the SGML tags in text documents.


As a result, tags can be lost in the revision process and must be re-entered when
revisions are merged into the master document. Some agencies deal with the problem by
having external contributors do revisions on paper and having the internal publications
department type the changes directly into the SGML system.


Some SGML authoring systems cannot deal with partial or invalid documents. So even if
an agency’s field office or contractor can do SGML editing, interchange problems may
still exist.


Contributors to a document may be restricted to seeing only part of a document, either
for security reasons or to avoid multiple revisions being made to the same document
section. To allow such multiple levels of access, it is sometimes necessary to extract
subdocuments from the master document and create a full set of context structure tags to
make valid documents for the working DTD. Some systems do this more easily than others.
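The extraction step can be sketched roughly as follows: copy the section a contributor needs, then wrap it in empty copies of its ancestor elements so that the result is a complete document. This is a simplified Python illustration under stated assumptions: XML stands in for SGML, the element names are hypothetical, and a real DTD may also require sibling elements (a title, say) that this sketch does not supply.

```python
import copy
import xml.etree.ElementTree as ET


def extract_with_context(root, target):
    """Copy `target` into a new tree wrapped in empty copies of its ancestors."""
    # ElementTree elements do not point to their parents, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}

    # Collect ancestors from the nearest one up to the document root.
    chain = []
    node = target
    while node in parents:
        node = parents[node]
        chain.append(node)

    # Rebuild the ancestors from the root down, copying only tag and attributes.
    outer = inner = None
    for ancestor in reversed(chain):
        shell = ET.Element(ancestor.tag, dict(ancestor.attrib))
        if outer is None:
            outer = shell
        else:
            inner.append(shell)
        inner = shell

    piece = copy.deepcopy(target)
    piece.tail = None  # drop trailing text that belonged to the master document
    if inner is None:  # target was the root itself
        return piece
    inner.append(piece)
    return outer


master = ET.fromstring(
    '<manual id="M1"><chapter n="3">'
    '<section id="s1"><par>A</par></section>'
    '<section id="s2"><par>B</par></section>'
    "</chapter></manual>"
)
sub = extract_with_context(master, master.find(".//section[@id='s2']"))
print(ET.tostring(sub, encoding="unicode"))
# <manual id="M1"><chapter n="3"><section id="s2"><par>B</par></section></chapter></manual>
```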


Some systems also track revisions better than others do. If documents you’ll be
creating and publishing typically have long lifecycles with frequent revisions, be sure to
check a package’s document comparison and revision marking capabilities.
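Revision marking in technical manuals is traditionally shown with change bars in the margin. The rough Python sketch below uses the standard `difflib.SequenceMatcher` to compare two versions line by line and flags changed or inserted lines with `|`. The marker, the line-level granularity and the sample text are assumptions for illustration, and deleted lines are not marked in this simplified version.

```python
import difflib

old = ["<par>Torque bolts to 40 ft-lb.</par>", "<par>Inspect seals.</par>"]
new = ["<par>Torque bolts to 45 ft-lb.</par>", "<par>Inspect seals.</par>"]


def mark_revisions(old_lines, new_lines):
    """Return new_lines with a change bar ('| ') on lines that differ from old_lines."""
    marked = []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'equal' blocks are unchanged; 'replace' and 'insert' blocks are revisions.
        # 'delete' blocks have j1 == j2, so they contribute no lines here.
        for line in new_lines[j1:j2]:
            bar = "  " if op == "equal" else "| "
            marked.append(bar + line)
    return marked


for line in mark_revisions(old, new):
    print(line)
# | <par>Torque bolts to 45 ft-lb.</par>
#   <par>Inspect seals.</par>
```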


When publishing Standard Generalized Markup Language documents, you must
choose between formatting from pure SGML markup and formatting with additional filtering
information.


ArborText’s Adept Publisher, available for Unix and Microsoft Windows NT 4.0,
takes the former route and generates PostScript directly from a document’s SGML,
Document Type Definition and Formatting Output Specification Instance using a multipass,
rule-based engine. The trade-off is that you get more automatic publishing but less control
over the end product.


Adept Publisher is an appropriate printing engine for large, simply formatted
documents, including most of the Defense Department’s Continuous Acquisition and
Lifecycle Support documents. It handles complex index and cross-reference structures and
does a good job with revision marking.


Adept Publisher includes all the functions of Adept Editor, a highly configurable
authoring system for pure native SGML and Extensible Markup Language.


In its default view, Adept displays two editable panes, a document map and an edit
view. The edit view has most of the features of desktop word processors, and it deals with
SGML tags in several ways.


You can specify tags to be viewed or hidden in the document. A quick-tag entry menu
helps you to create only those tags valid in the current context. Even when tags are
hidden, empty tags are displayed and highlighted to let you fill in the missing tag
contents. When searching a document, you can restrict the search to text inside specific
tags.
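Searching inside specific tags amounts to restricting a text match to certain elements. Here is a minimal Python sketch, using XML in place of SGML; the element names and the `find_in_tag` helper are hypothetical. Because `itertext()` gathers text from nested elements too, a phrase split by inline markup such as `<emphasis>` is still found.

```python
import xml.etree.ElementTree as ET


def find_in_tag(root, tag, phrase):
    """Return the full text of every `tag` element whose content contains `phrase`."""
    hits = []
    for elem in root.iter(tag):  # only elements with this tag name
        text = "".join(elem.itertext())  # includes text of nested inline elements
        if phrase.lower() in text.lower():  # case-insensitive match
            hits.append(text)
    return hits


doc = ET.fromstring(
    "<manual><title>Seal inspection</title>"
    "<par>Inspect each seal <emphasis>weekly</emphasis>.</par>"
    "<warning>Replace a cracked seal before flight.</warning></manual>"
)
print(find_in_tag(doc, "warning", "seal"))
# ['Replace a cracked seal before flight.']
```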


When you drag an element within a document, cursor cues indicate whether you can move
the element, where you can drop it, where the context would change its tags and where
it would be invalid.


An external equation editor pops up when you add or edit an equation; it includes a
palette of equation symbols.


Adept takes its commands from menus, toolbars, dialogs and a command line. You can
customize menus, toolbars and dialogs. You can automate publishing via Adept Command
Language and compiled .dll files that tie into Adept’s object model.


An add-on product for developers simplifies extensive customization and
automation.


Adept does a good job of handling partial documents and invalid tags, and of
importing and exporting XML, and it integrates with six document management systems.


Adept Publisher sells for $2,350; Adept Editor sells for $1,350. More details on both
products are available at www.arbortext.com.


Contact ArborText Inc. of Waltham, Mass., at 781-529-1000 or 734-997-0200.


Martin Heller is a software developer, consultant and writer in Andover, Mass.




