XML tools can tune up e-gov efforts

A host of XML development tools can add life to your Web pages

The Lowdown

  • What is it? XML is a metalanguage that uses tags to describe the data in a document, in much the same way that HTML uses tags to format a document's appearance. Both derive from SGML.

  • Why would you need it? XML can serve as the glue between various applications, making it easier to exchange data among products and to publish information in an accessible manner. It also can make creating Web pages easier for the developer.

  • When wouldn't you need it? If you don't publish or gather data over the Internet and aren't currently developing an integrated information management solution with components from various vendors, then you can probably ignore XML.

  • Price? The tools for implementing XML vary greatly and so do their prices. They range from free, downloadable XML editors (text editors with XML syntax built in) and other development tools to full content management suites that can cost tens of thousands of dollars.

  • Status? Some of the most critical parts of XML, including the schema that will make it easy to exchange data among various applications from different vendors, are not yet standardized. So XML cannot instantly integrate incompatible data. But many vendors have promised to follow the World Wide Web Consortium's recommendations when they are finalized, which could happen as soon as this spring.

  • Must-know info? As usual with a new technology, the hype exceeds the reality. In this case, XML can do all that its proponents claim; the exaggeration comes in the claims of how easy it will be to implement.

The ABCs of XML

  • Document Type Definition (DTD): the language created by XML producers that describes the contents of SGML or XML documents.

  • XML schema: a superset of DTDs written in XML syntax and created with XML tools.

  • Document Object Model (DOM): an application programming interface for accessing HTML and XML documents from a Web browser.

  • Extensible Stylesheet Language (XSL): the XML equivalent of the Cascading Style Sheet used with HTML, which XML also supports. XSL is an XML-specific style sheet format.

  • Parser: a tool that finds XML tags, extracts the information they contain and passes it along to applications.

  • Editor: a text editor with XML syntax built in.
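To make the parser and DOM entries above concrete, here is a minimal sketch using the DOM implementation in Python's standard library. The document, its tag names and the two helper functions are invented for illustration; they are not taken from any product mentioned in this article.

```python
from xml.dom.minidom import parseString

# Hypothetical document: the <order> and <item> tags are made up for this sketch.
DOC = "<order><item sku='A-100'>Toner</item><item sku='B-200'>Paper</item></order>"

def item_names(xml_text):
    """Parse the text into a DOM tree and return the text inside each <item>."""
    dom = parseString(xml_text)
    return [node.firstChild.data for node in dom.getElementsByTagName("item")]

def item_skus(xml_text):
    """Return the sku attribute of each <item> element."""
    dom = parseString(xml_text)
    return [node.getAttribute("sku") for node in dom.getElementsByTagName("item")]

print(item_names(DOC))
print(item_skus(DOC))
```

The parser does the tag-finding; the DOM calls are how an application then asks for the pieces it wants.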

For more on XML, go to...

  • www.biztalk.org: BizTalk, a group organized by Microsoft Corp. that promotes XML.

  • www.ebxml.org: ebXML is a joint effort of the United Nations Center for Trade Facilitation and Electronic Business and OASIS (see below) for creating a set of specifications for electronic business.

  • www.isoc.org: The Internet Society.

  • www.oasis-open.org: OASIS, the Organization for the Advancement of Structured Information Standards.

  • www.voicexml.org: VoiceXML Forum, a group sponsored by AT&T Corp., Lucent Technologies Inc. and Motorola Inc.

  • www.w3.org/XML: The World Wide Web Consortium's XML page, site of the most official XML information.


    Vendors still have to agree on standards, but the Extensible Markup Language can give currency to e-commerce plans


    As the Web evolves toward greater use of electronic commerce, Extensible Markup Language has emerged as its lingua franca in waiting. XML provides a level of functionality beyond that of Hypertext Markup Language and a common way to identify data in transactional documents.

    But a greater ease of use doesn't necessarily mean converting to XML is easy. Nor does it mean that HTML will be left behind.

    The simplest way to understand XML is to understand how it differs from HTML. Both derive from Standard Generalized Markup Language, but where HTML concentrates on style, XML focuses on substance.

    HTML was developed to format documents moving across networks, particularly the Internet, and to lay out paragraphs, headings, lists and simple illustrations for the Web.

    Flying blind

    But HTML is limited in its ability to describe the kind of data displayed. An order form designed in HTML isn't really aware of the data it contains, so the data is difficult to transfer to other applications.

    XML gets into the content. It is a metalanguage (a language describing how another language is structured) used to define the data structure of documents and make it easier to exchange data between applications.

    XML lets programmers create documents that describe the various data fields to allow the information to be used by other programs, similar to the way database entries do.

    In XML, a page developer can define new tags; in HTML, tags are predefined. And in XML, software can easily mine the data over a network, as long as all the programs recognize the same tags.

    Because of XML's SGML heritage, its documents can be parsed and validated the same way any SGML file can. XML is better suited for general use because it discards some of the most complex features of SGML.
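As a rough illustration of tags that describe data rather than appearance, the sketch below parses a small order document with Python's standard library and computes a total from it. The tag names (order, customer, item, quantity, unit_price) and the order_total function are assumptions made up for this example, not part of any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical order form; every tag name here is invented for illustration.
ORDER_XML = """
<order id="1001">
  <customer>Jane Doe</customer>
  <item>
    <description>Laser toner</description>
    <quantity>3</quantity>
    <unit_price>42.50</unit_price>
  </item>
</order>
"""

def order_total(xml_text):
    """Sum quantity * unit_price over every <item> in the order."""
    root = ET.fromstring(xml_text)
    total = 0.0
    for item in root.findall("item"):
        qty = int(item.findtext("quantity"))
        price = float(item.findtext("unit_price"))
        total += qty * price
    return total

print(order_total(ORDER_XML))
```

The program never looks at how the order would be displayed. It works only because it knows what the tags mean, which is exactly the condition the paragraph above attaches to mining data over a network.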

    HTML can do many of the things that proponents tout as XML features but only through a bewildering array of sometimes incompatible plug-ins. As the Web has grown, HTML has been unable to grow with it except through third-party add-on utilities.

    But HTML won't be riding off into the sunset any time soon. It does some things better than XML, and does some things XML cannot do, so there is no move to replace it.

    A new World Wide Web Consortium (W3C) standard, XHTML Version 1.0, is a combination of HTML and XML that allows Web sites to add XML to their pages but lets some old browsers continue to view the content.

    XML promises:

    • A simple way to exchange data over the World Wide Web

    • A new beginning for programmers who have pushed HTML to the limit

    • A new paradigm for querying databases over enterprise or local networks

    • An easy way to exchange data between various local applications

    • A solid middleware standard that makes XML distributed computing possible because applications can access distributed objects across the Web.

    A lack of standards is one problem. But there is also a fundamental problem with the entire XML concept that must be reconciled before it becomes a transparent data exchange tool.

    The biggest hurdle to adoption of XML is its most important feature: the ability to create new tags. This ability lets users customize XML documents to exactly fit their needs. But unless you are developing an in-house system, custom tags won't be recognized by other applications, virtually eliminating XML's usefulness.

    Vendors producing various XML products create Document Type Definitions that describe their product's metadata schema. A validating editor or processor uses DTDs to determine if a particular document contains valid XML code.

    Because this is such a flexible process, there is no standard set of definitions and therefore no way to exchange data among different applications unless they share identical DTDs.

    There are published sets of DTDs but, because there are many different sets, none of them can be described as a standard.

    No magic bullets

    If your applications can share data before you incorporate XML, then all probably will be well. But if your data is incompatible now, merely formatting it with XML tags won't magically make it compatible.

    The best option today is to use mapping tools that can read DTDs from one vendor and translate them into another vendor's DTDs. But is this what you thought XML promised?

    XML is touted as an easy, seamless way to share data among disparate applications. But when you have incompatible data sets requiring XML coding on both sides, with a layer of translation from mapping tools in between, 'easy' could be a question of degree.

    In most cases you will have to analyze the two DTD sets and create the mapping. Commercial translation tools don't understand the various DTDs; they just manage the routine translation after you do the work.
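The division of labor described above can be sketched in a few lines of Python. The mapping table is the part a person has to work out by comparing the two vocabularies; the translate function is the routine part a tool can automate. Both vendors' tag names are hypothetical, and a real mapping would also have to handle structural differences, not just renamed tags.

```python
import xml.etree.ElementTree as ET

# Hand-built mapping from vendor A's tag names to vendor B's (both hypothetical).
TAG_MAP = {"purchase": "order", "buyer": "customer", "qty": "quantity"}

def translate(xml_text, tag_map):
    """Rename every element whose tag appears in tag_map; leave the rest alone."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = tag_map.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")

VENDOR_A = "<purchase><buyer>Jane Doe</buyer><qty>3</qty></purchase>"
print(translate(VENDOR_A, TAG_MAP))
```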

    The W3C XML Schema is intended to define the structure, content and semantics of XML documents, superseding SGML DTDs.

    The XML Schema reached candidate recommendation status in October, meaning that it now can be implemented on the Web and users can provide feedback on the specification.

    The XML Schema is necessary to teach XML systems how to parse data to recognize, for example, the difference between a string of text and a Social Security number.
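The kind of distinction at stake can be approximated by hand. The Python sketch below is a stand-in, not an XML Schema processor: it attaches a rule to each tag and reports the fields whose text breaks the rule. The tag names, the RULES table and the simple three-two-four digit pattern for a Social Security number are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed format for illustration: three digits, two digits, four digits.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")

# Hypothetical per-tag rules, standing in for what a schema would declare.
RULES = {
    "name": lambda s: len(s) > 0,
    "ssn": lambda s: SSN_PATTERN.fullmatch(s) is not None,
}

def invalid_fields(xml_text):
    """Return the tags of child elements whose text fails the rule for that tag."""
    root = ET.fromstring(xml_text)
    bad = []
    for elem in root:
        check = RULES.get(elem.tag)
        text = (elem.text or "").strip()
        if check is not None and not check(text):
            bad.append(elem.tag)
    return bad

print(invalid_fields("<person><name>Jane Doe</name><ssn>123-45-6789</ssn></person>"))
print(invalid_fields("<person><name>Jane Doe</name><ssn>not a number</ssn></person>"))
```

A shared schema would let every application apply the same rules without each one hard-coding its own table like this.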

    Until the XML Schema reaches full recommendation status, which is expected by spring, vendors and industry groups will continue to create their own DTDs.

    DTDs have become well-entrenched in the user community simply because it has taken more than two years to develop the XML Schema.

    Microsoft Corp. and other companies have said they will follow any W3C standard once it reaches the recommendation level.

    What's in the toolbox?

    The software available for XML ranges from full-blown software development kits to free editors and other tools. IBM's Alphaworks, for instance, publishes a lot of cutting-edge XML products for free download.

    Editors are simple and valuable tools: text editors with XML syntax built in.

    Software development kits offer the complete set of tools for generating valid XML code, integrating XML into your present operations and even helping to migrate existing data or applications to XML.

    Although XML is still a few steps away from fulfilling its promise, several agencies are already making use of it.

    The Patent and Trademark Office, for instance, is using Infrastructures for Information Inc.'s S4/Text to develop an XML system for authoring patent applications that can be filed over the Internet.

    The Securities and Exchange Commission, Defense Department, IRS and Government Printing Office also are creating XML documents for such uses as online forms, document exchange and online training.

    John McCormick, a free-lance writer and computer consultant, has been working with computers since the early 1960s.

    Product | Platform | Description | Price

    Adobe Systems Inc., San Jose, Calif.
      FrameMaker + SGML | Windows, Macintosh, Unix | XML publishing | $1,449

    Artesia Technologies Inc., Rockville, Md.
      TEAMS | Unix | Enterprise digital asset manager | $100,000 up

    Bluestone Software Inc., 610-915-5000 x3052
      Visual XML | NT, Solaris | Integrated development environment | Free download

    The Breeze Factor LLC, Encinitas, Calif.
      Breeze XML Studio | Java | JavaBeans from XML creator | $495 up

    Chrystal Software Inc., San Diego
      Astoria | Solaris, Windows | Enterprise authoring tool | $5,000 up

    CueSoft.com Inc., Brighton, Colo.
      Exml Editor | Windows | XML editor, tree or source view | Free
      CUEXml ActiveX 2.2 | Windows | XML parser and DOM components | $79

    DataChannel Inc., Bellevue, Wash.
      DataChannel Server | NT, Solaris | XML enterprise portal | $100,000 up

    Lakeville, Minn.
      eidonXbase | Windows, Solaris | Database content management tool | $20

    IBM Corp., Armonk, N.Y. (alphaworks.ibm.com or www.software.ibm.com for DB2 XML Extender)
      XML for C++ | Windows, Solaris, Unix, Linux | C++ libraries with classes for parsing, generating and validating XML documents | Free
      XML and Web Services DE | NT, Win 2000 | XML Web development tools | Free
      XML Diff and Merge Tool | AIX, Linux, NT | Reconciles, compares and merges XML documents | Free
      XML Enabler | Java | Makes data viewable by any browser | Free
      XML Generator | Java | GUI DTD editor | Free
      XML Security Suite | Windows, Linux | Supports digital signatures, encryption and access control | Free
      DB2 XML Extender | Windows, Unix, Linux, Solaris, AIX | Adds XML document support to DB2 | Free

    Infoteria Corp., Beverly, Mass.
      iPex | C++ libraries for Windows, Unix | C++ XML processing engine | $150 up
      iMessenger | NT | Retrieves XML data e-mail | $1,200 up

    Infrastructures for Information Inc.
      S4/Text | Microsoft Word | Forms XML editor working with Word | $7,500 (includes training)

    Interwoven Inc., Sunnyvale, Calif.
      Ajuba2 | NT, Solaris, Red Hat Linux | XML author, server and manager | $30,000 up

    Intranet Solutions Inc., Eden Prairie, Minn.
      eXpedio | NT, Win 2000, Unix, Linux, Solaris | Content management and publishing | $40,000 up

    Microsoft Corp., Redmond, Wash.
      Microsoft XML Parser in Java | Windows | XML parser | Free
      Biztalk Jumpstart Kit | NT | Uses Biztalk schemas and Microsoft software to build Web pages | Free

    Open Text Corp., Waterloo, Ontario
      Aelfred | Java | DTD-aware XML parser | Free
      Sax | Windows | Simple API for XML driver | Free
      Near and Far Designer XML | Windows | XML modeling and authoring tool | $99 up

    SoftQuad Software Inc.
      XmetaL 2.0 | Windows | XML editor | $495

    Software AG USA, San Ramon, Calif.
      XML Starter Kit | NT, Unix | Database and development environment | Free 90-day trial
      Tamino | NT, Unix, Linux, OS/390 | Native XML database server | $25,000 per CPU; $40,000 for Solaris

