SQL vs. XML in a database world
- By Joab Jackson
- Mar 16, 2005
Jonathan Robie, XML evangelist
In a sense, Jonathan Robie acts as a diplomat between two disparate technologies'Extensible Markup Language and relational databases. Communicating between those two worlds is not a task for the faint-hearted. Advocates of the Structured Query Language, which allows users to access information in relational databases, are quick to point out XML's deficiencies, while XML adherents question the value of a relational model.
Robie seeks to bridge these two worlds. He is one of the chief authors of XQuery, an SQL-like query language that can be used for searching both XML documents and relational databases. The World Wide Web Consortium, the international body that creates and maintains Web standards, is now reviewing XQuery as a formal draft.
In addition to XQuery, Robie works on a number of other W3C working groups and is co-author of XQuery from the Experts, published by Addison-Wesley. In his day job, he's the XML program manager at data connectivity software provider DataDirect Technologies Inc. of Bedford, Mass. Previously, Robie was an XML Research Specialist at Software AG. He spoke to GCN associate writer Joab Jackson by phone.GCN: How is XQuery different from Structured Query Language?
ROBIE: SQL is a relational query language, XQuery is an Extensible Markup Language query language. If all you are doing is querying relational databases, then SQL is the language you want. XQuery works best if you're querying XML or a combination of XML and relational sources.GCN: How did you first start working with XML? Did you know it would consume a large portion of your career when you started?
ROBIE: No. I was working for an object database company at the time  and a medical textbook publisher contacted us, wanting a Standard Generalized Markup Language database.
That was before XML. Since I was the person who knew the most acronyms in the company, the call was put through to me.
I spent a lot of my career working with things that do not fit naturally into relational databases'geographic information systems, multimedia systems or things like that.
The idea of an SGML database was exciting because the most important information in any organization is the document information. The standard clich' is that 90 percent of corporate data is not in databases, because it doesn't fit naturally in databases.GCN: How did that work on the SGML database lead to XQuery?
ROBIE: Joe Lapp [then the principal architect of webMethods Inc.] and I had done most of the work on a language called XQL.
In 1998, two query languages, XQL and XML-QL got a lot of interest within the W3C, so we got onto the W3C working group for XML-based querying languages. We spent the first year trying to figure out what our requirements were.
We started out by doing use cases. We came up with a document that had about 90 queries, so we took about seven query languages and tried them out with these 90 queries. None of them were perfect. [For XQuery], we took the best part of each of the languages.GCN: What was it about these use cases that made them unusual?
ROBIE: The fundamental thing about them was that they relied on the structure of XML. XML's logical structure is based on hierarchy and sequence. Two things that relational databases don't do is hierarchy and sequence. Yet Web sites typically get a lot of information from databases, but they don't put two-dimensional tables up on the screen. You create hierarchies for that information. In XML you structure everything with hierarchy and sequence.GCN: So how can XQuery help in that sort of situation?
ROBIE: Let me give you a scenario. Your Web site has a message coming in: Someone wants to ask for the price of some stocks.
Now we can join XML documents and various relational tables. The XML documents will identify the person who wants the information. It will give the data range that person is interested in. We join that against a relational database to figure out how the stocks have performed, and we build an XML structure that contains the result.GCN: It seems as if there is some resistance to the idea of using XML as a database tool from the relational database community.
ROBIE: Yeah, I read your interview with [database consultant] Fabian Pascal. It depends on who you're talking about.
Certain purists don't like it. If you remember back in the early days, some of the people who are kicking back against XQuery now were kicking back then against the use of null in SQL [to represent missing data].
They are the people for whom SQL and real relational databases just aren't pure enough and what should have existed never really got implemented.
I think that since the standards boards of the International Standards Organization and the American National Standards Institute are in the process of adding all of XQuery into SQL, it is just hard to claim that the SQL community is not accepting XQuery.
If you go to the major database conferences, you will see a lot on XML. Every relational database vendor has added XML support.
People are very conservative about their database choices. But the flow of information is just as important as the control of information.
For most information, if you want to exchange it, you use XML. People don't give every Tom, Dick and Harry a connection to their databases. And the way you get through the firewall is to send XML around.GCN: It seems repetitive to add an XML tag to each record in a database. Don't XML databases require more storage space than relational ones?
ROBIE: I think you are confusing physical characteristics with logical characteristics. In an XML database, you don't have to store the physical characters associated with every tag. If I have an XML view of the rows and tables of the relational database, it could be exactly the same size as the relational database.
If the data is in a native XML database, then it depends on the physical design of the structures used to store things. But there is no reason it has to be more storage-intensive than a relational database.
If you have a straight text file, then, yes, it will be large, but it is very repetitive information, which is easily addressed. There is work being done in binary XML, which is a much more compressed format.
A database isn't just a bunch of two-dimensional tables with data. You have all these indexes, and the indexes can make the data several times larger than the original. XQuery right now is read-only, and it has no idea what a table is, by the way.GCN: What does DataDirect offer in terms of XQuery?
ROBIE: We have two things. We are shipping today an XQuery implementation in Stylus Studio, which is an integrated development environment for XML.
We'll also be developing over the coming year DataDirect XQuery. That will be an XQuery implementation that works with all the major relational databases and with XML sources, either in files, or in the Java programming space. You can query an XML file, or an Oracle, Microsoft, IBM, MySQL or Sybase database.GCN: Do you enjoy working on standards boards? It seems like standards take a long time to develop.
ROBIE: Everyone gets frustrated by how long it takes. I've been doing standards since 1996. But I do feel that having several people with opposing points of view, who reach consensus on the requirements and on the design, results in a more general-purpose and better solution.