Beyond the database giants

Leader of the open-source pack

MySQL has become the leading open-source database, part of a popular system stack that includes Linux and Apache.

(1) Its Query Browser allows you to manage multiple queries with a tabbed interface.

(2) A robust administrator tool makes managing the database easier, thereby improving uptime.

(3) Through MySQL Administrator you can set parameters to optimize the database's performance.

Bruce Momjian is one of the core developers behind PostgreSQL, an open-source database.

Although most of the LinuxWorld trade show attendees had left for the day, Bruce Momjian took the podium to spread the gospel: PostgreSQL, the open-source database management system he co-develops, is ready for mission-critical work.

'We're in the enterprise period now,' Momjian enthusiastically told lingering techies during an evening session at the February conference in Boston. No longer would the free database only be found in pilot projects and cash-strapped organizations, he predicted.

'We have the standards compliance, reliability and performance,' he said.

Earlier that day, Marten Mickos, CEO of MySQL AB of Sweden, similarly pitched his company's database software for large-scale jobs.

'When I speak to our biggest customers, they always say, "We use MySQL not because the price is low but because the performance is stellar," ' he boasted at a panel discussion. The company claims MySQL is used in 5 million locations, about 5,000 of which pay the company for support.

And Momjian and Mickos are not alone. Hoping to duplicate the success of the open-source Linux operating system, proponents of open-source databases foresee their wares supplanting pricier offerings from IBM Corp., Microsoft Corp. and Oracle Corp.

'In terms of technical capability, 95 percent of applications that use databases just don't care what kind of database it is,' said Robin Bloor, partner at IT consultant Hurwitz Associates of Waltham, Mass. 'A database is just a cupboard to store the data.'

Still, agencies considering an open-source database should proceed carefully. While freely available open-source databases can save money, the purchase price is only part of the expense of running a database management system. The initial software represents just 30 percent of the cost of implementing and maintaining a database system, Bloor said. Choosing the wrong database, even a free one, can be expensive in the long run.

According to Tom Rizzo, director of product management for Microsoft's SQL Server, 'The open-source perception is one of "Hey, it is a free lunch." But there is no free lunch.'

For Mariella Di Giacomo of Los Alamos National Laboratory Research Library, MySQL proved the best choice for building a 7-terabyte database. True, MySQL didn't have all the features Oracle offered, but Di Giacomo's project, building an electronic library of scientific-journal articles, did not need all those features. What the Library Without Walls Development Team did require (fault tolerance, load balancing and security via replication) MySQL possessed. Plus, MySQL was cheaper, even after the library contracted with MySQL AB for support.

'MySQL has proven to be fast at handling links among billions [of] rows of data in several virtual tables,' Di Giacomo said.

However, open-source databases haven't always been considered suitable for such large jobs. In 2003, the computing division of the Energy Department's Fermi National Accelerator Laboratory compared PostgreSQL, MySQL and Oracle8. It found MySQL and PostgreSQL well suited for deployments where money was tight and performance was not crucial.

'The goal was to have a comprehensive comparison to allow users to make informed decisions as to which database would be applicable for their applications, and maybe more importantly, which would not,' e-mailed Julie Trumbo, group leader for the division's database group.

The group looked at 33 qualities, from price to how well the database performed query optimization. Oracle's database scored best in every category except price. But the two free databases did fairly well too, earning good or average ratings in most categories. And of course, unlike Oracle, they received top marks for price.

The fact that the Energy Department would even evaluate an open-source database in 2003 represented a significant milestone. When the PostgreSQL project got under way in 1996, it could best be described as experimental, admitted Momjian.

'The database had just come out of the university. It did not have the rigorous testing to prove it was 100 percent reliable,' Momjian said. In fact, early work concentrated on simply preventing PostgreSQL from crashing.

Another open-source database, Berkeley DB, also has scholarly roots. It was developed in the early 1990s at the University of California, Berkeley, as a basic database for the Berkeley Software Distribution, a Unix clone. Yet Berkeley DB shows how successful the open-source model can be. As free software, it became widely embedded in routers, operating systems, enterprise applications and other systems.

The company that now owns Berkeley DB, Sleepycat Software Inc. of Lincoln, Mass., focused development on users seeking a simple, fast database. Because it does not support SQL queries, Berkeley DB has very little performance overhead. Sleepycat claims this software is currently installed in over 200 million devices and computers. The company offers the software and its source code at no charge, but vendors who package it with a commercial application must pay a licensing fee.
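To make the 'no SQL' point concrete, here is a minimal sketch of the key-value style of access that such an embedded database offers. It does not use Berkeley DB itself: as an assumption for illustration, it substitutes Python's standard-library `dbm.dumb` module, and the routing-table keys and the function name `demo_key_value_store` are hypothetical.

```python
import dbm.dumb
import os
import tempfile


def demo_key_value_store():
    """Store, overwrite, delete and read records by key, with no SQL layer.

    dbm.dumb is only a stand-in for an embedded key-value store; it is not
    Berkeley DB and makes no claim about Berkeley DB's actual API.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "routes")  # hypothetical database name

        # Open for read/write, creating the files if needed.
        with dbm.dumb.open(path, "c") as db:
            db[b"10.0.0.0/8"] = b"eth0"       # add a record
            db[b"192.168.1.0/24"] = b"eth1"   # add another
            db[b"10.0.0.0/8"] = b"eth2"       # change: overwrite by key
            del db[b"192.168.1.0/24"]         # delete by key

        # Reopen read-only and copy the surviving records into a dict.
        with dbm.dumb.open(path, "r") as db:
            return {key: db[key] for key in db.keys()}


if __name__ == "__main__":
    print(demo_key_value_store())
```

Every operation is a direct lookup or write on a key. There is no query to parse or plan, which is the kind of overhead the article says Berkeley DB avoids.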

'The availability of the source is a great assurance. You are not dependent on me. You could support Berkeley DB if Sleepycat weren't around,' said Michael Olson, CEO of Sleepycat.

Olson is not particularly bothered that government users, who rarely write commercial software, could use his database for free. Some may need support, which Sleepycat sells. But even if they don't, a free implementation still works in his favor.

'What I have done is taken that money off the table for my proprietary competitors,' Olson said. 'Yeah, I didn't get any money, but neither did anyone else.'

Commercial vs. open source

At its simplest, a database management system is software that organizes data into tabular rows and columns. A database engine adds, changes and deletes the data, usually using some variation of the standardized Structured Query Language, SQL.
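The add, change and delete operations described above can be sketched in a few lines. The example below assumes Python's built-in `sqlite3` module as a stand-in engine (the article does not discuss SQLite), and the `articles` table and its columns are hypothetical.

```python
import sqlite3


def demo_crud():
    """Add, change and delete rows in a table, then read back what is left."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
    )

    # Add two rows.
    conn.execute(
        "INSERT INTO articles (title, year) VALUES (?, ?)", ("Draft title", 2004)
    )
    conn.execute(
        "INSERT INTO articles (title, year) VALUES (?, ?)", ("Second article", 2005)
    )

    # Change one row.
    conn.execute("UPDATE articles SET title = ? WHERE id = ?", ("Final title", 1))

    # Delete the other.
    conn.execute("DELETE FROM articles WHERE id = ?", (2,))

    rows = conn.execute(
        "SELECT id, title, year FROM articles ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo_crud())
```

The same four statements (INSERT, UPDATE, DELETE, SELECT) work with minor dialect differences on the other SQL databases the article names.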

'The database is only one part of a data management solution,' Microsoft's Rizzo said. He estimated that it would take a team of developers about a week to build new database software from scratch. In many cases, the support tools are more important to customers than the basic database functionality, he argued.

When customers purchase Microsoft's SQL Server, for instance, Rizzo said they also get a whole collection of additional integrated tools. For instance, online analytical processing software can build reports and analyze data. SQL Server also has a self-tuning feature, which automatically optimizes queries so they will run as quickly as possible.

Nonetheless, open-source advocates are doggedly trying to get beyond their experimental beginnings and offer the features that commercial companies do. Momjian said over 250 volunteer developers around the world worked on the latest edition of PostgreSQL, Version 8.0, released in January.

'PostgreSQL is one of the most rapidly advancing databases out there. Because we're an open-source database developed by a community around the world, we're able to harness that brain power,' Momjian said.

Version 8 includes some important data recovery features, most notably save points and point-in-time recovery. Save points are used to roll back transactions so a failed transaction can be returned to some earlier stage. Point-in-time recovery continuously logs all changes and can be used to recover lost data, should disaster strike.
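A save point is easiest to see in a small transaction. The sketch below is not PostgreSQL code: as an assumption, it uses Python's built-in `sqlite3` module, whose SAVEPOINT and ROLLBACK TO statements follow the same general idea. The `accounts` table and the overdraft rule are hypothetical. Point-in-time recovery is a server-level feature and is not shown.

```python
import sqlite3


def demo_savepoint():
    """Undo one failed step inside a transaction without losing earlier work."""
    # isolation_level=None turns off the module's implicit transactions,
    # so BEGIN / SAVEPOINT / COMMIT below are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")

    conn.execute("BEGIN")
    conn.execute("INSERT INTO accounts VALUES ('alice', 100)")

    # Mark a point we can return to if the next step goes wrong.
    conn.execute("SAVEPOINT before_transfer")
    conn.execute("UPDATE accounts SET balance = balance - 500 WHERE name = 'alice'")

    balance = conn.execute(
        "SELECT balance FROM accounts WHERE name = 'alice'"
    ).fetchall()[0][0]
    if balance < 0:
        # The transfer overdrew the account: roll back only that step.
        conn.execute("ROLLBACK TO SAVEPOINT before_transfer")

    # The earlier INSERT is still part of the transaction and is committed.
    conn.execute("COMMIT")

    rows = conn.execute("SELECT name, balance FROM accounts").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo_savepoint())
```

Without the save point, the only choice after the failed update would be to roll back the whole transaction, including the insert that succeeded.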

'The two things that people said that they needed to have to move from Oracle were save points and point-in-time recovery,' Momjian said. 'I'm surprised we've gone so long without those features.'

The new version of MySQL, to be released in June, will also offer much-needed enterprise features, according to Zack Urlocker, vice president of marketing for MySQL. Version 5 promises to be more suited for heavy transactional environments, not traditionally MySQL's strong suit. MySQL owes its popularity to the tremendous growth of the Internet in the late '90s, Bloor said. Internet service providers downloaded the free software, which worked well for storing read-only Web information. Enterprises, however, might prefer databases that can write to disk faster.

To meet this need, MySQL 5.0 will feature XA support, which allows the system to handle distributed transactions. Using XA support, processes can be spread across multiple databases. Version 5.0 will also introduce enterprise database staples such as stored procedures, triggers and views.
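For readers unfamiliar with these staples, the sketch below shows a trigger and a view at work. It is an illustration only. It assumes Python's built-in `sqlite3` module rather than MySQL 5.0, so the syntax is SQLite's, and stored procedures and XA transactions are left out because SQLite does not provide them. The table, trigger and view names are hypothetical.

```python
import sqlite3


def demo_view_and_trigger():
    """Show a trigger writing an audit row and a view summarizing a table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER);
        CREATE TABLE audit_log (order_id INTEGER, note TEXT);

        -- Trigger: runs automatically after every insert into orders.
        CREATE TRIGGER log_new_order AFTER INSERT ON orders
        BEGIN
            INSERT INTO audit_log (order_id, note) VALUES (NEW.id, 'order created');
        END;

        -- View: a stored query that can be read like a table.
        CREATE VIEW customer_totals AS
            SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer;
        """
    )

    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("lab", 40), ("lab", 60), ("library", 25)],
    )

    totals = conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()
    audit_count = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
    conn.close()
    return totals, audit_count


if __name__ == "__main__":
    print(demo_view_and_trigger())
```

The application inserts only into `orders`. The audit rows and the per-customer totals are maintained by the database itself, which is why enterprises tend to expect these features.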

One open-source database that has long claimed enterprise chops is the Interactive Graphics Retrieval System, or Ingres, now owned by Computer Associates International Inc. of Islandia, N.Y. CA purchased the database in 1994.

The fact that most people have not heard of Ingres isn't due to any technical shortcomings, the company says. Rather, the problem with Ingres was lack of popularity during its commercial days, according to Yogesh Gupta, chief technology officer for Computer Associates.

'By 1994, Ingres was number three in a three-horse race. The race had already been finished,' Gupta admitted. By setting loose the source code, the company hoped to attract a larger community of supporters.

Today, Ingres has an installed base of about 25,000 paying customers. Computer Associates positions Ingres as an alternative to large, heavily transactional databases. Ingres can match Oracle9i or IBM's DB2 feature for feature, said Tony Gaughan, the senior vice president of development who oversees Ingres. CA itself uses the database in all the enterprise infrastructure software it sells.

Support costs

Even as open-source databases approach technical parity with their pricier brethren, their value could still be limited by dicey support, critics contend.

'When considering databases, customers should consider not only the product's features and capabilities, but the ecosystem that surrounds the product,' said William Hardie, senior director of database product marketing for Oracle. 'How many customers use the product in a production environment? What resources does the vendor apply to the ongoing development of the product? How many hardware and independent software vendor partners support the product?'

The open-source community is offering increasingly sophisticated support, in addition to the much-touted option of e-mailing the software's actual developer.

In February, MySQL unveiled a set of services aimed at enterprise customers, featuring automated updates, alerts and phone support. Subscriptions to MySQL Network range from $5,000 per server per year (with guaranteed 30-minute response time to queries) to $595 per server per year, Urlocker said.

MySQL has also attracted third-party companies that extend the database's functionality. Emic Networks of San Jose, Calif., offers clustering software that allows users to run a database across multiple servers. M/Cluster works much like Oracle's Real Application Clusters, said Eero Teerikorpi, CEO of Emic. By spreading a database across different machines, users can balance the load and improve reliability of the system as a whole.

PostgreSQL has also found commercial support, at least from one established vendor. In January, Pervasive Software Inc. of Austin, Texas, started offering support services for PostgreSQL. The 300-employee company has long offered its own embedded database, Pervasive.SQL. But when it wanted to expand beyond the embedded marketplace, it adopted PostgreSQL instead of developing its own database, said Lance Obermeyer, Pervasive's director of products. The company offers its own implementation of PostgreSQL, called Pervasive Postgres, on its Web site. The company also offers support service ranging from $195 to $4,995 per year.

While such companies may not be able to offer the same depth of support as Microsoft or Oracle, they can still be a good deal if they fit an agency's needs. Agencies that outsource their database support should evaluate each product for its particular strengths and weaknesses, and not be distracted by purchase price, Bloor said.

And for agencies that maintain their databases in-house, either for security reasons or because of an abundance of technical expertise, an open-source database could provide the best deal of all.

'The more technically proficient people you have, the more open source makes sense,' Bloor said.

