Ride the next big wave: server clusters
Network administrators might as well get ready now, because there's no sidestepping the
next big server trend: clusters. From the workgroup server under the desk to the mammoth
workhorses in the data center, clustering soon will be everywhere.
Microsoft Cluster Server software, code-named Wolfpack, is already part of Microsoft
Windows NT Server 4.0 Enterprise Edition. Expect a more robust version in next year's
release of NT Server 5.0 (see story, Page 32). That event will encourage many
administrators to convert their network topologies from single servers to binary and
beyond. Cluster technology will dominate the server side of the enterprise for years to come.
The GCN Lab offers a short course in Clustering 101.
The server operating system makes up a small portion of clustering readiness.
Applications also have to be cluster-aware, and the server hardware and network
infrastructure will need more administrator attention.
In simple failover, the most basic form of clustering, two servers make a cluster. If
one fails, the other takes over. But if the surviving server comes up short in processing
power, memory or storage resources, it too can fail or slow users down.
Microsoft Corp. certifies clustering hardware and configurations for Cluster Server.
Compaq Computer Corp. is certifying as many of its recent ProLiant servers as possible.
Other makers such as Dell Computer Corp. are producing servers specifically for
clustering--for example, the Dell PowerEdge 6100 Cluster.
Hardware certification is crucial. Microsoft will not support its Cluster Server on
unapproved hardware or configurations.
Although clustering can be done in many ways, simple failover will gain the broadest
use, because Microsoft Cluster Server will be so widely available in NT 5.0.
In a failover setup, the two clustered servers connect to each other via dedicated
network interface cards. The NICs exchange what is called a heartbeat signal. If an
application, resource or server fails, the heartbeat stops, and an alert goes out for the
other server to take over.
Recovery can take up to 30 seconds, depending on the setup and applications. For
example, a network resource might be polled once per second through the heartbeat. If
after two consecutive pollings the resource is judged dead, the server will restart or
fail over to the other server. A restart attempt delays failover. If restart is attempted
four times, downtime before failover would total eight seconds.
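The polling and restart arithmetic above can be sketched in a few lines of Python. This is only an illustration of the timing math described in the story, not the actual Cluster Server internals; the constants simply mirror the one-second poll, two-miss detection and four-restart figures given above.

```python
# Illustrative constants taken from the article's example scenario.
POLL_INTERVAL_S = 1          # the resource is polled once per second
MISSES_TO_DECLARE_DEAD = 2   # two consecutive failed polls judge it dead
MAX_RESTART_ATTEMPTS = 4     # restarts tried before failing over

def downtime_before_failover() -> int:
    """Worst-case seconds of downtime if every restart attempt fails."""
    seconds_per_detection = POLL_INTERVAL_S * MISSES_TO_DECLARE_DEAD
    return seconds_per_detection * MAX_RESTART_ATTEMPTS

print(downtime_before_failover())  # 2 seconds per detection x 4 attempts = 8
```

Each failed restart costs another full detection cycle, which is why repeated restart attempts delay failover rather than speed recovery.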
Data loss is supposed to be limited to nonshareable information on the client and
possibly data from the process that was executing when the failure occurred. Users would
notice nothing but a slight delay.
The failover cluster, though it has two servers, appears to the user as one virtual
server. Each physical server knows its own TCP/IP address plus the address assigned to the virtual server.
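The virtual-server addressing can be sketched as follows. All node names and IP addresses here are invented for illustration; the point is only that the active node answers on the cluster's shared address as well as its own.

```python
# Hypothetical sketch: each physical node keeps its own address, and
# whichever node is active also answers on the cluster's virtual address.

class ClusterNode:
    def __init__(self, name: str, own_ip: str):
        self.name = name
        self.own_ip = own_ip
        self.active = False

VIRTUAL_IP = "10.0.0.100"   # the one address clients actually use

def addresses_answered(node: ClusterNode) -> list:
    """The active node answers on its own IP and the virtual IP."""
    ips = [node.own_ip]
    if node.active:
        ips.append(VIRTUAL_IP)
    return ips

primary = ClusterNode("node-a", "10.0.0.1")
standby = ClusterNode("node-b", "10.0.0.2")
primary.active = True

print(addresses_answered(primary))  # ['10.0.0.1', '10.0.0.100']
print(addresses_answered(standby))  # ['10.0.0.2']
```

On failover, the surviving node would simply begin answering on the virtual address, so clients never need to learn a second server name.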
Microsoft's two-node cluster will work with shared data files, networked printers and
World Wide Web pages under its Internet Information Server product.
Other server applications fall into cluster-aware and nonaware categories. Some of
Microsoft's most recent products such as Exchange Server 5.5 and SQL Server enterprise
editions are now cluster-aware.
Cluster-compatible, as opposed to cluster-aware, applications come from other software
companies, such as Lotus Development Corp. Lotus officials maintain that Lotus Domino
Server 4.6 coexists with Microsoft Cluster Server, but Domino does not support Wolfpack directly.
Domino does its own proprietary clustering but doesn't extend that clustering to other applications,
just as linked, distributed e-mail servers don't. Domino permits up to six nodes; each can
run a different operating system.
Novell Inc. soon will announce a clustering package code-named Orion. It might be
released concurrently with or as part of NetWare 5, and it is expected to do 16-node
clustering. Beta versions of cluster-aware Novell applications probably will follow soon after.
The next version of Windows NT may or may not expand beyond simple two-node failover.
Eventually NT could support as many as eight or 16 nodes under distributed parallel
clustering. Here, the servers no longer exchange a heartbeat. They communicate through
their own system area network (SAN), which is separate from the LAN or WAN.
SAN communication becomes even more important than the number of processors, because
the SAN relays so much information.
Under distributed parallel clustering, storage is split off from the server CPUs. Each
server has redundant arrays of independent disks or other high-bandwidth storage
subsystems. The SAN must handle not only requests from the clients and other normal server
functions but also the data storage.
As clustering products gradually emerge, load balancing will become the industry's war cry.
Load balancing means directing routines to individual processors.
The cluster OS makes the decisions on where to send tasks to equalize the workload
among different servers and processors. The task will become highly complex as four-way
and perhaps eight-way servers appear on enterprise networks.
A distributed parallel cluster of eight servers with four processors apiece has 32
possible routes to compute something. The processors don't work in concert as in a
massively parallel system. Instead, each works separately on a manageable task.
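The route arithmetic and a simple dispatch policy can be sketched in Python. Real cluster schedulers are far more involved; this hypothetical least-loaded policy only illustrates the counting and the idea that each processor works separately on its own task.

```python
# Illustrative figures from the article's example: eight servers with four
# processors apiece gives 32 possible places to run a task.
SERVERS = 8
PROCESSORS_PER_SERVER = 4

routes = SERVERS * PROCESSORS_PER_SERVER

# Track outstanding tasks per (server, processor) route and always pick the
# least-busy one, so work spreads evenly instead of piling onto one node.
load = {(s, p): 0
        for s in range(SERVERS)
        for p in range(PROCESSORS_PER_SERVER)}

def dispatch():
    """Send one task to the least-loaded route and return that route."""
    route = min(load, key=lambda r: load[r])
    load[route] += 1
    return route

for _ in range(10):
    dispatch()

print(routes)              # 32
print(sum(load.values()))  # 10 tasks placed, spread across the cluster
```

Even this toy policy shows why the bookkeeping grows quickly: every added server or processor multiplies the routes the cluster OS must weigh.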
An alternative to this scenario is Oracle Corp.'s Parallel Server for NT, which does
distributed cooperative clustering on up to four nodes.
Although some load balancing occurs, the four nodes converge into a single, shared
storage subsystem that must provide highly reliable storage. If it doesn't, the number of
processors won't matter, because data will be unavailable.
In Wolfpack's two-node failover, both servers must be capable of handling the whole
load, so load balancing is not an important consideration. Adding a second, similar server
to an overburdened 200-MHz Pentium Pro server would improve performance little if at all.
If one of the servers cannot handle the full load, it will fail when needed, leaving no
server. Therefore, under Wolfpack, both servers must be robust and almost identical in configuration.
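The sizing rule for two-node failover can be stated as a one-line check: every node must be able to absorb the combined workload alone. The capacity and load numbers below are invented for illustration.

```python
# Hypothetical sizing check for a simple failover cluster: if any single
# node cannot carry the whole load, failover just moves the failure.

def survives_failover(node_capacities, total_load):
    """True only if every node could absorb the entire workload by itself."""
    return all(cap >= total_load for cap in node_capacities)

print(survives_failover([100, 100], 90))  # True: either node can carry 90
print(survives_failover([100, 60], 90))   # False: the weaker node fails too
```

This is why pairing a second, similar server with an already overburdened one buys little: the check fails for both nodes alike.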
Compaq may get all of its ProLiant line certified, but more recently designed products
will likely be better optimized for a clustered environment.
Compaq offers an excellent CD-ROM with a step-by-step guide on how to set up a cluster.
Compaq provides the CD exclusively to GCN readers.
Call 800-392-9299. When prompted for the reseller code, enter 5555. You must give your
name and address.