Mirrored nodes protect data

Alaska center's IT team creates a unique high-availability backup cluster

By Alan McCollough

Special to GCN

At the Alaska Native Medical Center in Anchorage, the largest such center of the Indian Health Service, technical managers recently set out to ensure reliable access to the applications and data that thousands of users rely on daily.

Their goals were high availability during system failures as well as data survival in case of earthquakes, fires or other disasters.

Bill Barnes, right, transformed a closet at the Alaska Native Medical Center into a backup computer room. The author, Alan McCollough, left, is a Web programmer.

Bill Barnes, the Unix system administrator, had already seen an electrical transformer failure put the server room on a countdown under battery backup. After a mad scramble to shut down nonessential servers and run extension cords to separate electrical circuits, he'd restored power with barely 10 minutes of battery life remaining.

'The idea of a hot backup computer room came out of my analysis of single points of failure as part of our deployment of a high-availability computer cluster,' Barnes said. 'As long as all of our computing assets were housed in a single computer room, the hospital systems would be vulnerable.'

In fact, not long after the cluster was installed, the main transformer feeding the computer room failed. Electricians had to do rapid bypass wiring while the uninterruptible power system still functioned.

Barnes began wondering what would happen if another power failure kept the computing facility down for several days or weeks.

'It didn't take much selling to management' to get funds to modify an unused wiring closet for a backup computer room, he said.

The small, separate room would house a redundant set of servers in a cluster working in conjunction with the primary servers. Connectivity between the two rooms would be a combination of 100Base-TX Ethernet and optical fiber.

Extra switches

To ensure high availability of networking components between the two server rooms, the center installed additional switching equipment.

'Part of what made a second computer room possible was the use of SecureFast virtual LAN software [from Cabletron Systems Inc. of Rochester, N.H.] on the core network switches,' chief information officer Keith Creiglow said.

The software lets the Cabletron switches run a meshed network using rudimentary open-shortest-path-first routing to bypass downed switches. All the center's computer closets are connected to both computer rooms for redundancy on the network side as well as on the server side.
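The rerouting idea behind that meshed design can be sketched in a few lines. The following is a toy model, not Cabletron's SecureFast software: a breadth-first shortest-path search over a hypothetical switch mesh (the switch names are invented for illustration) that finds an alternate path when a core switch goes down.

```python
from collections import deque

def route(mesh, src, dst, down=()):
    """Breadth-first search for a shortest path through the switch
    mesh, skipping any switches currently marked as down."""
    down = set(down)
    if src in down or dst in down:
        return None
    seen = {src}
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in mesh.get(node, ()):
            if nxt not in seen and nxt not in down:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no surviving path

# Illustrative mesh: a wiring closet cross-connected to two core
# switches, which in turn reach both computer rooms.
mesh = {
    "closet": ["core1", "core2"],
    "core1": ["closet", "core2", "room-a", "room-b"],
    "core2": ["closet", "core1", "room-a", "room-b"],
    "room-a": ["core1", "core2"],
    "room-b": ["core1", "core2"],
}

print(route(mesh, "closet", "room-a"))                  # normal path
print(route(mesh, "closet", "room-a", down=["core1"]))  # rerouted path
```

Because every closet in the model has two uplinks, taking down either core switch still leaves a complete path, which is the property the center's dual-homed wiring gives it.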

About the same time, the author decided to take a crack at building a test server cluster from two spare 166-MHz Gateway Inc. Pentium systems. Within a week, the test cluster was working, and the staff got the go-ahead to upgrade the Gateway ALR 9200 server cluster and locate each node of the cluster in a separate server room.

After a complete reinstallation of Microsoft Windows NT 4.0 Enterprise Edition with Microsoft Cluster Services, two 200-MHz Pentium Pro processors were added to each node, bringing each server up to four processors.

Each node also got an IBM SerialRAID/X Adapter card for access to the IBM Corp. Serial Storage Architecture (SSA) drive pool.

One node of the Cluster Services cluster runs Microsoft Internet Information Server and ColdFusion, an application server from Allaire Corp. of Cambridge, Mass. The other node provides database access via Microsoft SQL Server 7.0. If one node fails, the other node can provide both Web and SQL Server services until repairs are made.
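That failover arrangement can be modeled simply. The sketch below is not Microsoft Cluster Services; it is a minimal illustration, with invented node names, of the placement rule the article describes: each service runs on its preferred node, and if that node fails, the surviving node picks up both roles.

```python
def place_services(services, nodes_up):
    """Assign each clustered service to its preferred node when that
    node is up, or fail it over to a surviving node."""
    placement = {}
    for name, preferred in services.items():
        if preferred in nodes_up:
            placement[name] = preferred
        elif nodes_up:
            placement[name] = sorted(nodes_up)[0]  # fail over to a survivor
        else:
            placement[name] = None  # total outage
    return placement

# Hypothetical two-node layout modeled on the article: one node
# prefers the Web tier, the other the SQL Server tier.
services = {"web": "node-a", "sql": "node-b"}

print(place_services(services, {"node-a", "node-b"}))  # normal split
print(place_services(services, {"node-a"}))            # node-b has failed
```

With both nodes up, the load stays split; with one node down, the survivor serves both Web and database clients, degraded but available.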

A server cluster needs common access to a shared drive array so that if one node dies, the other node can continue to provide service. Most cluster servers are built into a rack and connect to a common RAID array with SCSI cables. That was how the original Gateway 9200 cluster was configured.

Barnes, however, was already using IBM's SSA extended technology for some of the AIX servers on the center's campus. SSA can link up to 96 hard drives in a loop configuration, unlike SCSI's straight-line configuration. SSA drives mount in special cabinets, which connect to the host computer's SSA controller card.

Connections between SSA devices can be either copper or fiber cables, but fiber makes it possible to locate SSA devices up to 10 kilometers apart.

The Alaska Native Medical Center built an ultra-high-availability cluster to guarantee uptime during system failures, fires or other disasters.

Now a new idea came to light: combining the failover capability of clustered servers with SSA's remote location capabilities. The staff built a server cluster with each node physically isolated from the other, and with a mirrored set of data stored separately.

Exhaustive research and planning resulted in a disaster-resistant, high-availability configuration. Using IBM SSA Disk Administration tools, the staff configured the disk resources so that a mirrored set of data resided in each server room. Should one room be destroyed, a complete set of data and servers would still exist in the other.
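The mirroring principle at work here can be shown with a toy model. This is not IBM's SSA disk administration tooling, just an illustration under assumed room names: every write lands on both copies, so a read still succeeds after one room's copy is lost.

```python
class MirroredArray:
    """Toy model of a mirrored drive set split across two rooms:
    writes go to every surviving copy; reads succeed as long as
    at least one copy holds the data."""

    def __init__(self):
        self.copies = {"room-a": {}, "room-b": {}}
        self.failed = set()

    def write(self, key, value):
        for room, disk in self.copies.items():
            if room not in self.failed:
                disk[key] = value

    def read(self, key):
        for room, disk in self.copies.items():
            if room not in self.failed and key in disk:
                return disk[key]
        raise KeyError(key)

array = MirroredArray()
array.write("patient-index", "v1")
array.failed.add("room-b")          # simulate losing one room
print(array.read("patient-index"))  # the surviving mirror still answers
```

The trade-off is that every write costs two physical writes, which is the price the center accepted for surviving the loss of an entire room.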

A software update to the SSA disk configuration tools made it easier to configure the drive arrays so that a mirrored set of data was available in each location.

A few reinstallations produced the desired remote mirrored array. To test it, the staff disabled the fiber connections between the two rooms and shut down the remote SQL Server node and the SSA drive array in the remote room. The Web server node in the main server room successfully took over providing both Web and SQL services, using the available SSA drive array.

OS harmony

Both Windows and AIX operating systems are now running on various server clusters. Microsoft Cluster Services operates on those running NT 4.0, and IBM High Availability Cluster Multi-Processing operates on the AIX servers. IBM's SSA technology provides sharing on the drive arrays used by the cluster server nodes.

The unique ultra-high-availability server clusters maintain crucial hospital services such as the Resource Patient Management System, Microsoft Exchange e-mail, intranet business applications and online medical research libraries.

'The thought of having all our eggs in one basket was truly scary,' Creiglow said. 'Having our clustered servers split between two computer rooms gives redundancy not found in most configurations. This certainly makes it easier for me to sleep at night.'

Alan McCollough is a Web programmer and certified ColdFusion developer at the Alaska Native Medical Center.
