A COOP plan that holds water
Tax and Trade Bureau data center survives flood and outage, thanks to disaster recovery planning
A continuity-of-operations plan isn't worth much if it sits on a shelf, says Mike Borland, assistant chief information officer for infrastructure at the Treasury Department's Tax and Trade Bureau. 'You have to test it on real conditions.'
Earlier this month, Borland and TTB staff had a chance to test their COOP and disaster recovery plans against a cold splash of reality.
On Monday afternoon, July 9, a water pipe burst on the sixth floor of the bureau's Washington headquarters on G Street, N.W., a few blocks east of the White House.
The fiber that runs the bureau's network acted like a conduit to help funnel water down into TTB's data center on the third floor.
Soon, the data center's NetBotz monitoring tool alerted the information technology staff that there was moisture in the center.
'A bunch of us ran over there,' Borland said. He found an inch and a half of water pooling by the electrical junction boxes that tie the servers together.
Borland made the decision to 'push the big red button,' a kill switch built into the uninterruptible power supply that shuts everything off.
'By my estimate, we were less than a minute from getting electrical arcing,' which would have fried the equipment, he said.
Fortunately no one was hurt. And although the G Street systems were down for two days, TTB's network was back up in several hours with no loss of data. 'As a bureau we had no lost productivity, and we continued our mission as if nothing had happened,' said Robert Hughes, TTB's CIO and associate administrator of information resources.
For more than a year, TTB has had a COOP and disaster recovery plan in place for just such an event.
With its Ground Zero location in downtown Washington, D.C., the bureau had organizational information technology survivability as one of its major goals, Borland said. Replication was the main strategy.
The bureau runs a failover site at the National Revenue Center in Cincinnati. The two geographically dispersed locations 'give us superb survivability,' Borland said. 'We've got two sets of everything (e-mail, file storage, servers), so if we lost one, the other one could handle it.'
Last September, Borland and the IT staff tested the replicated system by taking down the main data center in Washington and failing over to the Cincinnati center for about a week while they upgraded the UPS. No data was lost, he said.
Then, on the evening of July 4, powerful thunderstorms rolled through Cincinnati and knocked out power to the entire downtown area. TTB's data was recovered by the next day, Borland said.
Water everywhere
So when the pipes burst in Washington a few days later, the COOP plan went into action once again. Headquarters employees had to work from home for two days, but the actual system downtime was less than four hours, Borland said.
'Everybody had to leave the headquarters,' he said. 'It just wasn't habitable.'
About 100 employees and contractors were displaced, including the IT staff, regulatory and labeling specialists, the general counsel and the administrator.
Every employee had a secure token device from RSA that let them access the network through any Internet connection.
They could get into their e-mail and files by tapping into the replicated system in Cincinnati using a virtual private network that connected to a Citrix farm, Borland said.
'A lot of our users had never tried accessing their e-mail through Outlook Web Access,' he said. 'But it worked extremely well.' So well that some employees want to keep using it postdiluvian.
On Tuesday, July 10, TTB called Data Clean, a company that specializes in cleaning up flooded computer rooms. Two days later, the network was restored to all three floors of the bureau, and operations were pretty much back to normal.
A week later, the team had one more chance to show its collective mettle. On July 19, severe afternoon thunderstorms knocked out power to the Cincinnati data center yet again.
Power was out for about 3 1/2 hours. Thanks to quick action by staff and a robust UPS, no systems were damaged, and e-mail and network file storage remained available throughout the outage.
'Our staff is great,' Borland said. 'But we're looking for the person who has placed the Chinese 'May you live in interesting times' curse on us.'
TTB's government IT staff consists of five employees: the CIO and four assistant CIOs. Borland's entire operations staff is outsourced: systems administrators, network administrators, program and project managers, help-desk analysts, systems analysts and IT planners. The contract personnel are employees of Booz Allen Hamilton, RS Information Systems and its subcontractors.
Hughes praised the staff for its quick response to the flooding. 'We were able to bring up our backup data center in four hours without a single lost e-mail or file,' he said.
Normal operations
Hughes also noted a link between telework and the bureau's response to the emergency. TTB staff has had a lot of practice using the bureau's remote-access capabilities, he said.
About a quarter to a third of TTB personnel work from home full time. More than half telework at some point, and most TTB staff have remotely accessed the bureau's IT resources at least once. 'It seemed like normal operations rather than an abnormal situation,' he said.
The bureau learned a couple of lessons from the flooding, Borland said.
'We already knew that [the COOP and disaster recovery plan] worked in theory and in practice, when we had plenty of time to shut everything down,' he said. 'But in an actual emergency, in an unleisurely, frenetic way, it all worked, in the main.'