NASA uses diagnostic tools to manage the shuttle
Complementary software gives operations center the big picture and a detailed look at problem areas
By Drew Robb
Special to GCN
Anyone who has connected a couple of computers in a home network knows it won't work unless a dozen parameters are precisely configured. Adding more computers does more than add variables; it also multiplies the number of possible errors.
Imagine the complexity, then, of NASA's enormous networks. The challenge of keeping everything functioning well is compounded by mission-critical conditions, the very real risk of lives lost and a million-dollar penalty for each day a launch is delayed.
To keep its vital information flowing, NASA's shuttle operations network specialists at Kennedy Space Center use Sniffer Total Network Visibility from Network Associates Inc. of Santa Clara, Calif., in conjunction with Hewlett-Packard OpenView Network Node Manager.
These tools help locate problems such as slow performance, timeouts or failure to detect certain devices before they snowball into partial or complete network failures, said Matthew Guessetto, the network engineer.
To monitor the shuttle operations network, Guessetto employs OpenView Network Node Manager. It provides an overview of network health, as well as troubleshooting capabilities that let him respond to problems. He also operates a range of 20 distributed and portable packet analysis tools to gain visibility into Ethernet, Fast Ethernet, Gigabit Ethernet, token-ring and Fiber Distributed Data Interface connections.
If this monitoring software detects slow or no response, it automatically produces an alarm. Alternatively, a user who detects a problem can send notification. In either case, the console operator generates a trouble ticket, and the matter is rapidly taken care of either remotely or by dispatching technicians to a site.
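The alarm-to-ticket flow described above can be sketched in a few lines of Python. This is only an illustration, not how OpenView works internally: the 500-millisecond threshold, the `Alarm` record and the `check_response` and `open_ticket` helpers are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold; the article does not say what counts as "slow".
SLOW_THRESHOLD_MS = 500.0


@dataclass
class Alarm:
    device: str
    reason: str  # "no response" or "slow response"


def check_response(device: str, response_ms: Optional[float],
                   slow_threshold_ms: float = SLOW_THRESHOLD_MS) -> Optional[Alarm]:
    """Return an Alarm for a missing or slow reply, or None if the device is healthy.

    response_ms is None when the device did not answer at all.
    """
    if response_ms is None:
        return Alarm(device, "no response")
    if response_ms > slow_threshold_ms:
        return Alarm(device, "slow response")
    return None


def open_ticket(alarm: Alarm, source: str = "monitor") -> dict:
    """Build a trouble-ticket record; source is "monitor" or "user"."""
    return {"device": alarm.device, "reason": alarm.reason, "source": source}
```

A user report would follow the same path, with the operator calling `open_ticket(alarm, source="user")` rather than relying on an automatic alarm.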
Network Node Manager is deployed across a network of 9,000 users, 5,000 of whom are NASA employees; the others are mostly Lockheed Martin Corp. and Boeing Co. personnel engaged in shuttle launch.
The network backbone is made up of three 100-Mbps FDDI rings bridged with routers from Nortel Networks Corp. of Brampton, Ontario. To increase bandwidth, this transmission line is shifting to 100-Mbps full-duplex switched links with plans to upgrade to gigabit capacity by 2001.
The network sprawls over 90 facilities at Kennedy Space Center. The equipment being used to connect the larger lines to desktop systems is a complex array of devices including 15 routers from Cisco Systems Inc. of San Jose, Calif., that feed 100-Mbps Fast Ethernet and 10-Mbps Ethernet switching hubs and wiring concentrators from multiple vendors. The network uses three different types of Ethernet: shared 10-Mbps Ethernet, switched 10-Mbps Ethernet and switched 100-Mbps Fast Ethernet. It also integrates an array of token-ring 4-Mbps and 16-Mbps hubs.
On tap at all times
A 35-member team provides network infrastructure support for 20,000 Kennedy workers in NASA space flight operations. This squad, managing everything from the backbone to the desktop, is the Network Support Group of United Space Alliance of Houston, the prime contractor for NASA's Space Shuttle Program. Central to the team's success are network diagnostic and management tools, as well as built-in redundancy.
'In the rare event that a large sector of the backbone should go down, we turn to the independent network running outside the backbone,' Guessetto said. 'By querying each network device in serial, the problem can be isolated, and a technician can be dispatched to a specific location.'
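The serial querying Guessetto describes amounts to walking a device list one entry at a time and noting which devices fail to answer. The Python sketch below assumes a caller-supplied probe function; `isolate_unreachable` and `is_reachable` are hypothetical names, and the article does not say what protocol or tool the team actually uses for the query.

```python
from typing import Callable, Iterable, List


def isolate_unreachable(devices: Iterable[str],
                        is_reachable: Callable[[str], bool]) -> List[str]:
    """Query devices one at a time, in order, and return those that do not answer.

    is_reachable is a stand-in for whatever probe the out-of-band network
    supports; it must return True when the device responds.
    """
    failed = []
    for device in devices:
        if not is_reachable(device):
            failed.append(device)
    return failed
```

With a stub probe such as `lambda d: d != "hub-b"`, a call over `["hub-a", "hub-b", "hub-c"]` singles out `hub-b`, which is the location a technician would be sent to.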
Network Associates' TNV Sniffer troubleshoots specific problems with applications, protocols, routing issues and slow performance. It clarifies whether the network is accepting and transporting packets. Remote TNVs are deployed across the network, operating on both the desktop-to-backbone lines and the main transmission line itself.
The two applications are complementary: OpenView shows the broad picture, while Sniffer pinpoints the details of what is wrong. Without affecting users, the distributed TNVs run full-time on the network to remotely troubleshoot isolated incidents, including user-originated trouble tickets.
'TNVs allow us to analyze conversations between devices,' Guessetto said.
The Sniffer console controls multiple TNVs from one location and shows what is happening between machines experiencing a communication problem. Troubles are diagnosed remotely to determine whether to send a technician. Guessetto gave examples such as the detection of insufficient server horsepower and duplicate IP addresses to show the value of network traffic analysis tools.
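Duplicate IP addresses are a good example of a fault that shows up only in traffic. One common way to spot them, sketched below in Python, is to record which hardware address claims each IP address and flag any IP claimed by more than one. This is an assumed approach for illustration, not a description of how Sniffer does it, and `find_duplicate_ips` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_duplicate_ips(observations: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each IP address seen with more than one MAC address to those MACs.

    observations is a sequence of (ip, mac) pairs taken from captured traffic.
    MAC addresses are lowercased so that case differences are not counted twice.
    """
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        macs_by_ip[ip].add(mac.lower())
    return {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}
```

An empty result means every observed IP address mapped to a single hardware address during the capture.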
'Sometimes we get an alarm, and it's a hardware problem,' he said. 'OpenView identifies the exact problem so we don't need Sniffer to find out. Other times, it's the application software acting up, and the TNVs let you see what's wrong with it.'