FAA corrects system bug, averts holiday travel crisis

FAA recently found and corrected a date-sensitive bug that would have brought the
Enhanced Traffic Management System (ETMS) to a halt Nov. 2.


"What we learned with this problem will be invaluable to us as we work on the year
2000 problem," said Bob Voss, FAA's chief of air traffic management.


ETMS takes a nationwide look at flight plans and displays them for controllers. The
system then examines the data and predicts where areas of high traffic will occur.


Airports and airlines use ETMS data to bring in more staff when there is extra traffic.
The system can run ground delay models to help airport officials compensate for variables
such as bad weather. Without ETMS, long delays would occur at every major airport, FAA
officials said.


The trouble with ETMS lay in the 450 Hewlett-Packard Co. Apollo workstations deployed at
80 towers and control centers across the nation. In June 1996, Hewlett-Packard engineers
found that internal clock settings would make the systems stop working Nov. 2, 1997, at
14:49 Greenwich Mean Time.


"The HP Apollo computers tick away in quarter-second increments," FAA testing
chief Bob Fiepkiewicz said. "And the magic of that Nov. 2, 14:49 GMT date is that
it's the actual time the internal clock rolls over from being 31 bits to 32 bits."


When FAA set a clock forward on a test machine, all ETMS numerical values for flight
information turned negative. It was as if flights did not exist, and the system was
worthless, Fiepkiewicz said.
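
The sign flip Fiepkiewicz describes is typical of a tick counter held in a signed 32-bit integer: one tick past the largest 31-bit value, the number reads as negative. The Python sketch below illustrates the effect. It assumes exact quarter-second ticks and a plain signed 32-bit counter; the article does not give the Apollo clock's actual layout or starting date, and the function names are illustrative only.

```python
TICK_SECONDS = 0.25            # quarter-second tick, as quoted in the article
MAX_POSITIVE = 2**31 - 1       # largest value a signed 32-bit integer can hold
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def as_signed_32(count):
    """Reinterpret a non-negative tick count as a signed 32-bit integer."""
    return (count + 2**31) % 2**32 - 2**31


def years_until_rollover(tick_seconds=TICK_SECONDS):
    """Years of ticking before the counter needs a 32nd bit (assumed tick size)."""
    return 2**31 * tick_seconds / SECONDS_PER_YEAR


last_good = as_signed_32(MAX_POSITIVE)       # still positive
first_bad = as_signed_32(MAX_POSITIVE + 1)   # wraps to a large negative number

print(last_good, first_bad)
print(round(years_until_rollover(), 1), "years")
```

Under those assumptions the counter runs out of positive range after roughly 17 years, and any program that compares or subtracts times across that boundary suddenly sees negative values.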


HP created a patch that fixed the operating system, but FAA's troubles did not end
there. In September 1997, FAA ran a full test of the system with the patch. The patched OS
ran without a hitch, but Pascal applications such as ETMS crashed.


"From there, the problem got significantly more complex," Voss said.


FAA was facing a tight deadline. Had ETMS failed, FAA would have had to track all the
airplane flights manually. But the high volume of daily flights would have made the task
next to impossible, Voss said.


"It would not have been a safety issue, but there would have been an economic
impact," he said, "and a lot of delays at airports."


FAA has some processes in place to scan for year 2000 problems in software, Voss said.
But those scanning tools did not work on ETMS because its Pascal programs, although
date-dependent, do not call for a date as part of their calculations. Such dependencies are
virtually invisible to automatic detection.


But FAA was relieved that HP at least identified the problem. Delois Smith, FAA product
leader for traffic flow management, said if HP had not found the problem with the
workstations, FAA probably would never have known the system was about to crash.


To fix the problem, FAA relied on old-fashioned code inspection, checking one line at a
time.


"They worked days, nights and weekends to fix the systems," Smith said.


FAA programmers examined 1.3 million lines of code and fixed 150,000 lines. They
finished Oct. 22. But even then, Smith said, they were not 100 percent sure the fixes
would work.


"We could not take the whole system down to do an all-out test," she said.
"There was still some nervousness when the day came."


FAA programmers were on standby as Nov. 2 dawned. As the hours rolled by, however, they
had little to do except cheer. The fix worked.


The 33-MHz Apollo workstations, which have 64M of RAM, are slated for replacement with
faster Unix machines. FAA will face the same kind of problem with most Unix machines, but
not until 2037, agency officials said.
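
The Unix limit officials refer to can be worked out the same way. The sketch below assumes the common Unix convention of a signed 32-bit count of whole seconds since Jan. 1, 1970, which the article does not spell out; the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumed convention: signed 32-bit seconds counted from the Unix epoch.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_rollover_moment():
    """First instant that no longer fits in a signed 32-bit seconds counter."""
    return UNIX_EPOCH + timedelta(seconds=2**31)


print(unix_rollover_moment().isoformat())
```

Under that assumption the counter overflows in January 2038, shortly after the 2037 figure officials cited.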


About the Author

John Breeden II is a freelance technology writer for GCN.
