Patch management makes life easier at NASA
<b>Agency deploys PatchLink Update to 80,000 machines</b>
- By William Jackson
- Sep 09, 2005
'Reaction is no longer an option. That was the ultimate justification for a patch management system.'
- Michael Castagna, NASA IT security officer
Managing and automating the process of patching its diverse IT infrastructure has helped NASA get a green rating for IT security on the President's Management Agenda.
NASA IT security officer Michael Castagna said the need for an automated system became clear to him in January 2003.
'The epiphany came on a Saturday morning, when I visited the Web site incidents.org,' he said. There he learned of the SQL Slammer worm, which tore through vulnerable computers throughout the Internet in a matter of minutes.
'NASA did fairly well with Slammer,' Castagna said. Firewalls were able to block most of the malicious traffic. But the speed with which the worm spread made it clear that firewalls and rapid reaction were not adequate IT security. 'Reaction is no longer an option. That was the ultimate justification for a patch management system.'
The tool the space agency settled on was PatchLink Update from PatchLink Corp. of Scottsdale, Ariz., which now is deployed on more than 80,000 computers across NASA.
Mark Page, the enterprise architecture lead at the Kennedy Space Center who spearheaded the deployment, likened it to 'herding a whole lot of cats.'
Final notice
Slammer may have delivered NASA its final notice that patch automation was mission-critical, but the agency had begun automating the process several years earlier.
'We saw this train coming relatively early,' Castagna said, and work had begun on an in-house tool.
It was Page, who came to NASA from Microsoft Corp., who began work in early 2001 with a script file that would roll up and load Microsoft patches. After the terrorist attacks of Sept. 11, money became available to further develop and enhance the tool. The need to handle more than Microsoft patches soon became clear.
'The majority of my machines are Windows-based,' Page said. 'But NASA being a scientific organization, a good percentage of them are Unix or Macs.'
So Page was asked to include Red Hat Linux in his tool, and then Solaris and HP-UX.
'It got out of hand pretty quickly,' Page said. 'And then they wanted to be able to do reports. It became evident that we needed a commercial product.'
It was after Slammer made its appearance that NASA conducted a make-or-buy analysis for a patching automation tool.
'We put together an exhaustive analysis plan,' Castagna said. The agency compared six vendors on cost, workflow capability, maturity, support for multiple operating systems, checks and balances in rolling out patches, validation and reporting, and the company's willingness to partner with NASA.
PatchLink Update supported Windows as well as most flavors of Unix, and it also had basic reporting capability and a Web interface for easy administration. 'The only thing left out that we needed was Mac,' Page said. So PatchLink built an agent for computers running the Mac OS.
'We wouldn't have a Macintosh agent if it wasn't for NASA,' said Chris Andrew, PatchLink's vice president of product management.
That willingness to accommodate the agency's needs was a key consideration in choosing the product.
'Patch management is a rather nascent technology,' Castagna said. 'There is never going to be a perfect fit, especially in an enterprise deployment,' so partnering with the company to customize the tool was essential.
PatchLink keeps NASA several months ahead of most other customers with early releases of products and features, Page said.
'NASA was an interesting account from the inception,' Andrew said. 'They have every kind of network you can imagine, all managed together.'
PatchLink Update is a server-agent system with an online service for delivering new vendor patches to the customer's server. Agents residing on client devices scan the system to determine what patches are needed and whether they can be safely installed. They also can confirm successful installation.
Preferential treatment
Using client agents, which must be deployed to every system, rather than a remote scanning system adds to the administrative overhead of a patch management tool, but the benefits outweigh the burden, Page said.
'The problem with scanning is that it only gives a snapshot,' he said. 'My users are extremely mobile, and at any given time 10 percent of my machines are in flux,' going in and out of service, moving and changing IP addresses. Agents also can help with asset management. 'I quickly ruled out the agentless side of the house,' Page said.
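The agent's job, as described above, is to work out locally which patches a machine still needs and whether they can be installed. A minimal Python sketch of that idea follows. It is an illustration only, not PatchLink's implementation: the `Patch` record, its `requires` field and the `missing_patches` function are hypothetical names, and "can be safely installed" is reduced here to "all prerequisite patches are already present."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical catalog entry; not PatchLink's real data model."""
    patch_id: str
    platform: str
    requires: frozenset = frozenset()  # prerequisite patch IDs (assumed)


def missing_patches(platform, installed, catalog):
    """Return IDs of patches for this platform that are not yet installed
    and whose prerequisites are already present on the machine."""
    installed = frozenset(installed)
    return [
        p.patch_id
        for p in catalog
        if p.platform == platform
        and p.patch_id not in installed
        and p.requires <= installed
    ]


# Made-up catalog for illustration.
catalog = [
    Patch("WIN-001", "windows"),
    Patch("WIN-002", "windows", frozenset({"WIN-001"})),
    Patch("WIN-003", "windows", frozenset({"WIN-999"})),
    Patch("MAC-001", "macos"),
]

needed = missing_patches("windows", {"WIN-001"}, catalog)
```

Because the check runs on the machine itself, the result stays current even when the machine moves or changes its IP address, which is the advantage Page cites over a one-time remote scan.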
Despite the automation, the decision to distribute and install a patch lies with the system administrator. Because of the possibility of damaging a system with a patch, the best practice is a cautious, phased rollout: first on segregated and non-critical systems, and then to wider network segments.
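The phased approach can be sketched in a few lines of Python. This is a hedged illustration of the general practice, not a description of how NASA or PatchLink automates it: the wave names, the `install` callback and the `max_failure_rate` threshold are all assumptions made for the example.

```python
def phased_rollout(waves, install, max_failure_rate=0.0):
    """Apply a patch wave by wave, stopping at the first wave whose
    failure rate exceeds the threshold.

    waves   -- ordered list of (wave_name, [hosts]) pairs
    install -- callable(host) -> True on success (hypothetical hook)
    Returns (completed_wave_names, halted_wave_name_or_None, failed_hosts).
    """
    completed = []
    for name, hosts in waves:
        failures = [h for h in hosts if not install(h)]
        if hosts and len(failures) / len(hosts) > max_failure_rate:
            # Stop here so the patch never reaches the wider network.
            return completed, name, failures
        completed.append(name)
    return completed, None, []


# Made-up hosts: the patch breaks "desk-2" in the second wave.
waves = [
    ("segregated test systems", ["lab-1", "lab-2"]),
    ("non-critical systems", ["desk-1", "desk-2"]),
    ("wider network", ["srv-1", "srv-2", "srv-3"]),
]
result = phased_rollout(waves, install=lambda host: host != "desk-2")
```

In this example the rollout halts at the second wave, so the hosts in the wider network are never touched. In practice the go/no-go decision between waves rests with the administrator, as the article notes.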
PatchLink itself tests patches on a library of 250 system configurations before distributing them, 'but at the end of the day, we don't have the exact configuration that every customer is running,' Andrew said.
Because of this, the administrators responsible for each machine handle testing and patching. This means that each NASA center has several PatchLink Update servers. Although a single server can support more than 20,000 nodes, 'no one person owns all of those nodes,' Andrew said. 'Ownership is the limiting factor' in the number of servers needed, 'not scalability.'
Distributed ownership and responsibility complicated the PatchLink deployment at NASA. Page did not want to force the technology on other sites.
Technically, the IT security office had the authority to require the use of PatchLink, 'but he who has the money makes the rules,' Page said, and each center had its own budget. So in spring 2003 he began the job of persuading NASA centers to adopt PatchLink. It was not a hard sell, Castagna said. In August of that year, the agency acquired an enterprise license for it.
'It took us about a year to get it out there,' Page said. 'The first 70 percent was easy, the rest was harder.'
Proud of progress
System complexity and configuration issues made patching later machines more difficult, but Page is proud of the progress made in that year. 'I think I've got 94 percent of our machines covered,' he said.
With a three-year technology refresh cycle at NASA, it is unlikely that the figure will ever be 100 percent, and 90 percent coverage would be considered a victory.
'Twenty-five to 30 new machines pop up on my network [at the Kennedy Space Center] every day,' Page said. 'And that goes on at every center. At any one point in time, five percent of my machines are not covered because they're brand new, and another five percent are gone on travel.'
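Page's figures explain the 90 percent benchmark: if 5 percent of machines are new and 5 percent are traveling at any moment, at most 90 percent can be covered at that moment. The short Python calculation below works this through. Applying Page's Kennedy Space Center percentages to the agencywide figure of 80,000 machines is an assumption made for illustration, and `coverage_ceiling` is a hypothetical helper name.

```python
def coverage_ceiling(fleet_size, percent_new, percent_traveling):
    """Return (coverage percent, machines covered) when the given
    whole-number percentages of the fleet are unreachable."""
    coverage_percent = 100 - percent_new - percent_traveling
    covered = fleet_size * coverage_percent // 100
    return coverage_percent, covered


# 80,000 machines agencywide; 5 percent new, 5 percent on travel
# (Page's percentages, assumed here to hold across the agency).
percent, machines = coverage_ceiling(80_000, 5, 5)
```

Under those assumptions the figure comes to 90 percent, or 72,000 machines covered at any one time.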
NASA recently implemented a baseline patch deployment program agencywide, in which new patches are evaluated and prioritized.
'We tend to do it after Microsoft's Patch Tuesday, since Microsoft usually accounts for the bulk of the patches,' Page said. 'Every month we evaluate.'
Patches determined to be critical are added to the baseline in two to five days. Actually getting them installed takes a little longer.
'It tends to take three to five days, even on critical patches,' Page said. 'People are not comfortable putting these things on all the time, and rightly so.'
There is a waiver procedure for deviating from the baseline program for some mission-critical systems. Those systems are tightly controlled and partitioned from the Internet. They are patched less frequently, although on a strict schedule.
NASA is now beta testing a new version of PatchLink Update with stronger enterprise reporting capabilities, something Page and Castagna are excited about.
'We look at the enterprise as a whole,' Castagna said. 'We want a decentralized approach to patch management and a centralized approach to reporting.'
Page sees a possible downside to the new capabilities, however.
'I know the guys in D.C. are already salivating over this,' he said. 'We are going to be so busy building reports when they start seeing the data that I'm going to have to start saying no.'