Another View | High-performance baselines
- By Cray Henry
- Aug 21, 2008
One of the constant tensions we face in technology is finding the right balance between supporting diverse individual talents and keeping in place a set of common team processes.
Let's face it: the top people in the information technology field are justifiably proud of their accomplishments. They are problem-solvers who find creative solutions. At the same time, we need to provide a consistently high quality of service for a diverse group of customers.
The High Performance Computing Modernization Program regularly confronts that issue. HPCMP provides computing resources for 4,000 scientists and engineers from the Defense Department's research and development and test and evaluation communities. Our customers are often working on problems that could have life-or-death consequences for current and future warfighters. They depend on us to set up systems in ways that promote productivity.
Different problems require different computational approaches. Because HPCMP's supercomputers and the interests of its community members are so diverse, processes, tools, computational environments and user service methodologies need to be consistent so users can effectively and efficiently take advantage of the full range of HPCMP services.
Meanwhile, many of our customers must use multiple systems to solve their problems. They need flexible computing environments that let them explore different approaches, but they don't have the time or the inclination to deal with a bunch of unique systems.
The creation of the Baseline Configuration Initiative has proven particularly helpful. Established three years ago, the initiative aims to provide a common set of capabilities and functions across HPCMP centers so customers can work more productively as they shift work to different centers and systems.
Two fundamental questions inspired this initiative: What are the common operations HPCMP customers perform? How do we standardize those operations across a broad variety of supercomputers to maximize the productivity of all users?
To answer those questions, the program assembled a team composed of systems administrators, the user community, and the customer service support group. Jeff Graham of the Air Force Research Laboratory Major Shared Resource Center was assigned to lead the initiative.
Since then, the team has created a consistent environment for multiple types of supercomputers and operating systems. Each system shares a minimal common set of commands, scripts, paths to key tools and libraries, environment variables, and version-control practices. The challenge is finding those common elements that most can agree provide value to most customers.
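To make the idea of a minimal common set concrete, here is a small Python sketch of how a user or administrator might check whether a login environment offers an agreed baseline of environment variables and commands. It is an illustration only: the variable names, command names and the function `missing_baseline_items` are hypothetical and are not drawn from any actual HPCMP policy.

```python
import os
import shutil
from typing import Callable, Dict, List, Mapping, Optional, Sequence

# Hypothetical baseline. These names are invented for illustration and are
# NOT taken from any real HPCMP baseline configuration policy.
REQUIRED_ENV_VARS: Sequence[str] = ("WORK_DIR", "ARCHIVE_DIR")
REQUIRED_COMMANDS: Sequence[str] = ("show_queues", "show_usage")


def missing_baseline_items(
    env: Mapping[str, str],
    required_vars: Sequence[str] = REQUIRED_ENV_VARS,
    required_commands: Sequence[str] = REQUIRED_COMMANDS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, List[str]]:
    """Return the baseline items that this environment does not provide.

    An environment variable counts as missing if it is unset or empty.
    A command counts as missing if `which` cannot find it on the PATH.
    """
    missing_vars = [name for name in required_vars if not env.get(name)]
    missing_cmds = [cmd for cmd in required_commands if which(cmd) is None]
    return {"env_vars": missing_vars, "commands": missing_cmds}


if __name__ == "__main__":
    # Check the current login environment against the hypothetical baseline.
    print(missing_baseline_items(os.environ))
```

Run on each system a customer uses, a check of this kind would return empty lists wherever the baseline is in place, which is the practical meaning of "consistent" from the user's side.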
Most system configuration attributes don't need to be exposed to customers, and many don't need to be standardized because there would be significant expense and risk in forcing a standard. Also, too much standardization can be a disincentive for people you've hired to solve problems. But in selected instances, customer productivity can be improved and our problem-solvers can appreciate the gain. The team works collectively to establish policies that make a meaningful difference for our customers and can be implemented in ways that avoid disruption.
There are now 18 established baseline configuration policies across six participating sites, with 13 others in development. The centers have implemented almost all of the policies. In some cases, systems cannot be fully compliant with a policy. Requiring full compliance with all policies on all systems is neither cost-effective nor necessary.
The baseline configuration team has established a compliance assessment process that includes a policy notification and feedback mechanism for users and center employees. The status of any noncompliant system, along with planned compliance dates, is available on a common Web site.
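The kind of status listing described above could be produced from very simple records. The Python sketch below is one assumed way to model it, not a description of the actual HPCMP tooling: the `PolicyStatus` record, its fields and the `noncompliance_report` function are all hypothetical, as are any system and policy names passed to them.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class PolicyStatus:
    """One system's status against one baseline policy (hypothetical model)."""
    system: str
    policy: str
    compliant: bool
    planned_date: Optional[date] = None  # target date if not yet compliant


def noncompliance_report(statuses: Iterable[PolicyStatus]) -> List[str]:
    """List noncompliant system/policy pairs with their planned dates.

    Entries are sorted by system name, then policy name, so the report
    reads the same way regardless of input order.
    """
    open_items = sorted(
        (s for s in statuses if not s.compliant),
        key=lambda s: (s.system, s.policy),
    )
    lines = []
    for s in open_items:
        planned = s.planned_date.isoformat() if s.planned_date else "no date set"
        lines.append(f"{s.system}: {s.policy} (planned: {planned})")
    return lines
```

Keeping the planned date optional reflects the point made earlier that some systems cannot be made fully compliant with a given policy, so a report should be able to show an open item without forcing a date onto it.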
Together, these steps have brought HPCMP closer to its goal: balancing a common technology configuration with the need to support customers without disrupting their work or asking them to learn and adjust to site-specific environments.
Cray J. Henry is director of the Defense Department's High Performance Computing Modernization Program.