NIST test puts software analysis tools through their paces
Evaluation provides a benchmark for further research into the development of these complex tools
The National Institute of Standards and Technology (NIST) has published the results of its first Static Analysis Tool Exposition (SATE), a side-by-side evaluation of tools used to find security flaws in software.
The tests, conducted last year, put eight static analysis tools through their paces in analyzing test programs for flaws that could result in security vulnerabilities. Results were evaluated by researchers from NIST and MITRE to better understand the state of the art of software analysis.
“Overall, tools handled the code well, which is not an easy task for the test cases of this size,” the authors concluded in the SATE report, Special Publication 500-279. “In particular, the tools can help find weaknesses in most of the SANS/CWE Top 25 weakness categories.”
However, the researchers also concluded that there were weaknesses in the SATE tests.
“Due to complexity of the task and limited resources, our analysis of the tool reports is imperfect, including lapses in analyzing correctness of tool warnings,” they wrote.
NIST emphasized that the tests are not intended as a competitive evaluation to rank the tools, because there is no adequate set of metrics for all aspects of their performance. Rather, the tests set a benchmark for future research and development of these software assurance tools. SATE was envisioned as an annual event, and NIST is planning the next one.
As programs grow longer and more sophisticated, and must interact with an increasing number of other programs, the likelihood increases that errors, or features that can be misused, will result in security vulnerabilities. The number and subtlety of attacks from hackers have also increased. Because it is impossible to anticipate every combination of input a given piece of software might receive, static analyzers use mathematical and logical tools to rigorously predict the behavior of the program and examine it for weaknesses.
“Most modern software is too lengthy and complex to analyze by hand,” said NIST software assurance expert Vadim Okun, an author of the report. “Additionally, programs that would have been considered secure 10 years ago may now be vulnerable to hackers. We’re trying to focus on identifying what in a program’s code might be exploitable.”
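To make the idea of examining code without running it concrete, here is a minimal sketch in Python. It is far simpler than the tools evaluated in SATE and is not drawn from the report: the rule set `RISKY_CALLS`, the function `find_risky_calls` and the sample program are all assumptions made for illustration. It parses source text into a syntax tree and flags calls whose names appear in the rule set, without ever executing the program under analysis.

```python
import ast

# Hypothetical rule set for this sketch: call names treated as risky.
RISKY_CALLS = {"eval", "exec", "system"}


def find_risky_calls(source):
    """Return sorted (line, name) pairs for calls whose name is in RISKY_CALLS."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # eval(x) appears as ast.Name; os.system(x) appears as ast.Attribute.
        if isinstance(func, ast.Name):
            name = func.id
        elif isinstance(func, ast.Attribute):
            name = func.attr
        else:
            continue
        if name in RISKY_CALLS:
            findings.append((node.lineno, name))
    return sorted(findings)


# Illustrative program under analysis; it is parsed, never executed.
SAMPLE = '''import os
user_input = input()
os.system("ping " + user_input)
result = eval(user_input)
print(len(user_input))
'''

if __name__ == "__main__":
    for line, name in find_risky_calls(SAMPLE):
        print(f"line {line}: call to {name}()")
```

A check this simple only matches names. It cannot tell whether the flagged argument is actually attacker-controlled, which is the kind of question that calls for the deeper reasoning about program behavior described above.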
The tests were conducted in February 2008, and the results were originally reported at the Static Analysis Workshop held in Tucson, Ariz., that June. NIST released the final report this June.
Static analysis tools participating in the test were Aspect Security ASC, Checkmarx CxSuite, Flawfinder, Fortify SCA, Grammatech CodeSonar, HP DevInspect, SofCheck Inspector for Java, University of Maryland FindBugs and Veracode. In addition, Aspect Security performed a human code review.
The tests included two tracks, one for programs written in C and one for Java programs. These languages were chosen because of their popularity and the number of analysis tools that support them. There were three open-source test programs for each language. For C there was a console instant messenger, a host service and network-monitoring tool, and a Web server. For Java there was a network management system, a forum and a document management system.
One finding from the tests concerned the differences and synergies between human and automated reporting and analysis. “While human review is needed for some types of weaknesses (e.g., some authorization problems), tools can quickly find hundreds of weaknesses,” the report said. “Sometimes the human describes the cause of the problem at a high level, while the tool provides the specific vulnerable paths for the instances of the problem.”
However, the complexity of finding security flaws and the current state of the art made meaningful evaluation difficult.
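As a rough illustration of what a "specific vulnerable path" can look like, the Python sketch below searches a toy data-flow graph for a route from an untrusted input to a sensitive operation. The graph `FLOWS`, its node names and the function `find_path` are hypothetical, invented for this example, and do not describe how any tool in the exposition works.

```python
from collections import deque

# Hypothetical data-flow edges for a toy program:
# a value flows from each key to every name in its list.
FLOWS = {
    "request_param": ["username"],
    "username": ["query", "log_line"],
    "query": ["db_execute"],
    "log_line": ["write_log"],
}


def find_path(flows, source, sink):
    """Breadth-first search for the shortest flow path from source to sink."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == sink:
            return path
        for nxt in flows.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no flow from source to sink


if __name__ == "__main__":
    path = find_path(FLOWS, "request_param", "db_execute")
    print(" -> ".join(path) if path else "no path found")
```

A human reviewer might summarize the same problem as "user input reaches the database unchecked," while a path like this one pins down each step for a single instance.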
“Due to the variety and different nature of security weaknesses, defining clear and comprehensive analysis criteria is difficult,” the report said. “As SATE progressed, we realized that our analysis criteria were not adequate, so we adjusted the criteria during the analysis phase. As a result, the criteria were not applied consistently. For instance, we were inconsistent in marking the severity of the warnings where we disagreed with tool’s assessment.”
Some significant elements were not evaluated. “We did not consider the user interface, integration with the development environment, and many other aspects of the tools. In particular, the tool interface is important for a user to efficiently and correctly understand a weakness report.”
For these reasons, the exercise is seen as a starting point for future work that will carry evaluation of static analysis forward. Tool makers interested in participating in the next SATE can contact Okun at firstname.lastname@example.org or 301-975-3268.