Want to avoid software snafus? Here's a good place to start.

The National Institute of Standards and Technology has dramatically expanded its public dataset of software flaws to help developers and analyzers avoid weaknesses in their programs.

The Software Assurance Metrics and Tool Evaluation (SAMATE) Reference Dataset contains examples of errors in a number of popular programming languages that could leave software vulnerable to exploits by hackers and criminals.

Version 4.0 of SAMATE contains 175 broad categories of weaknesses with more than 60,000 specific cases. The release more than doubles the number of categories and adds 30 times as many examples as the previous release contained.



“This is an enormous step toward bringing methodical science to the hard question of bugs in software,” said Paul E. Black, NIST computer scientist and SAMATE project leader. The dataset is used to build static analyzers that comb software for problems.

SAMATE, which began in 2004, is an umbrella project to improve software assurance by excluding known problems. The catalyst for the program was a Homeland Security Department project on software assurance tools, Black said.

“They wanted to understand what tools were available, measure their effectiveness and identify gaps,” he said. The tools analyze software, scanning it for known flaws and weaknesses. “We asked ourselves, does this tool catch all possible errors? We realized that to answer that we needed a list of all possible errors.”
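
To make the idea concrete, here is a minimal sketch of how a reference set of known flaws might be used to measure a tool and spot its gaps. It is an illustration only, not SAMATE's or any analyzer's actual method: the case IDs, the findings and the function names `score_tool` and `find_gaps` are all hypothetical, and the sketch assumes a tool's output can be reduced to pairs of case ID and weakness category.

```python
from collections import defaultdict

# Hypothetical reference cases: case id -> weakness category seeded in that case.
REFERENCE_CASES = {
    "case-001": "buffer overflow",
    "case-002": "buffer overflow",
    "case-003": "null pointer dereference",
    "case-004": "SQL injection",
}

# Hypothetical analyzer output: (case id, category) pairs the tool flagged.
TOOL_FINDINGS = {
    ("case-001", "buffer overflow"),
    ("case-003", "null pointer dereference"),
    ("case-004", "buffer overflow"),  # wrong category, so not counted as a catch
}


def score_tool(reference, findings):
    """Return {category: (caught, total)} for the known flaws in `reference`."""
    totals = defaultdict(int)
    caught = defaultdict(int)
    for case_id, category in reference.items():
        totals[category] += 1
        # A flaw counts as caught only if the tool flagged the same case
        # with the same weakness category.
        if (case_id, category) in findings:
            caught[category] += 1
    return {cat: (caught[cat], totals[cat]) for cat in totals}


def find_gaps(scores):
    """List categories in which the tool missed at least one known flaw."""
    return sorted(cat for cat, (hit, total) in scores.items() if hit < total)


scores = score_tool(REFERENCE_CASES, TOOL_FINDINGS)
gaps = find_gaps(scores)

for category, (hit, total) in sorted(scores.items()):
    print(f"{category}: caught {hit} of {total}")
print("Gaps:", ", ".join(gaps))
```

A score like this says something only about the weaknesses that appear in the reference set, which is why the breadth of that set matters.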

NIST worked with DHS to establish a long-term program for creating such a list. The effort complements other programs, such as the Common Weakness Enumeration and the Common Vulnerabilities and Exposures databases maintained by Mitre Corp.

SAMATE contains specific examples of coding flaws in software written in Java, C and C++. Each case is about a page of computer code showing a problematic way of composing functions, loops or logic operations.

The current dataset is limited in the languages it includes and still does not cover all types of weaknesses. The Common Weakness Enumeration contains close to 500 types of weaknesses, Black said. “We’ve expanded enormously, but we could probably double our set again,” he said.

Industry is using SAMATE, Black said. Before the latest release, there had been 10,000 downloads of the dataset over a 10-month period.

About the Author

William Jackson is a freelance writer and the author of the CyberEye blog.
