Pulse

By GCN Staff

NIST draft standard details approximate matching

The National Institute of Standards and Technology’s draft publication SP 800-168, Approximate Matching: Definition and Terminology, provides a description of approximate matching and includes requirements and considerations for testing. 

Approximate matching is a technique for identifying similarities between two digital artifacts, meaning arbitrary byte sequences such as files.

What counts as a similarity between two artifacts is determined by the particular approximate matching algorithm. One type of similarity these algorithms look for is resemblance: two similarly sized objects are compared for common traits. For example, successive versions of a piece of code are likely to share many similarities.
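As a rough illustration of resemblance, the sketch below scores two similarly sized byte sequences by the overlap of their byte n-grams (Jaccard similarity). This is an assumption chosen for brevity, not an algorithm the NIST draft prescribes, and the function names and the n-gram length of 4 are hypothetical.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all n-byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def resemblance(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the two n-gram sets, a score in [0, 1]."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        # Both inputs are shorter than n bytes: fall back to exact equality.
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Two successive versions of a small piece of code, plus unrelated text.
v1 = b"def area(w, h):\n    return w * h\n"
v2 = b"def area(w, h):\n    return h * w\n"
unrelated = b"The quick brown fox jumps over the lazy dog."

print(resemblance(v1, v2))         # relatively high: most n-grams are shared
print(resemblance(v1, unrelated))  # low: few or no n-grams are shared
```

Because the score divides by the union of both n-gram sets, it works best when the two objects are of comparable size, which matches the resemblance case described above.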

A second type of similarity is containment. Here two objects of different sizes are examined to determine whether the smaller one, such as a file, is contained in the larger one, such as a whole-disk image.
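Containment can be sketched in a similar way, except that the score is normalized by the smaller object only. The example below is a minimal sketch under that assumption; the n-gram approach, the names, and the zero-filled stand-in for a disk image are all hypothetical and are not taken from the NIST draft.

```python
def ngrams(data: bytes, n: int = 8) -> set:
    """Return the set of all n-byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def containment(small: bytes, large: bytes, n: int = 8) -> float:
    """Fraction of the smaller object's n-grams that also occur in the larger one."""
    grams_small = ngrams(small, n)
    if not grams_small:
        return 0.0  # too short to say anything meaningful
    grams_large = ngrams(large, n)
    return len(grams_small & grams_large) / len(grams_small)


file_bytes = b"CONFIDENTIAL: quarterly figures, draft 3"
# Stand-ins for disk images: zero padding around some payload.
disk_image = bytes(64) + file_bytes + bytes(64)
other_image = bytes(64) + b"nothing of interest is stored here" + bytes(64)

print(containment(file_bytes, disk_image))
print(containment(file_bytes, other_image))
```

Dividing by the size of the smaller object's n-gram set means a small file fully embedded in a much larger image still scores at the top of the range, which a symmetric resemblance measure would not do.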

The technology is useful in security monitoring and forensic analysis, where it serves to filter data. An algorithm returns a score in the range [0, 1], which is interpreted as a level of similarity. The reliability of a result depends on the robustness of the algorithm, its precision, and whether the algorithm includes security properties designed to prevent attacks, such as manipulation of the matching technique.
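To show how a score in [0, 1] might be used to filter data, the sketch below keeps only the observed items whose similarity to a known reference meets a threshold. Python's `difflib.SequenceMatcher` is used purely as a stand-in scorer; it is not an approximate matching algorithm from the NIST draft, and the 0.8 threshold and sample data are assumptions for illustration.

```python
from difflib import SequenceMatcher


def score(a: bytes, b: bytes) -> float:
    """Similarity in [0, 1]; SequenceMatcher.ratio() is a stand-in matcher."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def filter_similar(reference: bytes, candidates: dict, threshold: float = 0.8) -> list:
    """Return the names of candidates whose score against reference meets the threshold."""
    return [name for name, data in candidates.items()
            if score(reference, data) >= threshold]


known_bad = b"GET /admin.php?cmd=cat+/etc/passwd HTTP/1.1"
observed = {
    "variant": b"GET /admin.php?cmd=cat+/etc/shadow HTTP/1.1",
    "benign": b"POST /api/v2/orders HTTP/1.1",
}

print(filter_similar(known_bad, observed))  # ['variant']
```

Where the threshold sits is a policy choice: a lower value flags more near-matches for review, while a higher value reduces false positives.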

The public comment period on Special Publication 800-168 began on Jan. 27, 2014, and runs through March 21, 2014. Comments can be sent to match@nist.gov with “Comments on SP 800-168” in the subject line.

Posted by Mike Cipriano on Jan 31, 2014 at 7:38 AM


Reader Comments

Fri, Feb 7, 2014 Don O'Neill

First, I offer a suggestion to expand the domains of utility. The domain of utility can be expanded to include the detection of unauthorized use or reuse of proprietary information, copyrighted material, and trade secrets. The fragment detection use case would be especially applicable here, and the type of similarity expected may be either resemblance or containment. Second, I offer the suggestion to expand the universe of deep detection by employing Carnegie Mellon University's Function Extraction (FX) methods to reveal intended functions that may then be subject to approximate matching algorithms to determine similarity or identity. Third, I offer the suggestion to expand the universe of deep semantics through cognitive computing, for example, IBM Watson.
