Scrub Your Data, Not Your Career


Performance: A-

Ease of use: A

Features: A

Value: A-

Price: $300 for a single-user license


SRS Technologies

Huntsville, Ala.

(256) 971-7000

Reviewer's comments: Document Detective finds a surprising amount of hidden metadata not meant for public consumption. This brand-new version lets you do a side-by-side comparison to determine if the scrubbed document meets your agency's reporting requirements.

Tools such as Microsoft Word, PowerPoint and Excel are wonderful ways to express information. But until their work is made public, users treat these files like personal workspaces, which can lead to problems when data that was never intended for public consumption leaks out.

The GCN Lab took a look at a beta version of Microsoft Office 2007, which includes features to eliminate hidden data in documents, but the suite won't be ready until late 2006 at the earliest. What is an Office user to do in the meantime?

Document Detective from SRS Technologies is a program designed to find and remove hidden data from documents. It adds a toolbar to Office apps, giving you access to tools for redaction, security scanning and even government-standard classification markings. The software was originally designed to meet Defense Department regulations for document security.

We installed successive versions of Document Detective (1.1 and 2.0) on a lab workstation that in the past had been used for numerous presentations in all types of Office formats. There was plenty of data to examine.

Document Detective looks at more than 100 different ways data can be accidentally shared. Some of them are fairly obvious, such as those in Word documents. Others are subtler, such as embedded objects.

In our tests, Document Detective found and scrubbed every instance of hidden data we could locate in our heavily used files. And the newest version, which just came out, adds some nice features.

At the request of the National Security Agency, for instance, version 2.0 looks closely at fonts for anomalies. It flags text smaller than four points or larger than 128 points. Tiny text might not be visible while editing, and oversized text might sit outside the visible margins.
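The review doesn't say how Document Detective implements this check internally. As a rough sketch, assuming a document has already been parsed into runs of text that each carry a single point size, the NSA rule reduces to two threshold comparisons. The `TextRun` type and `flag_suspicious_sizes` function are hypothetical names for illustration, not part of the product.

```python
from dataclasses import dataclass

# Thresholds from the review: flag text under 4 pt or over 128 pt.
MIN_POINTS = 4.0
MAX_POINTS = 128.0


@dataclass
class TextRun:
    """Hypothetical stand-in for a run of text with one font size."""
    text: str
    size_pt: float


def flag_suspicious_sizes(runs):
    """Return (run, reason) pairs for runs whose size could hide them."""
    flagged = []
    for run in runs:
        if run.size_pt < MIN_POINTS:
            flagged.append((run, "too small to see while editing"))
        elif run.size_pt > MAX_POINTS:
            flagged.append((run, "may fall outside the visible margins"))
    return flagged


if __name__ == "__main__":
    sample = [
        TextRun("Budget summary", 12),
        TextRun("draft: do not release", 1.5),
        TextRun("X", 200),
    ]
    for run, reason in flag_suspicious_sizes(sample):
        print(f"{run.text!r} at {run.size_pt} pt: {reason}")
```

Text at exactly 4 or exactly 128 points passes, matching the "less than" and "larger than" wording; a real tool would also need to walk headers, footers, text boxes and other story ranges to find every run.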

Earlier versions already did a good job of finding white text on a white background, or other color settings that might hide data. But the new interface in version 2.0 makes these instances easier to find and fix. In fact, the way you use Document Detective has improved significantly. You can now view pre- and post-scrubbed documents side by side on the screen to make sure you did not lose any formatting information during cleaning.
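SRS doesn't document how the program detects hidden colors. One plausible approach, sketched here purely as an assumption, is to compute the contrast ratio between a run's text color and its background using the WCAG 2.0 relative-luminance formula, then flag anything below a cutoff. The function names and the 1.5 threshold are hypothetical choices for illustration.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.0 formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """Relative luminance of an (r, g, b) tuple with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg) -> float:
    """WCAG contrast ratio: 1.0 for identical colors, 21.0 for black on white."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


# Hypothetical cutoff: text this close to its background is treated as hidden.
HIDDEN_THRESHOLD = 1.5


def looks_hidden(fg, bg, threshold: float = HIDDEN_THRESHOLD) -> bool:
    """True if the text color is too close to the background to read."""
    return contrast_ratio(fg, bg) < threshold
```

Using a contrast ratio rather than an exact color match also catches near-misses such as very light gray on white, which a simple equality test would let through.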

Working with PDFs

Of course, the problem of hidden data is not unique to Office. Adobe Acrobat Portable Document Format files can also present pitfalls. PDFs can contain metadata that won't print or display normally but is readily available to people who know where to look. Before Document Detective scrubbed them clean, we found several interesting things inside PDF presentations that vendors have given us over the years.
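To illustrate how easy that metadata is to reach, here is a deliberately naive sketch that scans raw PDF bytes for a few standard keys of the PDF document information dictionary (Title, Author, Subject, Keywords, Creator, Producer). It is an assumption-laden illustration, not how Document Detective works: it only reads uncompressed literal strings without nested or escaped parentheses, and it ignores hex strings, compressed object streams and XMP metadata.

```python
import re

# A subset of the standard keys in a PDF document information dictionary.
INFO_KEYS = (b"Title", b"Author", b"Subject", b"Keywords", b"Creator", b"Producer")

# Matches "/Key (literal string)"; naive, see caveats in the lead-in.
_INFO_PATTERN = re.compile(rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(([^)]*)\)")


def find_info_strings(pdf_bytes: bytes) -> dict:
    """Return the first literal value found for each info key."""
    found = {}
    for key, value in _INFO_PATTERN.findall(pdf_bytes):
        found.setdefault(key.decode("ascii"), value.decode("latin-1"))
    return found
```

Calling `find_info_strings(open(path, "rb").read())` on an uncompressed file returns a plain dictionary such as an author name and the producing application, which is exactly the kind of detail a vendor may not realize it is handing out.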

Previous versions of Document Detective lumped most of the found data into an Objects folder, which in large PDFs can be pretty extensive. Version 2.0 breaks the Objects folder down into smaller components, such as thumbnails or bookmarks, so it's easier for users to target data for redaction, as well as save information they'll need later.

It's worth noting that metadata is often there for a reason. Under the right circumstances, such as when you need to revert to a previous version of a document, so-called hidden data is a lifesaver.

If software such as Document Detective eliminated everything willy-nilly, it might erase something you want or need. Thankfully, Document Detective 2.0 makes the examination process easy and configurable. You can put as much or as little effort into it as needed, depending on how secure your file needs to be.

About the Authors

John Breeden II is a freelance technology writer for GCN.

