Scrub Your Data, not Your Career
- By John Breeden II, GCN Staff
- Apr 13, 2006
Tools such as Microsoft Word, PowerPoint and Excel are wonderful ways to express information. But until their work is made public, users treat these files like personal workspaces, which can lead to problems when data that wasn't intended for public consumption leaks out.
The GCN Lab took a look at a beta version of Microsoft Office 2007, which includes features to eliminate hidden data in documents, but it won't be ready until late 2006 at the earliest. What is an Office user to do in the meantime?
Document Detective from SRS Technologies is a program designed to find and remove hidden data from documents. It adds a toolbar to Office apps, giving you access to tools for redaction, security scanning and even government-standard classification markings. The software was originally designed to meet Defense Department regulations for document security.
We installed successive versions of Document Detective (1.1 and 2.0) on a lab workstation that in the past had been used for numerous presentations in all types of Office formats. There was plenty of data to examine.
Document Detective looks at more than 100 different ways data can be accidentally shared. Some of them are fairly obvious, such as Word. Others are subtler, such as embedded objects.
In our tests, Document Detective found and scrubbed every instance of hidden data we could find in heavily used files. And the most recent version, which just came out, includes some nice new features.
At the request of the National Security Agency, for instance, version 2.0 looks closely at fonts for anomalies. It flags text smaller than four points or larger than 128 points. Tiny text might not be visible while editing, and large text might sit outside the visible margins.
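The font-size rule is simple enough to sketch in a few lines of Python. This is only an illustration of the thresholds described above, not Document Detective's actual code: the TextRun class and the flag_suspicious_sizes function are hypothetical names, and a real scanner would have to pull text runs and their sizes out of the document format first.

```python
from dataclasses import dataclass

# Thresholds taken from the review: below 4 points or above 128 points.
MIN_POINTS = 4.0
MAX_POINTS = 128.0


@dataclass
class TextRun:
    """Hypothetical stand-in for a run of text with a single font size."""
    text: str
    size_pt: float


def flag_suspicious_sizes(runs, min_pt=MIN_POINTS, max_pt=MAX_POINTS):
    """Return the runs whose font size is below min_pt or above max_pt."""
    return [r for r in runs if r.size_pt < min_pt or r.size_pt > max_pt]


# Made-up sample data for illustration only.
runs = [
    TextRun("Quarterly results", 12.0),
    TextRun("internal draft only", 1.0),   # too small to notice while editing
    TextRun("CONFIDENTIAL", 200.0),        # large enough to fall off the page
    TextRun("Footnote", 4.0),              # exactly 4 points is not flagged
]

flagged = flag_suspicious_sizes(runs)
for run in flagged:
    print(f"{run.size_pt:g} pt: {run.text!r}")
```

Text at exactly four or 128 points passes, since the rule as described only catches sizes beyond those limits.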
The program already did a good job of finding white text on a white background, or other color settings that might hide data. But the new interface in version 2.0 makes these instances easier to find and fix. In fact, the way you use Document Detective has improved significantly. You can now view a document before and after scrubbing side by side on the screen to make sure you did not lose any formatting during cleaning.
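A check for text hidden by color, such as white on white, might look something like the Python sketch below. It is not how Document Detective works internally, which the vendor has not described; the function names, the per-channel comparison and the threshold of 16 are all assumptions made for the example.

```python
def color_distance(fg, bg):
    """Largest per-channel difference between two (r, g, b) tuples (0-255)."""
    return max(abs(a - b) for a, b in zip(fg, bg))


def is_hidden_by_color(fg, bg, threshold=16):
    """Treat text as hidden when its color nearly matches the background.

    The threshold is an assumed value, not a published standard.
    """
    return color_distance(fg, bg) <= threshold


WHITE = (255, 255, 255)
BLACK = (0, 0, 0)

# Made-up samples: (label, text color, background color).
samples = [
    ("visible body text", BLACK, WHITE),
    ("white on white", WHITE, WHITE),
    ("near-white on white", (250, 250, 250), WHITE),
]

hidden = [label for label, fg, bg in samples if is_hidden_by_color(fg, bg)]
print(hidden)
```

Using a threshold rather than an exact match also catches near-matches, such as very pale gray on white, which hide text just as well to the naked eye.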
Working with PDFs
Of course, the problem of hidden data is not unique to Office. Adobe Acrobat Portable Document Format files can also present pitfalls. PDFs can contain metadata that won't print or display normally but is readily available to people who know where to look. Before Document Detective scrubbed them clean, we found several interesting things inside PDF presentations that vendors have given us over the years.
Previous versions of Document Detective lumped most of the found data into an Objects folder, which in large PDFs can be pretty extensive. Version 2.0 breaks the Objects folder down into smaller components, such as thumbnails or bookmarks, so it's easier for users to target data for redaction, as well as save information they'll need later.
It's worth noting that metadata is often there for a reason. Under the right circumstances, such as when you need to revert to a previous version of a document, so-called hidden data is a lifesaver.
If software such as Document Detective eliminated everything willy-nilly, it might erase something you want or need. Thankfully, Document Detective 2.0 makes the examination process easy and configurable. You can put as much or as little effort into it as needed, depending on how secure you need your file to be.
John Breeden II is a freelance technology writer for GCN.