App filters out classified e-mail
- By Susan M. Menke
- May 03, 2002
Classified information embedded, accidentally or intentionally, in e-mail used to be an everyday headache at the Office of the Defense Undersecretary for Acquisition, Technology and Logistics.
Commercial filtering software flagged too many of the 30,000 Microsoft Outlook messages that flow in and out daily, said David Lloyd, who was detailed to the office to find a solution.
Each false positive had to be examined manually even if its only 'secret' was in the harmless word 'secretary,' Lloyd said. And for every nine or 10 false positives, the text filtering packages he tested over an 18-month period would miss one or more message attachments containing genuine leaks.
The limit that Lloyd set for manual examination by a security specialist was eight to 10 documents per day. Furthermore, he wanted software that could scrutinize almost any document, whatever the format and whether zipped or not.
Besides confusing 'secret' with 'secretary,' most of the commercial filters tripped up on the (C) mark that denotes classified material. They incorrectly flagged any documents with A, B, C lists of topics, Lloyd said. And they could not distinguish between document cover sheets and their classified attachments.
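The two failure modes Lloyd describes can be illustrated with a short Python sketch. It is not how Harvest Mail or any of the tested commercial filters work. The function names and the "(C)" heuristic are hypothetical, chosen only to show why substring matching flags "secretary" and why a lettered list can look like a classification mark.

```python
import re

# Naive substring check: the kind of matching that flags "secretary".
def naive_hit(text: str) -> bool:
    return "secret" in text.lower()

# Whole-word check: "secret" must stand alone as a word.
SECRET_WORD = re.compile(r"\bsecret\b", re.IGNORECASE)

def word_hit(text: str) -> bool:
    return bool(SECRET_WORD.search(text))

# Hypothetical heuristic (an assumption, not a documented rule):
# count "(C)" at the start of a line as a marking only when the text
# has no sibling "(A)" or "(B)" labels, which would suggest a list.
PORTION_MARK = re.compile(r"^\s*\(C\)\s+\S", re.MULTILINE)
LIST_LABEL = re.compile(r"^\s*\((?:A|B)\)\s+\S", re.MULTILINE)

def marking_hit(text: str) -> bool:
    has_mark = PORTION_MARK.search(text) is not None
    has_list = LIST_LABEL.search(text) is not None
    return has_mark and not has_list
```

Under these assumptions, `naive_hit("Note from the secretary")` is true while `word_hit` on the same text is false, and a three-item list labeled (A), (B), (C) is not reported by `marking_hit`. A filter would need far more context than this, which is consistent with the team's turn to natural language analysis.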
'It was essentially up to the government to solve this problem,' Lloyd said.
He enlisted the University of Massachusetts' Center for Intelligent Information Retrieval in Amherst to help him find a natural language analyzer that could do real-time scanning, root parsing and syntax analysis.
Ultimately, the team chose a $40,000 server application, Harvest Mail from Chiliad Corp., also of Amherst.
Harvest Mail can read Adobe Portable Document Format, HTML, Microsoft PowerPoint, native Lotus Notes files and assorted other formats, including zipped files.
Lloyd said the office uses Harvest Mail to filter collaborative documents and correspondence as well as e-mail attachments. It could run on the same server platform with Microsoft Exchange Server, but that would slow performance. So he tried journaling, that is, copying all e-mail messages to a separate server for scanning.
Employee counseling
Harvest Mail is used 'only for this purpose to avoid privacy problems, and only security officers see the hits,' Lloyd said. Hits have dropped to about seven per day, and if the examiners agree with the software's verdict, they immediately contact the e-mail administrator and counsel employees who sent the classified messages.
Chiliad president Paul McOwen said Harvest Mail can process up to 8,000 messages per hour on a dual-processor server running Microsoft Windows NT or Unix.
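For a rough sense of headroom, the figures quoted in this article can be combined in a few lines of Python. This is back-of-the-envelope arithmetic only: it treats the 8,000 messages per hour as a sustained rate and assumes mail arrives evenly around the clock. Neither assumption comes from Lloyd or McOwen.

```python
# Figures quoted in the article.
DAILY_MESSAGES = 30_000   # messages flowing in and out of the office daily
MAX_PER_HOUR = 8_000      # vendor's stated upper bound on throughput

# Hours of scanning needed per day if the server ran at its stated maximum.
hours_needed = DAILY_MESSAGES / MAX_PER_HOUR      # 3.75

# Average hourly load, assuming (hypothetically) even arrival over 24 hours.
average_per_hour = DAILY_MESSAGES / 24            # 1250.0

# Fraction of the stated maximum consumed by that average load.
utilization = average_per_hour / MAX_PER_HOUR     # 0.15625

print(f"{hours_needed:.2f} hours at maximum rate, "
      f"{utilization:.1%} average utilization")
```

Real traffic is likely to cluster during working hours, so peak load would be higher than this average suggests.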