Tools to track stolen data through the dark web

It’s no secret that cybercrime has become a more lucrative and professional endeavor. According to Eugene Kaspersky, CEO of security firm Kaspersky Lab, “thousands of businesses have already been hacked and had their sensitive data stolen -- resulting in multi-billion dollar losses.”

And as headlines attest, the U.S. government has been no stranger to such incidents. 

The information criminals lift from government or corporate networks has become a hot commodity on the dark web, the part of the internet often used for criminal activity. Stolen information may surface in secret chatrooms, forums or marketplaces on the dark web, but as many experts concede, tracking it is extremely difficult, primarily because the dark web is not indexed by Google and other search engines.

Another complicating factor is that stolen data usually doesn’t surface until it is ready to be sold, Justin Harvey, chief security officer of Fidelis Cybersecurity, told GCN. These transactions typically take place “through underground forums and marketplaces, and it’s very difficult to infiltrate those environments because it’s usually invite only, you have to know someone,” Harvey said.

Currently, tracking stolen data falls to threat intelligence and threat research teams, according to Harvey. Using conventional police and investigative techniques, these teams try to infiltrate tight-knit circles, monitor forums and even barter with traders to unearth the stolen data.

Others, however, are looking for automated solutions. Last month, the Department of Veterans Affairs said it was seeking software that can search the dark web for VA data that has been exploited and is improperly outside the department’s control, distinguish VA data from other data, and create a “one-way encrypted hash” of VA data so that other parties cannot ascertain or use it. The software would then use those hashes to search the dark web for VA content.
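The VA did not publish how such software would work. As a rough illustration of the general idea, the Python sketch below hashes sensitive values with a keyed one-way function (HMAC-SHA256), keeps only the hashes, and then checks candidate strings pulled from scraped dark web text against them. The secret key, the normalization rules and the SSN-style search pattern are all assumptions made for this example.

```python
import hashlib
import hmac
import re

# Hypothetical secret key held only by the data owner. Keying the hash (HMAC)
# means outsiders cannot precompute hashes of guessed values.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def normalize(value: str) -> str:
    """Canonicalize a field so spacing or hyphens don't change its hash."""
    return re.sub(r"[\s\-]", "", value).lower()


def fingerprint(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a one-way, keyed hash (HMAC-SHA256) of a normalized field."""
    return hmac.new(key, normalize(value).encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(records: list[str], key: bytes = SECRET_KEY) -> set[str]:
    """Hash every sensitive value once; only the hashes need to be stored."""
    return {fingerprint(r, key) for r in records}


def find_matches(scraped_text: str, index: set[str], key: bytes = SECRET_KEY) -> list[str]:
    """Hash candidate tokens in scraped text and report those found in the index."""
    # Illustrative candidate pattern: SSN-like strings such as 123-45-6789.
    candidates = re.findall(r"\b\d{3}-?\d{2}-?\d{4}\b", scraped_text)
    return [c for c in candidates if fingerprint(c, key) in index]
```

Because only hashes are kept, the index itself reveals nothing useful if it leaks, and a match shows that the owner's data appeared in the scraped text without anyone handling the raw values.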

Some companies, such as Terbium Labs, have developed similar hashing technologies. “It’s not code that’s embedded in the data so much as a computation done on the data itself,” Terbium Labs co-founder Danny Rogers told Defense One of the company’s cryptographic hashing. In essence, the technique lets a company or agency recognize its stolen data when it turns up.
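Terbium Labs has not detailed its computation here. One common way to recognize fragments of a known document, sketched below in Python as an illustration rather than as the company's method, is to hash overlapping word windows ("shingles") and measure how many of a protected document's fingerprints reappear in text found elsewhere. The five-word window size is an arbitrary assumption.

```python
import hashlib


def shingles(text: str, k: int = 5) -> set[str]:
    """Split text into overlapping k-word windows ("shingles")."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts become a single shingle (or none if empty).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def shingle_hashes(text: str, k: int = 5) -> set[str]:
    """Hash each shingle so only fingerprints, not the text, are stored."""
    return {hashlib.sha256(s.encode("utf-8")).hexdigest() for s in shingles(text, k)}


def containment(protected: set[str], found: set[str]) -> float:
    """Fraction of the protected document's fingerprints present in found text."""
    if not protected:
        return 0.0
    return len(protected & found) / len(protected)
```

A high containment score suggests that a large part of the protected document appears in the found text, even if it was excerpted or surrounded by other material.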

Bitglass, a cloud access security broker, uses watermarking technology to track stolen data. A digital watermark or encryption algorithm is applied to files such as spreadsheets, Word documents or PDFs, requiring users to go through an authentication process to access them.

This watermarking can even protect against hackers who copy and paste stolen data into a separate document, “because we associate the digital watermark with the data in a source file and not the file itself,” Salim Hafid, product marketing manager at Bitglass, told GCN. If someone opens a Word document or an Excel spreadsheet and copies a particular dataset into another file, the watermark follows that data and tracks the source file, he said.
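Bitglass has not said how its watermark attaches to the data itself. A simple, well-known technique with a similar property, shown here purely as an illustration, hides an owner identifier in invisible zero-width Unicode characters inside the text. The mark then travels with a copied passage rather than with the file. The 16-bit identifier and its placement after the first word are assumptions for this sketch.

```python
from typing import Optional

# Zero-width characters: one encodes bit 0, the other bit 1.
ZW0 = "\u200b"  # ZERO WIDTH SPACE
ZW1 = "\u200c"  # ZERO WIDTH NON-JOINER


def embed(text: str, owner_id: int, bits: int = 16) -> str:
    """Insert an invisible bit pattern identifying the owner after the first word."""
    # Most significant bit first.
    mark = "".join(ZW1 if (owner_id >> i) & 1 else ZW0 for i in reversed(range(bits)))
    head, sep, tail = text.partition(" ")
    return head + mark + sep + tail


def extract(text: str) -> Optional[int]:
    """Recover the owner id from any zero-width characters that survived copying."""
    marks = [c for c in text if c in (ZW0, ZW1)]
    if not marks:
        return None
    value = 0
    for c in marks:
        value = (value << 1) | (1 if c == ZW1 else 0)
    return value
```

The watermarked text looks identical on screen, and the identifier can still be read after the passage is pasted elsewhere. A copied fragment that leaves out the marked position loses the watermark, so a practical scheme would likely repeat the mark throughout the text.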

To test its solution’s tracking ability, Bitglass “leaked” fictitious personal information, such as bank information, Social Security numbers and credit card numbers, to the dark web. It was able to visualize where and by whom the leaked data was being accessed, as well as which parts of the files were being opened. “We embed in the source file the ability to call back to a server,” Hafid said, explaining how the company tracks this information. Applications that open Word documents or PDFs have built-in call-backs to get information on how a document should be rendered, he said.
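The call-back Hafid describes works like a tracking beacon. The document references a URL on the tracker's server, and when an application fetches that URL while rendering, the server learns which document (and which part) was opened, from what IP address and by what client software. The Python sketch below, using only the standard library, shows the server-side bookkeeping. The URL layout, the `doc` and `section` parameters and the log format are hypothetical, not Bitglass's protocol.

```python
import time
from http.server import BaseHTTPRequestHandler
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# In-memory access log; a real service would persist this.
ACCESS_LOG: list[dict] = []


def beacon_url(base: str, doc_id: str, section: str) -> str:
    """Build the URL a tagged document would request when it is rendered."""
    return f"{base}?{urlencode({'doc': doc_id, 'section': section})}"


def record_callback(url: str, client_ip: str, user_agent: str) -> Optional[dict]:
    """Log who opened which part of which document; ignore unrelated requests."""
    params = parse_qs(urlparse(url).query)
    if "doc" not in params:
        return None
    entry = {
        "doc": params["doc"][0],
        "section": params.get("section", [""])[0],
        "ip": client_ip,
        "user_agent": user_agent,
        "time": time.time(),
    }
    ACCESS_LOG.append(entry)
    return entry


class BeaconHandler(BaseHTTPRequestHandler):
    """Minimal HTTP handler that records each beacon request."""

    def do_GET(self):
        record_callback(self.path, self.client_address[0], self.headers.get("User-Agent", ""))
        # 204 No Content: the rendering application receives nothing to display.
        self.send_response(204)
        self.end_headers()
```

To run it, one could pass `BeaconHandler` to `http.server.HTTPServer` and call `serve_forever()`. Each time a leaked test file is opened, a new entry appears in `ACCESS_LOG`, which is the kind of data needed to map where a leak is being accessed.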

Applying such solutions in a government setting can be challenging, given the extra regulatory hurdles. However, as government adoption of cloud technologies increases, cloud access security solutions such as Bitglass’s watermarking technology are attractive to agencies and their personnel because “they demand access on their mobile devices, they want to work from home or work from outside of the office,” Hafid said.

To penetrate the deep web, the even larger unindexed portion of the web, the Defense Advanced Research Projects Agency has developed the Memex search engine, which it envisions will revolutionize the discovery, organization and presentation of search results from the deep web, the dark web and multimedia content. Memex is currently used by members of the law enforcement community, and DARPA plans to eventually transition it to the commercial market.

About the Author

Mark Pomerleau is a former editorial fellow with GCN and Defense Systems.

