Georgia Tech automates monitoring of web censorship
A lot is going on under the surface when you cruise the Internet. Advertisers are tracking you through cookies and web beacons, intelligence services may be scanning headers in your page requests and governments may be blocking access to data you’re trying to access. And while you can install programs – such as Disconnect, Ghostery and DoNotTrackMe – that prevent advertisers and others from tracking your movements across the web, there is no way to track who is blocking you from reaching, say, that blog about violent protests in China.
Currently, the only sources of information on who is blocking what web content are anecdotal reports from users collected by organizations such as the Berkman Center for Internet and Society at Harvard. Berkman’s Herdict project collects and disseminates real-time, crowdsourced information about Internet filtering, denial of service attacks and other blockages and offers a graphic display of what countries are blocking how much content on the Internet.
Now, researchers at Georgia Tech’s College of Computing are working on a tool, named Encore, to automate the process of monitoring web censorship using a single line of code. Encore runs when a user visits a site where the code is installed. It then checks whether potentially censored websites are reachable from the user’s location. The results of the access attempts are then aggregated by the Georgia Tech team.
Nick Feamster, an associate professor and one of the project leaders, noted that Encore is still in beta and is currently limited in the information it can provide. “Encore can only give you binary information – could I reach this content or not,” Feamster said. “We’re basically trading off depth to get breadth.”
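To make the depth-for-breadth tradeoff concrete, here is a minimal sketch of how binary reachability reports might be aggregated. It is not Encore's code: the `Report` type, its fields and the `failure_rates` function are hypothetical names chosen for illustration, and the sample data is invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """One binary measurement: could a client in `region` load `target`?"""
    region: str      # hypothetical field: where the visitor's browser is located
    target: str      # hypothetical field: the site whose reachability was tested
    reachable: bool  # the single yes/no signal described in the article


def failure_rates(reports):
    """Return {(region, target): fraction of reports where the load failed}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for r in reports:
        key = (r.region, r.target)
        totals[key] += 1
        if not r.reachable:
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}


# Invented sample data: three visitors in region "A", one in region "B".
reports = [
    Report("A", "example.org", False),
    Report("A", "example.org", False),
    Report("A", "example.org", True),
    Report("B", "example.org", True),
]
rates = failure_rates(reports)
```

A single failed load says little on its own, since it could just as well be a flaky connection. Under this sketch's assumptions, a pattern only becomes visible when many visitors in the same region fail to reach the same site.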
The other current limitation to Encore is that it can’t simply scan the Internet for blocked content. “How do you know which items to test?” Feamster asked. “That’s something that actually we haven’t solved yet.” For now, Encore relies on lists of suspected blocked content maintained by others, including Herdict, then targets those sites for monitoring.
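As a rough illustration of that workaround, the sketch below merges several externally maintained lists of suspected-blocked URLs into one de-duplicated set of hostnames to test. The function name `build_target_list` and the sample URLs are assumptions made for this example; the article does not describe how Encore actually assembles its targets.

```python
from urllib.parse import urlsplit


def build_target_list(*sources):
    """Merge lists of suspected-blocked URLs into sorted, unique hostnames.

    Entries without a scheme (and therefore without a parseable hostname)
    are skipped rather than guessed at.
    """
    hosts = set()
    for source in sources:
        for url in source:
            host = urlsplit(url.strip()).hostname
            if host:
                hosts.add(host.lower())
    return sorted(hosts)


# Invented sample lists standing in for third-party sources.
list_a = ["http://Example.org/a", "https://example.org/b"]
list_b = ["http://blog.example.net/", "not-a-url"]
targets = build_target_list(list_a, list_b)
```

Reducing each URL to its hostname keeps the list short, at the cost of losing page-level detail. That is a design choice of this sketch, not something the Encore team has stated.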
As the team at Georgia Tech acknowledged, Encore does raise some potential red flags for users. After all, some users may not want their computers to be used to access other computers without their knowledge. Also, webmasters who insert Encore code are not required to notify visitors to their site that Encore is installed, though Encore’s developers encourage webmasters to do so.
The team also noted on its website that Encore doesn’t track users’ browsing history; it only checks whether suspected sites are reachable from the user’s computer. Feamster also promised that Encore doesn’t affect performance for either websites or users.
“For abundance of caution, Encore currently only conducts a conservative set of measurements meant to demonstrate its technical merit,” according to the Encore FAQ. “We are investigating how to tailor the set of sites that Encore measures to both yield interesting results and minimize risk to users.”
Encore is, in short, still a work in progress. While it might not yet be clear what the best way to monitor Internet censorship is, it is clearly in the public interest to have such a capability.
And for those users who have an objection to their computers being used to check on web censorship, Feamster acknowledged that Encore can be blocked by those same tools that keep advertisers from tracking them.
Posted by Patrick Marshall on Aug 05, 2014 at 8:19 AM