GCN Tech Blog

By GCN Staff

Data scraping, Web 2.0 style

When we were writing our composite application story for this week's issue, we ran across Chicagocrime.org, a great example of how to build a new service from existing Web applications. This Web site returns a Google Map pinpointing the exact locations of crimes, organized by date, zip code, street or any of a number of other parameters you choose. It draws its data from another Web site, called Citizen ICAM, which is run by the Chicago Police Department.

The Silicon Valley digerati like to point to this Web application as an example of 'Web 2.0,' a way of thinking about the Web not only as a giant repository of data, but also as a platform for providing services.

While we'll reserve judgment on whether the 'Web 2.0' buzzphrase will stick, we did want to find out more about how Chicagocrime.org works.

Unbeknownst to us at the time, Chicagocrime.org was created by Washington Post Web site developer Adrian Holovaty (The Post owns GCN). Since the article offered only limited details of how the site works, we wanted to go into a bit more detail here:

GCN: What was your original motivation for starting this service?
Holovaty: My motivation was two-fold: To work on an interesting technical project, and to serve the community.

GCN: How do you query ICAM--is it just through the ICAM Web site?
Holovaty: It's very low-level and dirty. The Chicago Police Department doesn't provide any sort of machine-friendly data feeds, so I wrote a screen-scraping program that emulates human browsing behavior and grabs all crimes directly from the Web site. There really isn't any raw data that's specific to me -- what you see on Citizen ICAM is what I get.
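
To make the screen-scraping idea concrete, here is a minimal sketch in Python. It is not Holovaty's program: the HTML below is a made-up stand-in (the interview doesn't show Citizen ICAM's actual markup), and the names `CrimeTableParser` and `scrape_crimes` are ours. A real scraper would also have to fetch pages and submit search forms the way a browser does.

```python
from html.parser import HTMLParser

# Hypothetical markup for illustration; Citizen ICAM's real HTML is not
# described in the interview.
SAMPLE_PAGE = """
<table id="results">
  <tr><th>Date</th><th>Type</th><th>Block</th></tr>
  <tr><td>04/20/2006</td><td>THEFT</td><td>1200 N STATE ST</td></tr>
  <tr><td>04/21/2006</td><td>BURGLARY</td><td>300 W MADISON ST</td></tr>
</table>
"""


class CrimeTableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row being read, or None
        self._in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._row[-1] = self._row[-1].strip()
            self._in_cell = False
        elif tag == "tr":
            if self._row:  # header rows contain only <th>, so they stay empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def scrape_crimes(html):
    """Turn the (assumed) results table into a list of dicts."""
    parser = CrimeTableParser()
    parser.feed(html)
    return [dict(zip(("date", "type", "block"), row)) for row in parser.rows]


crimes = scrape_crimes(SAMPLE_PAGE)
```

The weakness of this approach is visible in the code: it depends entirely on the page layout, so any redesign of the source site can break it.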

GCN: The Google Maps application programming interface only accepts longitude and latitude for placing points on the map. Do you use any outside Web resources for translating the addresses into longitude and latitude? Is this process automated as well?
Holovaty: Yes, this process is automated as well. I use Yahoo's free geocoding service.
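
As a rough illustration of automated geocoding, the sketch below builds a lookup URL for a street address and pulls a latitude/longitude pair out of an XML reply. The endpoint, the `location` parameter, the response layout and the coordinates are all assumptions made for the example; they are not taken from Yahoo's service or from Holovaty's code.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint and response shape, for illustration only.
GEOCODER_URL = "https://geocoder.example.com/v1/geocode"

# Made-up reply; the coordinates are sample values, not real lookups.
SAMPLE_RESPONSE = """<ResultSet>
  <Result>
    <Latitude>41.904</Latitude>
    <Longitude>-87.628</Longitude>
  </Result>
</ResultSet>"""


def build_geocode_url(address, city="Chicago", state="IL"):
    """Build the request URL for one street address."""
    query = urlencode({"location": f"{address}, {city}, {state}"})
    return f"{GEOCODER_URL}?{query}"


def parse_coordinates(xml_text):
    """Return (latitude, longitude) from the reply, or None if no match."""
    result = ET.fromstring(xml_text).find("Result")
    if result is None:
        return None
    return float(result.findtext("Latitude")), float(result.findtext("Longitude"))


url = build_geocode_url("1200 N STATE ST")
point = parse_coordinates(SAMPLE_RESPONSE)
```

The resulting pair is the kind of value a mapping API needs in order to place a marker.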

GCN: Generally speaking, can you say anything about the approach you used for parsing the crime data into the various categories that you offer?

Holovaty: The categories are all taken directly from Citizen ICAM; I didn't do any sort of automatic categorization.

GCN: Is there anything the ICAM folks could change about how they share their data that would make your (or your administrator's) life easier?

Holovaty: Yes, yes, yes! If ICAM folks provided a raw feed of crime data, that would simplify my data-retrieval programs significantly. I asked the CPD about this when I met with them in person, but they said they weren't willing to do it.
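
To show why a raw feed would simplify things, here is a sketch of what consuming one might look like. No such feed existed, so the CSV layout, the column names and the sample rows are entirely hypothetical.

```python
import csv
import io

# Hypothetical feed with made-up rows; the CPD offered nothing like this.
SAMPLE_FEED = """date,type,block,latitude,longitude
2006-04-20,THEFT,1200 N STATE ST,41.904,-87.628
2006-04-21,BURGLARY,300 W MADISON ST,41.882,-87.635
"""


def read_feed(text):
    """Parse the (assumed) CSV feed into a list of dicts."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        # Coordinates arrive as strings; convert them for mapping.
        row["latitude"] = float(row["latitude"])
        row["longitude"] = float(row["longitude"])
        records.append(row)
    return records


feed = read_feed(SAMPLE_FEED)
```

With structured records like these, there is no page layout to reverse-engineer, and if the feed carried coordinates, the separate geocoding step would be unnecessary too.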

Posted by Joab Jackson

Posted by Brad Grimes, Joab Jackson on Apr 24, 2006 at 9:39 AM

