GCN Tech Blog

By GCN Staff

Data scraping, Web 2.0 style

When we were writing our composite application story for this week's issue, we ran across Chicagocrime.org, a great example of how to build a new service from existing Web applications. This Web site returns a Google Map pinpointing the exact locations of crimes, organized by date, zip code, street or any of a number of other parameters you choose. It draws its data from another Web site, called Citizen ICAM, which is run by the Chicago Police Department.

The Silicon Valley digerati like to point to this Web application as an example of 'Web 2.0,' the idea that the Web is not only a giant repository of data but also a platform for providing services.

While we'll reserve judgment on whether the 'Web 2.0' buzzphrase will stick, we did want to find out more about how Chicagocrime.org works.

Unbeknownst to us at the time, Chicagocrime.org was actually created by Washington Post Web site developer Adrian Holovaty (The Post owns GCN). Since we offered only limited details of how the site works in the article, we wanted to go into a bit more detail here:

GCN: What was your original motivation for starting this service?
Holovaty: My motivation was two-fold: To work on an interesting technical project, and to serve the community.

GCN: How do you query ICAM--is it just through the ICAM Web site?
Holovaty: It's very low-level and dirty. The Chicago Police Department doesn't provide any sort of machine-friendly data feeds, so I wrote a screen-scraping program that emulates human browsing behavior and grabs all crimes directly from the Web site. There really isn't any raw data that's specific to me -- what you see on Citizen ICAM is what I get.
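
For readers unfamiliar with the technique, here is a minimal sketch of what screen scraping involves. It is not Holovaty's code: the table layout, the column order and the sample rows are all made up for illustration, and a real scraper would first have to fetch the pages and cope with whatever markup Citizen ICAM actually serves.

```python
from html.parser import HTMLParser


class CrimeTableParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # header rows hold only <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None


def parse_crimes(html):
    """Turn a results page into a list of dicts, one per crime."""
    parser = CrimeTableParser()
    parser.feed(html)
    parser.close()
    fields = ("date", "category", "address")  # hypothetical column order
    return [dict(zip(fields, row)) for row in parser.rows
            if len(row) == len(fields)]


# Made-up sample standing in for a fetched results page.
SAMPLE_PAGE = """
<table>
  <tr><th>Date</th><th>Type</th><th>Location</th></tr>
  <tr><td>04/20/2006</td><td>THEFT</td><td>100 N STATE ST</td></tr>
  <tr><td>04/21/2006</td><td>BURGLARY</td><td>2400 W MADISON ST</td></tr>
</table>
"""

if __name__ == "__main__":
    for crime in parse_crimes(SAMPLE_PAGE):
        print(crime)
```

The fragility is the point: the parser depends entirely on the page's layout, so any redesign of the source site can break it without warning.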

GCN: The Google Maps application programming interface only accepts longitude and latitude for placing points on the map. Do you use any outside Web resources for translating the addresses into longitude and latitude? Is this process automated as well?
Holovaty: Yes, this process is automated as well. I use Yahoo's free geocoding service.
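
As a rough illustration of that automated step, the sketch below turns street addresses into map-ready points. Everything in it is an assumption on our part: the geocoder is a stand-in lookup table rather than a call to Yahoo's service, the function names are hypothetical, and the coordinates are approximate values used only for the example. The cache reflects a common design choice, since many crimes share an address and each outside lookup costs a network request.

```python
from typing import Callable, Dict, List, Optional, Tuple

LatLng = Tuple[float, float]
Geocoder = Callable[[str], Optional[LatLng]]


def make_cached_geocoder(lookup: Geocoder) -> Geocoder:
    """Wrap a geocoder so each distinct address is looked up only once."""
    cache: Dict[str, Optional[LatLng]] = {}

    def geocode(address: str) -> Optional[LatLng]:
        key = " ".join(address.upper().split())  # normalize case and spacing
        if key not in cache:
            cache[key] = lookup(key)
        return cache[key]

    return geocode


def to_markers(crimes: List[dict], geocode: Geocoder) -> List[dict]:
    """Attach coordinates to each crime; skip any that cannot be located."""
    markers = []
    for crime in crimes:
        point = geocode(crime["address"])
        if point is None:
            continue
        lat, lng = point
        markers.append({"lat": lat, "lng": lng, "label": crime["category"]})
    return markers


# Stand-in for a real geocoding service; the coordinates are illustrative only.
FAKE_GAZETTEER = {"100 N STATE ST": (41.8832, -87.6278)}


def fake_lookup(address: str) -> Optional[LatLng]:
    return FAKE_GAZETTEER.get(address)


if __name__ == "__main__":
    geocode = make_cached_geocoder(fake_lookup)
    sample = [{"address": "100 N State St", "category": "THEFT"}]
    print(to_markers(sample, geocode))
```

Each resulting dictionary carries the latitude and longitude that a mapping interface would need to place a point.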

GCN: Generally speaking, can you say anything about the approach you used for parsing the crime data into the various categories that you offer?

Holovaty: The categories are all taken directly from Citizen ICAM; I didn't do any sort of automatic categorization.

GCN: Is there anything the ICAM folks could change about how they share their data that would make your (or your administrator's) life easier?

Holovaty: Yes, yes, yes! If ICAM folks provided a raw feed of crime data, that would simplify my data-retrieval programs significantly. I asked the CPD about this when I met with them in person, but they said they weren't willing to do it.

Posted by Joab Jackson

Posted by Brad Grimes, Joab Jackson on Apr 24, 2006 at 9:39 AM

