GCN Tech Blog

By GCN Staff

Data scraping, Web 2.0 style

When we were writing our composite application story for this week's issue, we ran across Chicagocrime.org, a great example of how to build a new service from existing Web applications. This Web site returns a Google Map pinpointing the exact locations of crimes, organized by date, zip code, street or any one of a number of other parameters you choose. It draws its data from another Web site, called Citizen ICAM, which is run by the Chicago Police Department.

The Silicon Valley digerati like to point to this Web application as an example of 'Web 2.0,' a way of thinking about the Web not only as a giant repository of data, but also as a platform for providing services.

While we'll reserve judgment on whether the 'Web 2.0' buzzphrase will stick, we did want to find out more about how Chicagocrime.org works.

Unbeknownst to us at the time, Chicagocrime.org was actually created by Washington Post Web site developer Adrian Holovaty (The Post owns GCN). Since we offered only limited details of how the site works in the article, we wanted to go into a bit more detail here:

GCN: What was your original motivation for starting this service?
Holovaty: My motivation was two-fold: To work on an interesting technical project, and to serve the community.

GCN: How do you query ICAM--is it just through the ICAM Web site?
Holovaty: It's very low-level and dirty. The Chicago Police Department doesn't provide any sort of machine-friendly data feeds, so I wrote a screen-scraping program that emulates human browsing behavior and grabs all crimes directly from the Web site. There really isn't any raw data that's specific to me -- what you see on Citizen ICAM is what I get.
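
To give a feel for what screen scraping involves, here is a minimal Python sketch. It is not Holovaty's code. The table layout, the SAMPLE_PAGE markup and the names TableScraper and scrape_crimes are our own assumptions, since the interview does not describe the Citizen ICAM pages in detail. A real scraper would also have to fetch the pages and step through the search forms the way a browser would.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real Citizen ICAM page layout is an assumption here.
SAMPLE_PAGE = """
<table>
  <tr><th>Date</th><th>Crime</th><th>Address</th></tr>
  <tr><td>04/20/2006</td><td>THEFT</td><td>100 N STATE ST</td></tr>
  <tr><td>04/21/2006</td><td>BURGLARY</td><td>2400 W MADISON ST</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None        # cells of the row being read, or None
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows hold only <th> cells, so they stay empty
                self.rows.append([cell.strip() for cell in self._row])
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row[-1] += data


def scrape_crimes(html):
    """Turn the sample table into a list of dictionaries, one per crime."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    fields = ("date", "crime", "address")
    return [dict(zip(fields, row)) for row in parser.rows]


crimes = scrape_crimes(SAMPLE_PAGE)
```

The fragile part is plain to see: the code depends on the order of the table columns, so any redesign of the source page can silently break it.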

GCN: The Google Maps application programming interface only accepts longitude and latitude for placing points on the map. Do you use any outside Web resources for translating the addresses into longitude and latitude? Is this process automated as well?
Holovaty: Yes, this process is automated as well. I use Yahoo's free geocoding service.
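
An automated geocoding step might be organized along the lines of the Python sketch below. We do not know how Holovaty calls Yahoo's service, so the remote call is replaced by a stand-in function. The names make_cached_geocoder and fake_lookup are hypothetical, and the coordinates are placeholder values, not real geocoder output. The one idea the sketch illustrates is caching: the same street address turns up in many crime reports, so each distinct address only needs to be looked up once.

```python
from typing import Callable, Dict, Optional, Tuple

LatLng = Tuple[float, float]  # (latitude, longitude)
Lookup = Callable[[str], Optional[LatLng]]


def make_cached_geocoder(lookup: Lookup) -> Lookup:
    """Wrap a lookup function so each distinct address is resolved only once."""
    cache: Dict[str, Optional[LatLng]] = {}

    def geocode(address: str) -> Optional[LatLng]:
        key = " ".join(address.upper().split())  # normalize case and spacing
        if key not in cache:
            cache[key] = lookup(key)
        return cache[key]

    return geocode


# Stand-in for a remote geocoding service (an assumption for illustration).
# The coordinates are placeholders, not the output of any real geocoder.
PLACEHOLDER_COORDS = {"100 N STATE ST": (41.88, -87.63)}
calls = []  # records every address that reaches the "service"


def fake_lookup(address: str) -> Optional[LatLng]:
    calls.append(address)
    return PLACEHOLDER_COORDS.get(address)


geocode = make_cached_geocoder(fake_lookup)
point = geocode("100 N State St")
same_point = geocode("100  n state st")  # served from the cache
missing = geocode("1 Nowhere Ln")        # unknown addresses come back as None
```

Each resulting latitude and longitude pair is what would then be handed to the mapping API to place a marker.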

GCN: Generally speaking, can you say anything about the approach you used for parsing the crime data into the various categories that you offer?
Holovaty: The categories are all taken directly from Citizen ICAM; I didn't do any sort of automatic categorization.
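
Because the categories come straight from the source, organizing the data can be as simple as grouping records by a field they already carry. The Python sketch below shows one way that might look. The record layout and the group_by helper are our own assumptions, not a description of how Chicagocrime.org stores its data.

```python
from collections import defaultdict

# Hypothetical records, shaped like the output of a scraper.
records = [
    {"date": "04/20/2006", "crime": "THEFT", "address": "100 N STATE ST"},
    {"date": "04/21/2006", "crime": "BURGLARY", "address": "2400 W MADISON ST"},
    {"date": "04/21/2006", "crime": "THEFT", "address": "800 S HALSTED ST"},
]


def group_by(records, field):
    """Bucket records by the value they already carry in `field`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)
    return dict(groups)


by_crime = group_by(records, "crime")  # uses the source's own category labels
by_date = group_by(records, "date")
```

The same helper covers browsing by date or by any other field, since no classification logic is involved.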

GCN: Is there anything the ICAM folks could change in their data-sharing approach that would make your (or your administrator's) life easier?
Holovaty: Yes, yes, yes! If ICAM folks provided a raw feed of crime data, that would simplify my data-retrieval programs significantly. I asked the CPD about this when I met with them in person, but they said they weren't willing to do it.
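
To illustrate why a raw feed would simplify things, here is how little Python it would take to read one. No such feed exists according to the interview, so the comma-separated format, the column names and the read_feed function are purely hypothetical.

```python
import csv
import io

# Hypothetical feed: the Chicago Police Department offers nothing like this,
# according to the interview. Format and column names are assumptions.
SAMPLE_FEED = """date,crime,address
04/20/2006,THEFT,100 N STATE ST
04/21/2006,BURGLARY,2400 W MADISON ST
"""


def read_feed(text):
    """Parse a comma-separated feed into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


crimes = read_feed(SAMPLE_FEED)
```

With a feed like this, the column names travel with the data, so a cosmetic redesign of the Web site would no longer affect the program reading it.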

Posted by Joab Jackson


Posted by Brad Grimes, Joab Jackson on Apr 24, 2006 at 9:39 AM

