Data scraping, Web 2.0 style
When we were writing our composite application story for this week's issue, we ran across Chicagocrime.org, a great example of how to build a new service from existing Web applications. The site returns a Google Map pinpointing the exact locations of crimes, which can be organized by date, ZIP code, street or any of a number of other parameters the user chooses. It draws its data from another Web site, Citizen ICAM, which is run by the Chicago Police Department.
The Silicon Valley digerati like to point to this Web application as an example of 'Web 2.0,' a way of thinking about the Web not only as a giant repository of data but also as a platform for providing services.
While we'll reserve judgment on whether the 'Web 2.0' buzzphrase will stick, we did want to find out more about how Chicagocrime.org worked.
Unbeknownst to us at the time, Chicagocrime.org was actually created by Washington Post Web site developer Adrian Holovaty (The Post owns GCN). Since we offered only limited details of how the site worked in the article, we wanted to go into a bit more detail here:
GCN: What was your original motivation for starting this service?
Holovaty: My motivation was two-fold: to work on an interesting technical project, and to serve the community.
GCN: How do you query ICAM? Is it just through the ICAM Web site?
Holovaty: It's very low-level and dirty. The Chicago Police Department doesn't provide any sort of machine-friendly data feeds, so I wrote a screen-scraping program that emulates human browsing behavior and grabs all crimes directly from the Web site. There really isn't any raw data that's specific to me; what you see on Citizen ICAM is what I get.
GCN: The Google Maps application programming interface only accepts longitude and latitude for placing points on the map. Do you use any outside Web resources for translating the addresses into longitude and latitude? Is this process automated as well?
Holovaty: Yes, this process is automated as well. I use Yahoo's free geocoding service.
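Holovaty's answers about scraping and geocoding describe a two-stage pipeline: pull crime records out of ordinary Web pages, then turn each address into latitude and longitude so Google Maps can plot it. The sketch below illustrates that idea in Python using only the standard library. It is not his code, and everything in it is an assumption: the page layout (one table row per crime with date, category and address cells), the browser-like User-Agent and delay used to approximate "human browsing behavior," and the geocoder, which is a pluggable function standing in for a real service such as Yahoo's. The sample addresses and coordinates are illustrative only.

```python
import time
import urllib.request
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical browser-like User-Agent so requests resemble ordinary browsing.
USER_AGENT = "Mozilla/5.0 (compatible; example-scraper)"


def fetch_page(url: str, delay_seconds: float = 2.0) -> str:
    """Download one page, pausing first so requests arrive at a human-like pace."""
    time.sleep(delay_seconds)
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


class CrimeTableParser(HTMLParser):
    """Collects <td> text per <tr> row.

    Assumes (hypothetically) each crime is one row whose cells are
    date, category and address, in that order.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # only keep text that sits inside a <td>
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._row is not None and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if len(self._row) == 3:  # skip header (<th>) rows and malformed rows
                self.rows.append(self._row)
            self._row = None


def parse_crimes(html: str) -> List[Dict[str, str]]:
    parser = CrimeTableParser()
    parser.feed(html)
    parser.close()
    return [{"date": d, "category": c, "address": a} for d, c, a in parser.rows]


# Any function mapping an address to (lat, lng), or None if it cannot be placed.
Geocoder = Callable[[str], Optional[Tuple[float, float]]]


def attach_coordinates(crimes: List[Dict[str, str]], geocode: Geocoder) -> List[dict]:
    """Add latitude/longitude to each crime; drop ones the geocoder can't place."""
    placed = []
    for crime in crimes:
        point = geocode(crime["address"])
        if point is not None:
            lat, lng = point
            placed.append({**crime, "lat": lat, "lng": lng})
    return placed


# Stand-in for a real geocoding service; coordinates are illustrative.
FAKE_GEOCODES = {"100 W MADISON ST": (41.88, -87.63)}

sample = """<table>
<tr><th>Date</th><th>Type</th><th>Address</th></tr>
<tr><td>2006-04-20</td><td>THEFT</td><td>100 W MADISON ST</td></tr>
<tr><td>2006-04-21</td><td>BATTERY</td><td>999 NOWHERE AVE</td></tr>
</table>"""

crimes = parse_crimes(sample)
mapped = attach_coordinates(crimes, FAKE_GEOCODES.get)
```

Keeping the geocoder as a plain function parameter means the scraping logic does not care which service does the lookup, and addresses that cannot be placed are simply left off the map rather than plotted at a wrong location.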
GCN: Generally speaking, can you say anything about the approach you used for parsing the crime data into the various categories that you offer?
Holovaty: The categories are all taken directly from Citizen ICAM; I didn't do any sort of automatic categorization.
GCN: Is there anything the ICAM folks could change in their data-sharing approach that would make your (or your administrator's) life easier?
Holovaty: Yes, yes, yes! If the ICAM folks provided a raw feed of crime data, that would simplify my data-retrieval programs significantly. I asked the CPD about this when I met with them in person, but they said they weren't willing to do it.
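To see why a raw feed would help, consider what retrieval might look like if the department published one. The sketch below assumes, purely hypothetically, a CSV file with one crime per line and invented column names (date, category, address); the CPD offered no such feed. Reading it takes a standard CSV parser instead of HTML parsing and browser emulation, and the categories come straight from the feed's own field, mirroring how Chicagocrime.org reuses Citizen ICAM's categories rather than classifying crimes itself.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def load_feed(text: str) -> List[Dict[str, str]]:
    """Parse a hypothetical CSV crime feed into one dict per crime."""
    return list(csv.DictReader(io.StringIO(text)))


def group_by_category(crimes: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Index crimes by the category the feed already supplies (no re-classification)."""
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for crime in crimes:
        groups[crime["category"]].append(crime)
    return dict(groups)


# Illustrative sample only; column names and rows are invented.
SAMPLE_FEED = """date,category,address
2006-04-20,THEFT,100 W MADISON ST
2006-04-21,BATTERY,999 NOWHERE AVE
2006-04-21,THEFT,200 N STATE ST
"""

crimes = load_feed(SAMPLE_FEED)
by_category = group_by_category(crimes)
```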
Posted by Brad Grimes, Joab Jackson on Apr 24, 2006 at 9:39 AM