GCN Tech Blog

By GCN Staff


Data scraping, Web 2.0 style

When we were writing our composite-application story for this week's issue, we ran across Chicagocrime.org, a great example of how to build a new service from existing Web applications. The site returns a Google Map pinpointing the exact locations of crimes, which can be organized by date, zip code, street or any of a number of other parameters. It draws its data from another Web site, Citizen ICAM, which is run by the Chicago Police Department.

The Silicon Valley digerati like to point to this Web application as an example of 'Web 2.0,' a way of thinking about the Web not only as a giant repository of data but also as a platform for providing services.

While we'll reserve judgment on whether the 'Web 2.0' buzzphrase will stick, we did want to find out more about how Chicagocrime.org worked.

Unbeknownst to us at the time, Chicagocrime.org was created by Washington Post Web site developer Adrian Holovaty (The Post owns GCN). Since we offered only limited details in the article about how the site worked, we wanted to go into a bit more detail here:

GCN: What was your original motivation for starting this service?
Holovaty: My motivation was two-fold: To work on an interesting technical project, and to serve the community.

GCN: How do you query ICAM--is it just through the ICAM Web site?
Holovaty: It's very low-level and dirty. The Chicago Police Department doesn't provide any sort of machine-friendly data feeds, so I wrote a screen-scraping program that emulates human browsing behavior and grabs all crimes directly from the Web site. There really isn't any raw data that's specific to me -- what you see on Citizen ICAM is what I get.
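
For readers curious about what screen scraping involves, here is a minimal Python sketch of the parsing half, using only the standard library. It assumes, hypothetically, that a results page lists one crime per HTML table row with date, category and address columns. The real Citizen ICAM markup isn't shown here, and the field names, the `fetch_page` helper and any URL passed to it are placeholders, not Holovaty's code.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class CrimeTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row.

    Assumes (hypothetically) one crime per table row; the real
    Citizen ICAM markup is not reproduced here.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows built only from <th> cells stay empty
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_crimes(html, fields=("date", "category", "address")):
    """Turn table rows into dicts keyed by hypothetical column names."""
    parser = CrimeTableParser()
    parser.feed(html)
    parser.close()
    return [dict(zip(fields, row)) for row in parser.rows]


def fetch_page(url):
    """Download one results page while presenting a browser-like User-Agent.

    The URL is a placeholder; no real Citizen ICAM address is given here.
    """
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `parse_crimes(fetch_page(url))` would return one dict per crime row. Scrapers like this are fragile by nature: if the department redesigns its pages, the parser breaks.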

GCN: The Google Maps application programming interface accepts only longitude and latitude for placing points on the map. Do you use any outside Web resources for translating the addresses into longitude and latitude? Is this process automated as well?
Holovaty: Yes, this process is automated as well. I use Yahoo's free geocoding service.
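
Yahoo's geocoding service itself isn't reproduced here. The sketch below makes a hypothetical assumption: any geocoder can be treated as a function that takes an address string and returns a (latitude, longitude) pair, or None. It then adds two chores an automated pipeline likely needs. It caches repeat addresses so the free service isn't queried twice for the same block, and it drops records that can't be placed on the map. The ", Chicago, IL" suffix is also an assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

LatLng = Tuple[float, float]
# Hypothetical signature: address string in, (lat, lng) or None out.
Geocoder = Callable[[str], Optional[LatLng]]


class CachingGeocoder:
    """Wrap a geocoding function so each distinct address is looked up once."""

    def __init__(self, geocode: Geocoder, city_suffix: str = ", Chicago, IL"):
        self._geocode = geocode
        self._suffix = city_suffix  # assumed: addresses lack city/state
        self._cache: Dict[str, Optional[LatLng]] = {}
        self.lookups = 0  # calls that actually reached the service

    def __call__(self, address: str) -> Optional[LatLng]:
        key = address.strip().upper()  # normalize so "State St" == "STATE ST"
        if key not in self._cache:
            self.lookups += 1
            self._cache[key] = self._geocode(key + self._suffix)
        return self._cache[key]


def add_coordinates(crimes: List[dict], geocoder: CachingGeocoder) -> List[dict]:
    """Attach lat/lng to each crime record; skip records that fail to geocode."""
    placed = []
    for crime in crimes:
        point = geocoder(crime["address"])
        if point is not None:
            lat, lng = point
            placed.append({**crime, "lat": lat, "lng": lng})
    return placed
```

Caching matters because many crimes share a block address, so the number of service calls grows with distinct addresses rather than with total crimes.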

GCN: Generally speaking, can you say anything about the approach you used for parsing the crime data into the various categories that you offer?

Holovaty: The categories are all taken directly from Citizen ICAM; I didn't do any sort of automatic categorization.
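
Because the categories come straight from Citizen ICAM, organizing the data is closer to indexing than to classification. Here is a minimal sketch, assuming records are dicts with hypothetical `category`, `date` and `zip` keys. Each browsable field gets its own lookup table, keyed by the source's values exactly as scraped.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def index_by(crimes: Iterable[dict], field: str) -> Dict[str, List[dict]]:
    """Group crime records by one field, keeping the source's values verbatim.

    No reclassification happens: whatever label the source page used
    becomes the grouping key unchanged.
    """
    groups: Dict[str, List[dict]] = defaultdict(list)
    for crime in crimes:
        groups[crime[field]].append(crime)
    return dict(groups)


def build_indexes(
    crimes: Iterable[dict],
    fields: Sequence[str] = ("category", "date", "zip"),  # hypothetical names
) -> Dict[str, Dict[str, List[dict]]]:
    """Build one index per browsable field."""
    crimes = list(crimes)  # allow any iterable to be traversed once per field
    return {field: index_by(crimes, field) for field in fields}
```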

GCN: Is there anything the ICAM folks could change in their data-sharing approach that would make your (or your administrator's) life easier?

Holovaty: Yes, yes, yes! If ICAM folks provided a raw feed of crime data, that would simplify my data-retrieval programs significantly. I asked the CPD about this when I met with them in person, but they said they weren't willing to do it.
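
To illustrate why a raw feed would help, suppose the CPD published a CSV file with a header row. This is a hypothetical format, since the department declined to offer any feed. Reading it would then take only a few lines with Python's standard `csv` module, and there would be no HTML parsing to break when the pages change.

```python
import csv
import io
from typing import List


def parse_feed(feed_text: str) -> List[dict]:
    """Read a hypothetical CSV crime feed whose first line names the columns."""
    reader = csv.DictReader(io.StringIO(feed_text))
    return [dict(row) for row in reader]
```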

Posted by Joab Jackson


Posted by Brad Grimes, Joab Jackson on Apr 24, 2006 at 9:39 AM

