DATA MANAGEMENT

Microsoft, Amazon offer their own repositories for government data

Data.gov is not the only repository for raw government datasets. At least two commercial endeavors, by Microsoft and Amazon, also offer homes for government data.

Microsoft has set up a repository where government agencies can upload and store their public-facing datasets for reuse by others. The repository runs on Azure, the company's cloud computing platform. It is called the Open Government Data Initiative (OGDI), not to be confused with the White House's Open Government Initiative Web site, which solicits ideas from the public on making government more transparent.

Microsoft developed OGDI as a way to introduce Azure to the federal information technology community, said Susie Adams, chief technology officer of Microsoft Federal. "The division built a starter kit that acts as a guide to how agencies can post data to Azure, using Visual Studio," she said. The data is stored in Azure tables; eventually, Microsoft will move it to its SQL Data Services, she said. Agencies will be able to work within Azure for a fee.

With OGDI, users can access datasets for free via a Web page or an Atom feed. They can also query a large dataset by formulating a Microsoft ADO.NET Data Services query. Perhaps more important, other computer programs can ingest the data through a Representational State Transfer (REST)-based Web service, as JavaScript Object Notation (JSON) or, if the data is geographic, as Keyhole Markup Language (KML), among other formats and protocols.
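To make the query path concrete, here is a minimal Python sketch that builds an ADO.NET Data Services-style query URL (using the `$filter` and `$top` query options) and flattens an Atom response into dictionaries. The base URL, entity-set name and property names are hypothetical placeholders, not actual OGDI endpoints; the namespace URIs are the ones ADO.NET Data Services uses in its Atom payloads.

```python
from urllib.parse import quote
import xml.etree.ElementTree as ET

# Namespaces used in ADO.NET Data Services Atom payloads.
ATOM = "http://www.w3.org/2005/Atom"
META = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
DATA = "http://schemas.microsoft.com/ado/2007/08/dataservices"  # prefix of each property element


def build_query_url(base_url, entity_set, filter_expr=None, top=None):
    """Compose a query URL; base_url and entity_set are hypothetical placeholders."""
    url = f"{base_url.rstrip('/')}/{entity_set}"
    options = []
    if filter_expr is not None:
        # $filter takes an expression such as "state eq 'VA'";
        # spaces and quotes are percent-encoded.
        options.append("$filter=" + quote(filter_expr, safe=""))
    if top is not None:
        # $top limits how many entries come back.
        options.append(f"$top={int(top)}")
    return url + ("?" + "&".join(options) if options else "")


def parse_atom_entries(atom_xml):
    """Turn each Atom <entry> into a dict built from its <m:properties> children."""
    root = ET.fromstring(atom_xml)
    rows = []
    for entry in root.iter(f"{{{ATOM}}}entry"):
        props = entry.find(f"{{{ATOM}}}content/{{{META}}}properties")
        if props is None:
            continue  # skip entries that carry no property bag
        # Drop the "{namespace}" prefix so keys are plain property names.
        rows.append({child.tag.rsplit("}", 1)[-1]: child.text for child in props})
    return rows
```

Percent-encoding the filter expression keeps the URL valid when the expression contains spaces or quoted string literals.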

As examples, Microsoft has assembled two sets of government data on the site: a compilation of per diem rates from the General Services Administration and a number of data feeds from Washington, D.C. Each dataset also includes sample code that developers can insert into their own programs so those programs can access the data automatically. The code is offered in C#, PHP, Python, ActionScript, JavaScript and Ruby, as well as for Microsoft's Silverlight platform.
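As an illustration only (not Microsoft's actual sample code), here is a Python sketch of a client that pulls an entire large dataset automatically, page by page, using the `$skip` and `$top` query options of ADO.NET Data Services. The endpoint, and the assumption that each page arrives as a plain JSON array of records, are hypothetical; the fetch function is injectable so the paging logic can be exercised without a network.

```python
import json
from urllib.request import urlopen


def fetch_json(url, opener=urlopen):
    """Download a URL and decode its body as JSON (UTF-8 assumed)."""
    with opener(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def iter_records(base_url, page_size=100, fetch=fetch_json):
    """Yield every record of a dataset, one page at a time.

    Hypothetical assumption: the service honors $skip/$top and returns
    each page as a JSON array of records.
    """
    skip = 0
    while True:
        page = fetch(f"{base_url}?$skip={skip}&$top={page_size}")
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means no more records
        skip += page_size
```

Stopping on a short page avoids needing a separate record-count request; the cost is one extra empty request when the total is an exact multiple of `page_size`.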

Like Microsoft, Amazon is offering storage in hopes that the government data will attract more users to its paid services.

The company is offering to host government datasets on its Elastic Block Store (EBS) for free.

Amazon has already posted a large set of Census Bureau geographic data, namely the Topologically Integrated Geographic Encoding and Referencing (TIGER) shape files, which map the country's roads, railroads and rivers within legal and statistical geographic areas. Other public datasets include federal contracts from the Federal Procurement Data Center and influenza genome sequencing data from the National Center for Biotechnology Information.

Developers building cloud-based applications on the online retailer's Elastic Compute Cloud (EC2) service can point their virtual machines to copies of these datasets, which are in a format that is sometimes easier to reuse than the formats agencies originally provide. For instance, the Census Bureau offers the TIGER files only as compressed packages via FTP, whereas Amazon's platform lets cloud-based Web applications import copies of the shape files as needed.
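Because the TIGER shape files come as compressed packages, a consumer typically unpacks them before use. The Python sketch below opens a zip archive in memory and reads the 100-byte header of each `.shp` file inside, following the ESRI Shapefile Technical Description: a big-endian file code (9994) and file length, then a little-endian version, shape type and bounding box. The archive layout and file names are assumptions for illustration; actual Census packages may be organized differently.

```python
import io
import struct
import zipfile

# Subset of shape-type codes from the ESRI shapefile specification.
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def read_shp_header(data):
    """Parse the 100-byte main-file header of an ESRI shapefile (.shp)."""
    if len(data) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    (file_code,) = struct.unpack(">i", data[0:4])       # big-endian, always 9994
    if file_code != 9994:
        raise ValueError("not a shapefile")
    (length_words,) = struct.unpack(">i", data[24:28])  # length in 16-bit words
    version, shape_type = struct.unpack("<ii", data[28:36])  # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", data[36:68])
    return {
        "length_bytes": length_words * 2,
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, shape_type),
        "bbox": (xmin, ymin, xmax, ymax),
    }


def shp_headers_in_zip(zip_bytes):
    """Read the header of every .shp member inside a compressed package."""
    headers = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".shp"):
                with zf.open(name) as member:
                    # Only the first 100 bytes are needed for the header.
                    headers[name] = read_shp_header(member.read(100))
    return headers
```

Reading just the header is a cheap way to confirm what a package contains (roads, for example, are stored as PolyLine shapes) before loading the full geometry.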

A batch of data on EBS "appears just like an external hard drive when it's mounted to an EC2 instance, which is a virtual machine," said Eric Gundersen, president of Development Seed, which is using the TIGER data as part of a school district mapping project for the nonprofit New America Foundation. "So you can hook up this public virtual disk to your virtual machine and work with the data as if it's local to your virtual machine."
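Gundersen's point is that no special API is needed once the volume is attached: ordinary file I/O works. As a minimal Python sketch, assuming the public volume has already been attached to the EC2 instance and mounted at a path of your choosing (the `/mnt/tiger` mentioned in the docstring is hypothetical), this lists the shape files on it using only standard-library calls.

```python
from pathlib import Path


def find_shapefiles(mount_point):
    """List .shp files under a mounted volume, with their sizes in bytes.

    mount_point is a hypothetical path such as "/mnt/tiger"; once an EBS
    volume is mounted, it is read like any local directory.
    """
    root = Path(mount_point)
    return sorted(
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*.shp")  # recursive search below the mount point
        if p.is_file()
    )
```

The same function works unchanged on a laptop directory, which is the practical upshot of the "external hard drive" model.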

Reader Comments

Sat, May 30, 2009

Data.gov is not a repository of data, it's a portal. Big difference.
