Microsoft, Amazon offer their own repositories for government data
- By Joab Jackson
- May 29, 2009
Data.gov is not the only repository for raw government datasets. At least two commercial endeavors, by Microsoft and Amazon, also offer homes for government data.
Microsoft has set up a repository so that government agencies can upload and store their public-facing datasets for reuse by other parties. The repository runs on Azure, Microsoft's cloud computing offering. Microsoft's data repository is called the Open Government Data Initiative (OGDI), not to be confused with the White House's Open Government Initiative Web site, which solicits ideas from the public on greater government transparency.
Microsoft developed OGDI as a way to introduce Azure to the federal information technology community, said Susie Adams, Microsoft Federal chief technology officer. "The division built a starter kit that acts as a guide to how agencies can post data to Azure, using Visual Studio," she said. The data is stored in Azure, in SQL Tables. Eventually, Microsoft will move Azure data over to its SQL Data Services, she said. Agencies will be able to work within Azure for a fee.
Like Microsoft, Amazon is offering storage in hopes that the government data will attract more users to its paid services.
The company is offering to host government datasets for free on its Elastic Block Store (EBS).
Amazon has already posted a large set of Census Bureau geographic data, namely the Topologically Integrated Geographic Encoding and Referencing (TIGER) shape files, which map the country's roads, railroads and rivers within legal and statistical geographic areas. Other public datasets include federal contracts from the Federal Procurement Data Center and Influenza Genome Sequencing from the National Center for Biotechnology Information.
Developers building cloud-based applications on Amazon's Elastic Compute Cloud (EC2) service can point their virtual machines to copies of these datasets, which are in a format that is sometimes easier to reuse than the original formats that agencies provide. For instance, the TIGER files are available from the Census Bureau but only via FTP in compressed packages. The Amazon platform would allow Web applications running in Amazon's cloud to import copies of the shape files as needed.
A batch of data on EBS "appears just like an external hard drive when it's mounted to an EC2 instance, which is a virtual machine," said Eric Gundersen, president of Development Seed, which is using the TIGER data as part of a school district mapping project for the nonprofit New America Foundation. "So you can hook up this public virtual disk to your virtual machine and work with the data as if it's local to your virtual machine."
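As a rough illustration of what working with the data "as if it's local" could look like, the following Python sketch walks a mounted volume and reads the fixed 100-byte header that begins every .shp file in the shapefile format. The mount point `/mnt/tiger` and the function names are hypothetical, chosen only for this example; the directory layout of the actual public dataset is not described here and is not assumed.

```python
import struct
from pathlib import Path

# Hypothetical mount point for the attached public-dataset volume (an assumption).
MOUNT_POINT = Path("/mnt/tiger")

# Common 2-D shape type codes from the shapefile format.
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def read_shp_header(path):
    """Parse the fixed 100-byte header at the start of a .shp file."""
    with open(path, "rb") as f:
        header = f.read(100)
    if len(header) < 100:
        raise ValueError("file too short to be a shapefile")

    # File code is a big-endian integer and is always 9994.
    (file_code,) = struct.unpack(">i", header[0:4])
    if file_code != 9994:
        raise ValueError("not a shapefile (bad file code)")

    # File length is big-endian and counted in 16-bit words.
    (length_words,) = struct.unpack(">i", header[24:28])
    # Version and shape type are little-endian integers.
    version, shape_type = struct.unpack("<ii", header[28:36])
    # Bounding box: Xmin, Ymin, Xmax, Ymax as little-endian doubles.
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])

    return {
        "bytes": length_words * 2,
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, str(shape_type)),
        "bbox": (xmin, ymin, xmax, ymax),
    }


def list_shapefiles(root=MOUNT_POINT):
    """Yield (path, header) for every .shp file found under the mounted volume."""
    for shp in sorted(Path(root).rglob("*.shp")):
        yield shp, read_shp_header(shp)
```

Because the volume behaves like an ordinary disk, nothing in the sketch is specific to Amazon: the same functions would run unchanged against a directory of shape files downloaded from the Census Bureau and unpacked locally.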