Startup aims to make Census data easier to use

The demographic data that the Census Bureau releases on American society is critical for social scientists and economists and is praised for its accuracy. It is not, however, known for its ease of use.

But one Austin, Texas-based startup is hoping to make Census data less dense and more accessible. The startup, a social network for data, and National Science Foundation-funded fellow Jonathan Ortiz are working toward a more intuitive dataset for the Census’ American Community Survey. The company sells itself as a way to increase collaboration within the realm of data to “accelerate problem solving,” as its cofounder and chief product officer Jon Loyens put it. It’s a social platform that helps people who work with data discover, prepare and share datasets. By linking datasets together using semantic web technology, the platform identifies and adds information about the concepts within the datasets, which makes them easier for people and machines to work with.

The conversation between Census and the startup began when Jeff Meisel, the Census Bureau's chief marketing officer, reached out to the South Big Data Hub, one of four regional innovation hubs established by the NSF that connect government with startups to work collaboratively on data projects.

South Big Data Hub put Meisel in touch with the company and paved the way for Ortiz’s fellowship.

“Part of our mission is making our public data as easy to use as possible,” Meisel said, adding that Census information is used to make major decisions in both the private and public sectors.

But Ortiz said that even for those with a background in computer science and data, Census information can be difficult to work with.

“It is extremely large,” Ortiz said about the ACS data. “Even when dealing with one year of ACS, it’s already huge. It gets too large to handle on one computer, and that’s when it starts to be considered big data.… It’s accurate, but it’s not the tidiest, easiest dataset that I’ve ever seen.”

What makes the data even more complicated is the supporting information needed to understand the complete set. An example Ortiz used is fuel cost per household in the ACS. The numeral 1 in this category means the cost is included in rent; 2 has another meaning; and the numbers 3 through 9,999 correspond to monetary values.
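To illustrate the kind of lookup Ortiz describes, here is a minimal Python sketch of decoding a raw fuel-cost value. The function name and the returned labels are hypothetical, and because the article does not say what the code 2 means, the sketch leaves it unresolved rather than guessing.

```python
from typing import Union


def decode_fuel_cost(code: int) -> Union[str, int]:
    """Turn a raw fuel-cost value into something a reader can interpret.

    Hypothetical helper: the name and return labels are assumptions,
    based only on the coding scheme described in the article.
    """
    if code == 1:
        return "included in rent"
    if code == 2:
        # The article only says 2 "has another meaning"; it is not filled in here.
        return "special code (meaning not given in the article)"
    if 3 <= code <= 9999:
        # Values in this range are monetary amounts, so pass them through.
        return code
    raise ValueError(f"value outside the described range: {code}")


if __name__ == "__main__":
    for raw in (1, 2, 450):
        print(raw, "->", decode_fuel_cost(raw))
```

Without this kind of supporting information, a program reading the column would treat 1 and 2 as very small fuel bills rather than as special codes.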

The Census releases its data in a tabular form. Ortiz explained that with tabular data, a computer doesn’t "know" what it’s looking at. But the startup is converting it to an RDF format that gives the values within a dataset corresponding metadata. With the addition of metadata, computers can interpret a number and its relationship to the whole dataset.
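As a rough sketch of the idea, the Python below turns one tabular row into RDF-style statements written in an N-Triples-like syntax, and attaches a human-readable label to a column. The `example.org` namespace, the column names, and the function names are all hypothetical and are not the startup's actual vocabulary or pipeline; only the `rdfs:label` property URI is a real W3C term.

```python
# Hypothetical namespace for illustration only.
BASE = "http://example.org/acs/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def row_to_triples(row_id: str, row: dict) -> list:
    """Express each cell of a table row as a subject-predicate-object statement."""
    subject = f"<{BASE}household/{row_id}>"
    triples = []
    for column, value in row.items():
        # Each column becomes a named property instead of a bare position.
        predicate = f"<{BASE}variable/{column}>"
        triples.append(f'{subject} {predicate} "{value}" .')
    return triples


def describe_column(column: str, label: str) -> str:
    """Attach metadata (a readable label) to the column's property."""
    return f'<{BASE}variable/{column}> <{RDFS_LABEL}> "{label}" .'


if __name__ == "__main__":
    # Made-up row; the column names are assumptions, not real ACS variables.
    for line in row_to_triples("1", {"state": "WY", "fuel_cost": 450}):
        print(line)
    print(describe_column("fuel_cost", "Fuel cost per household"))
```

The point of the design is that every value travels with a named property, and that property can itself carry descriptions, so a program can look up what a number means instead of relying on a separate codebook.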

This makes the second part of the company’s goal more attainable. The website is a social platform where users can comment on and share datasets. Having smarter data makes this collaboration easier for users.

When data scientists begin working with a new set of numbers, the first and most time-consuming task is cleaning up the dataset to get a better idea of what it is that they’re looking at. The company wants to shorten that first step, specifically for the Census data, Loyens said.

Ortiz said they are “about 85 percent done” with the ACS dataset for the state of Wyoming, which was the first state to make the transition to the platform because its low population generated a proportionally smaller dataset. The changes made in the information for Wyoming will be the same ones used for the rest of the states, so the transition will go much more quickly for the remaining states and territories, he said.

Editor's note: This article was changed Aug. 10 to clarify how the company works with tabular data.

About the Author

Matt Leonard is a former reporter for GCN.

