
Map merge: How USGS integrates geospatial systems

From NASA to NOAA, and the Department of Agriculture to the Department of Defense, federal agencies and departments are collecting geospatial data at a pace that would choke many server farms. The data isn't just being amassed: It's being analyzed to guide troops in unfamiliar terrain, to track the spread of disease and to decipher crime patterns across the law enforcement enterprise.

While geographic information systems (GIS) have become well established in the federal government, the current challenge for agencies is to develop tools and techniques to connect their geospatial programs with their counterparts in other jurisdictions.

Some agencies have been coordinating their data collection and analyses, but as a general rule data collected by one agency can rarely be integrated easily and effectively with data collected by another.

Steps are now being taken to change that.  In fact, the Federal Geographic Data Committee (FGDC) – the main federal unit charged with integrating federal geospatial efforts – in its National Spatial Data Infrastructure Plan 2014-2016, said its primary goal is to "develop capabilities for national shared services."

More specifically, the FGDC, hosted at the U.S. Geological Survey, has been tasked with ensuring "that spatial data from multiple sources (federal, state, tribal, regional and local governments; academia; and the private sector) are available and easily integrated to enhance understanding of our physical, natural, and cultural world."

According to agency geospatial leaders and technology experts, there are two major hurdles to achieving those goals: neither the resolution of collected data nor the formats for creating metadata are compatible across data fields.

USGS takes the lead

Not surprisingly, the lead organization researching geospatial data integration and developing tools to enable such integration is USGS. In fact, the agency has staked its future on the goal of geospatial interoperability, having created the Center of Excellence for Geospatial Information Science (CEGIS) to lead the effort.

The Center was established "to conduct, lead and influence the research and innovative solutions required by the National Spatial Data Infrastructure and the emerging GeoSpatial and GeoSemantic Web," said USGS.

According to Lynn Usery, director of CEGIS, the initial impetus for finding ways to integrate geospatial data was the USGS National Map – a collaborative effort among USGS and other federal, state and local partners to improve and deliver topographic information across the nation, including orthoimagery (aerial photographs), elevation, geographic names, hydrography, boundaries, transportation, structures and land cover.

The first hurdle Usery's team faced was the different resolutions of collected data.

"The reason we started the [CEGIS data integration] project was that when we were developing the national map, we realized that the different layers of the national map – hydrography, transportation, contours and all those things – were all actually compiled and generated separately," said CEGIS's Usery. That meant that when the layers were put together, it might look as if they didn't match up.

Usery's team then tried to determine exactly what it means for a dataset to be integrated. "If you take two datasets and you put them together, how does the user perceive that as being integrated?" he asked.

According to Usery, the team found that if the resolution of two datasets – say transportation data superimposed on image data – was within about 6.4 meters, users would perceive the data as being integrated.
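As a rough illustration of that threshold, the sketch below checks whether matched points from two layers fall within 6.4 meters of each other. It assumes the figure refers to positional offset between corresponding features, that both layers are already in the same projected coordinate system with units of meters, and that the point pairs have been matched beforehand. The function names are hypothetical and this is not the CEGIS team's method.

```python
import math

# Tolerance quoted by Usery; treating it as a positional offset is an assumption.
PERCEIVED_INTEGRATION_TOLERANCE_M = 6.4


def offsets_m(pairs):
    """Return the planar distance in meters between each matched pair of points."""
    return [math.hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in pairs]


def appears_integrated(pairs, tolerance_m=PERCEIVED_INTEGRATION_TOLERANCE_M):
    """True if every matched pair lies within the tolerance."""
    return all(d <= tolerance_m for d in offsets_m(pairs))


# Hypothetical matched points: (road intersection, same intersection in imagery).
close_pairs = [((0.0, 0.0), (3.0, 4.0))]   # 5 m apart
loose_pairs = [((0.0, 0.0), (6.0, 8.0))]   # 10 m apart
```

Here `appears_integrated(close_pairs)` is true and `appears_integrated(loose_pairs)` is false.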

But unless both datasets are already geocoded using the same projection system, getting the two datasets to align correctly can be a major problem. For each dataset, an application needs to be created to perform the integration. In the case of integrating transportation data with underlying imagery, said Usery, USGS worked with researchers at the University of Southern California.

"USC was already doing some work in this area and had developed some algorithms to automatically locate intersections on the images. Then [they could] use those intersections as control points to get vector data to the images," Usery explained.  USGS provided data and small grants to the research team at USC. 
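The control-point idea can be sketched in a few lines. The example below is not the USC algorithm, which the article does not describe in detail. It assumes the intersections have already been located in both the vector layer and the image, and it fits a plain least-squares affine transform to those matched points. The function names `fit_affine` and `apply_affine` are hypothetical.

```python
def solve3(m, v):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("control points are collinear or too few")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(src, dst):
    """Least-squares affine transform mapping src control points onto dst."""
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations (A^T A) p = A^T b, solved once per output coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in range(2):
        atb = [sum(r[i] * d[k] for r, d in zip(rows, dst)) for i in range(3)]
        params.append(solve3(ata, atb))
    return params  # [[a, b, c], [d, e, f]]


def apply_affine(params, point):
    """Move one vector-layer point into image coordinates."""
    x, y = point
    (a, b, c), (d, e, f) = params
    return (a * x + b * y + c, d * x + e * y + f)


# Hypothetical control points: intersections in the vector layer and in the image.
vector_pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
image_pts = [(10.0, 20.0), (11.0, 20.0), (10.0, 21.0), (11.0, 21.0)]
transform = fit_affine(vector_pts, image_pts)
moved = apply_affine(transform, (2.0, 3.0))
```

In this toy case the image points are the vector points shifted by (10, 20), so the fitted transform moves (2, 3) to approximately (12, 23). Real road and imagery data would likely need more control points and a check of the residuals before the fit is trusted.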

In similar fashion, USGS provided support to another group to integrate hydrography data with contour layers in the National Map.

Integrating metadata

Of course, integrating data sets collected by agencies would be much faster, easier and less expensive if the data sets were created using standardized metadata – the data about the data, including geographic coordinates and object labels.

To address the problem, CEGIS is considering a “semantic” approach that allows data to be used across application and agency boundaries.

"We are primarily looking at ontology and semantics as a way to integrate data across a variety of organizations and different kinds of data layers," said Usery.  The team is trying a semantic web approach, he said. They built an ontology for all of the data and then built the semantics around that so they can use those semantics to actually integrate with other data sets.

As it is now, government agencies at all levels – as well as the private sector – apply different labels to features, spatial concepts and other data.  As a result, while an agency might be able to populate a map layer with data points from another agency, the analytic potential is very limited if the metadata doesn't share the same architecture. 

Accordingly, CEGIS is working to develop a single integrated language or ontology for describing geospatial data. 

"We're taking geospatial data and building an ontology for the data based on all of our features, relationships and interrelationships of features, and then we structure the data using RDF – resource description framework," said Usery. 

"If our data is structured in that form and other data is structured as RDF we can actually bring the data sets together."

In fact, USGS did just that with data it brought in from the Environmental Protection Agency, which was also in RDF format. "We just ran an automatic query to locate all of the EPA pollution sites within five miles of a local firehouse," said Usery.
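To make the idea concrete, here is a minimal sketch of why a shared structure helps. It models RDF statements as plain (subject, predicate, object) tuples rather than using an RDF library or a SPARQL endpoint, and every identifier, predicate name and coordinate in it is made up for illustration. The article does not describe the actual USGS and EPA vocabularies or the query that was run.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

# Hypothetical vocabulary and identifiers, for illustration only.
TYPE, LAT, LON = "ex:type", "ex:lat", "ex:lon"

usgs_triples = {
    ("usgs:firehouse/1", TYPE, "ex:FireStation"),
    ("usgs:firehouse/1", LAT, 38.00),
    ("usgs:firehouse/1", LON, -90.00),
}
epa_triples = {
    ("epa:site/near", TYPE, "ex:PollutionSite"),
    ("epa:site/near", LAT, 38.05),
    ("epa:site/near", LON, -90.00),
    ("epa:site/far", TYPE, "ex:PollutionSite"),
    ("epa:site/far", LAT, 38.20),
    ("epa:site/far", LON, -90.00),
}

# Both sources use the same triple structure, so merging is a set union.
merged = usgs_triples | epa_triples


def value(graph, subject, predicate):
    """Return the single object stored for (subject, predicate)."""
    return next(o for s, p, o in graph if s == subject and p == predicate)


def subjects_of_type(graph, type_name):
    """Return all subjects declared with the given type, sorted by name."""
    return sorted(s for s, p, o in graph if p == TYPE and o == type_name)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def sites_near(graph, station, radius_miles=5.0):
    """Pollution sites in the merged graph within radius_miles of the station."""
    lat0, lon0 = value(graph, station, LAT), value(graph, station, LON)
    return [
        site
        for site in subjects_of_type(graph, "ex:PollutionSite")
        if haversine_miles(lat0, lon0, value(graph, site, LAT), value(graph, site, LON))
        <= radius_miles
    ]


nearby = sites_near(merged, "usgs:firehouse/1")
```

With these made-up coordinates, only `epa:site/near` (roughly 3.5 miles away) falls inside the five-mile radius. In practice such a query would more likely be written in SPARQL against a triple store than as a loop over tuples.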

Dealing with legacy data

Not surprisingly, the biggest snag to implementing a robust scheme for organizing metadata is the existence of large amounts of legacy data.

According to Usery, USGS has not converted all of its data to RDF. "Our data resides in GIS format and it works very well," he said. "There are lots of procedures designed around those things and we can't just completely change over and lose all the legacy developments that we've done around GIS platforms."

Instead, the team has developed a tool that allows analysts to take any section of vector data sets and convert it to RDF.
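The article does not say how that tool works, so the following is only a sketch of what such a conversion might look like for a single point feature. It assumes a GeoJSON-like input dictionary and a made-up base URI, and it writes N-Triples lines using the standard RDF and RDFS terms plus the GeoSPARQL `asWKT` property. The `feature_to_ntriples` function is hypothetical and is not the USGS tool.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GEO_AS_WKT = "http://www.opengis.net/ont/geosparql#asWKT"
WKT_LITERAL = "http://www.opengis.net/ont/geosparql#wktLiteral"
BASE = "http://example.org/feature/"  # hypothetical base URI


def escape(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def feature_to_ntriples(feature):
    """Convert one GeoJSON-like point feature into a list of N-Triples lines."""
    if feature["geometry"]["type"] != "Point":
        raise ValueError("this sketch only handles point geometries")
    subject = f"<{BASE}{feature['id']}>"
    lon, lat = feature["geometry"]["coordinates"]
    kind = feature["properties"]["kind"]
    name = escape(feature["properties"]["name"])
    return [
        f"{subject} <{RDF_TYPE}> <{BASE}class/{kind}> .",
        f'{subject} <{RDFS_LABEL}> "{name}" .',
        f'{subject} <{GEO_AS_WKT}> "POINT({lon} {lat})"^^<{WKT_LITERAL}> .',
    ]


# Hypothetical legacy record, as it might be exported from a GIS layer.
firehouse = {
    "id": "fh1",
    "geometry": {"type": "Point", "coordinates": [-90.0, 38.0]},
    "properties": {"kind": "FireStation", "name": 'Station "A"'},
}
lines = feature_to_ntriples(firehouse)
```

Converting on demand like this leaves the legacy GIS data where it is and produces RDF only for the selection an analyst needs, which matches the approach Usery describes.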

While most integration of federal geospatial data is currently happening through ad hoc arrangements between agencies, the FGDC has developed a process for managing data sets to encourage their availability to, and usability by, all sectors of government.

The FGDC's Geospatial Data Asset Management Plan – issued in March 2014 – requires agencies and departments to adopt standardized processes for "tracking, maintaining, expanding and aligning" geospatial data assets. "At a national level," said the report, "this approach is intended to overcome the single agency, stovepipe model by applying consistent policy, improved organization, better governance and public engagement to deliver outstanding results."

By the end of 2016, the process is expected to result in the collection of searchable and downloadable datasets that agencies, departments and the public can access via the portal, which already offers more than 95,000 datasets.

While the goals of the FGDC are ambitious, according to Usery, there's a long way to go to meet them.  "All agencies that use geospatial data try to comply with the FGDC standards," he said.  "That's been the case for many years.  Beyond that I don't think there's a lot that goes on."

For now, Usery said, at USGS most efforts to integrate geospatial data continue to be between agencies that share an immediate interest in the specific data.  "We try to leverage data from other agencies and not have to collect all the data ourselves," he said. 

