Data warehousing and the Web



Organizational countercultures, radical paradigms and living
libraries—this is the language of data warehousing. It’s all about the impact of
the Web.


“In the future, the world is going to be composed of content sitting in data
warehouses, and the mechanism for distributing, accessing and loading will be
nets—intra-, inter- and extra-,” said Ramon Barquin, founder and former
president of the Data Warehousing Institute in Washington. “It’s part of a
paradigm shift from a processing-centric world to an information-centric world.
Information has become the center of the universe.”


For Pat Garvey, director of the Environmental Protection Agency’s Envirofacts
warehouse team, discovering Web warehousing was like a religious awakening. “The Web
has been a godsend,” he said.


Four years ago, Envirofacts, a data warehouse that pulls together EPA data on about a
million sites that handle potentially harmful materials, got off to a rough start.


“We started to fail as we finished the first iteration in March 1995,” Garvey
said. “We tried to do a desktop [PC] configuration. We got it deployed to 140 or 150
people and realized all of our time was going to be spent just supporting those people. We
were never going to expand the database or expand functionality.”


With the Web maturing, EPA decided to abandon the desktop PC approach and distribute
the data via the Web.


The move led to spectacular success. Each month, Envirofacts attracts about 48,000
users and gets some 130,000 queries of the data warehouse.


“We would have never achieved that kind of utilization without the Web,”
Garvey said.


Web warehousing is increasingly seen as critical to an agency’s mission.


“Data warehousing has been a hot topic for years, but it’s not the same topic
it was five years ago,” said Jim Davis, project manager for data warehousing at the
SAS Institute Inc. of Cary, N.C. “It’s no longer optional. It’s
mission-critical to decision-support activity.”


Data warehouse specialists predict that the growing use of networked data warehousing
will transform the way agencies do business and even turn organization charts
upside down.


A case in point is the Army’s Center for Healthcare Education and Studies (CHES)
at Fort Sam Houston, Texas, where a Web data warehouse provides medical billing records
from the Civilian Health and Medical Program of the Uniformed Services (CHAMPUS) to about
250 Army managed-care analysts in offices around the world.


The analysts negotiate contracts with healthcare providers and mine the data for cases
of fraud, waste and abuse.


In addition to saving the Army about $28 million over three years, the warehouse
introduced a kind of organizational counterculture at CHES, said Scott Optenberg, chief of
the agency’s Analysis Branch and head of the team that built the warehouse.


The warehouse has empowered its users, giving Army billing analysts immediate access
via the Web to CHAMPUS claim data and turning the organizational triangle upside down.


Before the warehouse was implemented, analysts were the last ones to get the data,
Optenberg said. Records and reports, filtered down to analysts from managers at the top,
had a lag time of months.


“Suddenly, you had people at the bottom becoming incredibly informed,”
Optenberg said. “What surprised us was the political threat that this represented to
many people within a vertically integrated organization.”


Networked data warehousing represents a new, radical paradigm, Optenberg said.


At the Agriculture Department, J. Norman Reid, associate deputy administrator of the
Office of Community Development, encountered similar resistance to a data warehouse
project that delivered information to USDA Rural Development Agency field offices.


Many of the systems managers on the receiving end were used to the way mainframe
systems worked, Reid said. They were not conversant with the kinds of transactions made
possible by the Web.


The IT staff at USDA was initially resistant, Reid said, because they worried that
their data would not emerge unharmed from use in a data warehouse. His office had to do
some convincing to get over that hurdle.


Organizational discord aside, getting a data warehouse going is enough of a challenge
for managers. Because building a data warehouse usually involves huge amounts of data,
data warehouse veterans advise starting small and building methodically once the project
gets off the ground.


Warehouse builders also must remember that data warehouses are user-driven. “The
No. 1 factor in ensuring that a data warehouse is successful is making sure that it is a
tool for users,” Barquin said. If not, the project will become a dead end.


The Geological Survey’s National Gap Analysis Project is building a data warehouse
of information about the geographical distribution of plants and animals to find out where
species are slipping through cracks in biodiversity management.


“What we want to do is have a living library,” said Michael Kunzmann, a USGS
ecologist working on the program at the University of Arizona in Tucson. “We want it
to last 100 years.”


The warehouse, already 10 years in the making, will never be finished, Kunzmann said.
To continue to be useful, the data must be continuously updated—a principle that
applies to all data warehouses.


So what’s on the horizon for data warehousing? More highly sophisticated,
shrink-wrapped tools for analyzing data, such as data mining technologies. Just a few
years ago, warehouse managers had to write their own data mining software.


Publish and subscribe technologies will be used to deliver new, critical information
from the data warehouse to the user’s desktop PC, SAS Institute’s Davis said.


“You get the information in a timely fashion as opposed to having to check three
or four times a day,” he said.
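In programming terms, the publish-and-subscribe delivery Davis describes works by letting users register interest in a category of data once, after which the warehouse pushes new records to them as they arrive. A minimal sketch of the pattern follows; the class and topic names are invented for illustration and do not reflect any particular vendor's product:

```python
# Minimal publish/subscribe sketch: subscribers register interest in a
# warehouse topic once and are notified when new data arrives, instead
# of polling the warehouse three or four times a day.

class Warehouse:
    def __init__(self):
        self.subscribers = {}  # topic -> list of callback functions

    def subscribe(self, topic, callback):
        # Register a callback to be invoked whenever the topic gets new data.
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, record):
        # Push the new record to every subscriber of the topic.
        for callback in self.subscribers.get(topic, []):
            callback(record)

received = []
wh = Warehouse()
wh.subscribe("claims", received.append)
wh.publish("claims", {"id": 1, "amount": 250.0})
# The subscriber now holds the record without ever having polled.
```

The design choice is that the warehouse, not the user, decides when data moves, which is what makes delivery timely.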


EPA’s Garvey envisions open data access using a tool that would, from within a
single application, extract data from the warehouses of different agencies. For example, a
user interested in learning more about populations putting stress on wetlands areas could
cull information from EPA, Census Bureau and Geological Survey data warehouses, he said.


As the amount of data and number of databases increase, interoperability standards
become more of a concern. Managing the metadata used to manage the warehouse has been a
major headache.


But in December, Microsoft Corp. turned over control of its Open Information Model for
managing metadata to the Metadata Coalition, which has its own standard, the Metadata
Interchange Specification. The group will integrate the two, creating one new standard and
taking the fledgling technology a step further.

