The database of databases

 

Connecting state and local government leaders

The Internet and IP-based networks have gone a long way toward removing barriers between agencies. To further a spirit of cooperation, agencies are also beginningto share information from disparate databases by sending some of that data to common data warehouses, where it can be merged, queried and analyzed.

The Internet and IP-based networks have gone a long way toward removing barriers between agencies. To further a spirit of cooperation, agencies are also beginningto share information from disparate databases by sending some of that data to common data warehouses, where it can be merged, queried and analyzed.

Related Links

Data warehouses

Illustration by Bruno Budrovic




The Transportation Department's Bureau
of Transportation Statistics, for example,
has created the TranStats portal
(www.transtats.bts.gov).


It lets researchers, analysts or citizens
view, search, run reports and generate
graphics from 400 transportation-related
tables in over 100 databases (one of them
with more than 250 million rows) kept by
different departments and agencies.


'The Bureau of Transportation Statistics
has three primary goals for
TranStats,' said TranStats data team
manager Cheryl Young. 'The first is to assist
public users in obtaining data easily
and efficiently, the second is to support
public and internal users who perform
online analysis and the third is to offer
data to internal users to perform advanced
statistical analysis using third-party tools.'


But setting up a data warehouse is more
involved than building a traditional database.
'We learned that data collection and
presentation involves much more than
technology,' Young said. 'It is difficult to
change the concept of data stewardship.
We would like to have focused much earlier
on this issue so that the technical
progress can follow.'


Common characteristics

While they share many common characteristics,
a data warehouse is different in design
and function from a production database.
The primary difference is whether the
system will be used for online transaction
processing or online analytic processing.
Most production databases are primarily
for OTLP and require high-speed access to
individual data records. A motor vehicles
department, for example, would need an
OLTP database to let clerks quickly access
the record of an individual car, determine if
a registration is current and update data.


The TranStats data warehouse, on the
other hand, requires a database management
system designed for OLAP. This affects
the hardware needed to support the
data warehouse, as well as the DBMS itself.
TranStats runs on four Sun Solaris
servers and a 2.5TB Sybase IQ Database
Management System from Sybase Inc.
Young said query speed, not transaction
speed, led to selecting Sybase for this application.


'Our data changes monthly or quarterly,
but our queries must be performed within
a split second,' Young said. 'Sybase IQ is
much faster than other databases and the
only one that can meet our dynamic Web
presentation criteria.'


In addition, data warehouses extract information
from a number of databases, so
they tend to be much larger than any one
of their sources.


'Scalability becomes a bigger issue with
a data warehouse, as does the way in
which the DBMS implements parallel
queries because of the heavy query nature
of the data warehouse,' said Craig Mullins,
consultant and author of Database Administration:
The Complete Guide to Practices
and Procedures (Addison-Wesley
Professional, 2002).


Data mart or data warehouse?

There are two main approaches an organization
could take in setting up a data warehouse.
Bill Inmon, considered the father of
data warehousing, favors a top-down approach.
The organization decides what it
wants to contain in the central warehouse
and how that data will be used. Lower-level
analytic databases, or 'data marts,' then
draw their data from the central warehouse.
This method ensures consistency of
data throughout the organization.


The other method is a bottom-up approach.
The data marts are set up first to
meet specific business needs, and these feed data into the warehouse. This allows
the organization to start realizing a return
sooner. Whichever method is chosen, the
overall data warehouse design should be
decided early in the process.


'In setting up a data warehouse, it is important
to have the data model correct
right from the get-go,' said Nathaniel
Booth, IT specialist at the Geological Survey
Data Center in Middleton, Wis. 'You
have piles of applications that will depend
on that data going forward, so it is really
hard to change later.'


USGS hired a consultant for the initial
design of the data warehouse and early
business intelligence deployments. In addition
to settling on the data format, one
must also select the DBMS. The decision is
complicated by the fact that, in most cases,
the organization already has some sort of
DBMS in place, and any new system must
be compatible with what already exists.


In addition, it requires interoperability
with systems at any other agency that will
be feeding data into the data warehouse.


'Assuming you already have a database,
which almost everyone in government
has, just buy more of what you have and
do a sole-source acquisition,' advises John
Kost, analyst for Gartner Inc. of Stamford,
Conn. 'The risk in any procurement is
[that] you are going to end up buying
something you hadn't expected and don't
have the internal skill sets to support.'


That was the approach USGS took in setting
up a central data warehouse for collecting
water quality data. Water doesn't
respect legislated boundaries. A pollutant
dumped in a stream in one city will flow
quickly into another municipality's water
system without stopping at the city limits,
finally winding up in a lake or ocean.


Conversely, USGS takes data streams
from state, local and federal agencies nationwide
and aggregates them in Oracle
databases at its data center. Researchers
can then conduct browser-based queries
through a flexible interface.


The largest database contains the National
Water Quality Assessment, with
more than 11 million water and biological
test results.


'We stayed with Oracle because of its
support for a wide array of functionality
including Oracle spatial, text search and
indexing,' Booth said. 'We had to bring in
data from SQL Server and Ingres databases
as well as text data.'


The data center started with Oracle 7.3
and now runs three instances of Oracle's
10g database on Red Hat Linux.


'We try to stay on the most recent version
of the database,' Booth said, though
this sometimes introduces problems.
'Sometimes the new version doesn't support
the applications you are running,
and previous queries don't always work
right, so just dealing with the change
management through the versions of the
database is somewhat of a challenge.'


He also said ensuring all the data follows
a consistent format is an ongoing effort.


'Vendors still haven't come to us with a
standardized metadata
repository, so we have a
difficult time having to
manage the metadata
about multiple products,'
he said. 'In hindsight, it
is nice to pick ones that
play well with metadata.'


'We learned that data collection and presentation involves much more than technology. It is difficult to change the concept
of data stewardship.'
'CHERYL YOUNG, TRANSPORTATION DEPARTMENT


However, when one is depending on an
array of entities with their own legacy systems
and unique data needs, managing
the data warehouse means constantly negotiating
with all interested parties.


Sonoma County, Calif., for example, set
up an IBM DB2 Geographic Information
Systems data warehouse for use by
all agencies, including agricultural inspectors,
tax assessors, the building department
and emergency services. The
county had already been using ESRI
software for some of its applications, but
there was no standard across the county.


'We were looking for a central repository
to share information, so people didn't have
to guess which version is most up-to-date,'
said GIS manager Tim Pudoff. 'By having
a central database where people can publish
their data, we can provide users with a
one-source shopping point.'


The GIS staff has real-time access to add
or update information in the warehouse,
and participating agencies periodically
send automatic or manual feeds from their
own servers. The system is overseen by a
technical advisory committee with representatives
from 16 agencies, who provide
requirements and buy-offs on any upgrades
or new services. In such a situation, Pudoff
emphasized, it is vital to have an executive
champion to help manage the competing
interests of the different parties involved.


'You need someone who can help you
wade through the bureaucratic
trials,' he said.
'I have that from my
manager, and it made all
the difference.'


Selecting the right
hardware and software
for a data warehouse involves
more than just looking at its technical
prowess or initial cost.


Looking ahead

'It is almost irrelevant what the vendor
tells you the price is,' Kost cautions. 'You
need to figure out what are the component
parts of the implementation, including
your own costs.'


This includes the hardware, services in
setting up the database, creating links to
other databases, applications or agencies,
and ongoing support.


Although an initial database purchase
might be geared to a single use, Mullins
advised breaking out the hardware, software
and services components and examining
them separately, even if the entire
contract does wind up going to a single
vendor.


'I would split out the hardware and
programming services from the DBMS
unless this particular DBMS is going to
be used only by this one project,' he said.
'A DBMS is designed for sharing data
across multiple applications, and you
don't have to always have a brand-new
DBMS for every application.'


In evaluating the DBMS, Mullins said to
look at the analytical and reporting features,
not the transaction speed.


'An RFP for a data warehouse should
focus on analytical processing instead of
transactional processing,' he said. 'The reporting
features embedded with the
DBMS become much more important, as
does the ability to manage very large
amounts of data. A DBMS with embedded
ETL [extract, transfer, load] features will
also move to the head of the pack of data
warehouse RFPs, because warehousing
implementations require heavy data
movement from operational databases to
the warehouse.'


One also has to consider who is going to
support it once the consultants wrap up
the project, which was one factor in USGS'
decision to stick with Oracle rather than a
product with a smaller installed base and
local support network.


'For us, it is important to have face time
with the consultants,' Booth said. 'It is
difficult, when you are not in a major metropolitan
area, to find skills in these very
specific technologies.'



DATA WAREHOUSING





Drew Robb is a freelance technology writer in Los Angeles.

X
This website uses cookies to enhance user experience and to analyze performance and traffic on our website. We also share information about your use of our site with our social media, advertising and analytics partners. Learn More / Do Not Sell My Personal Information
Accept Cookies
X
Cookie Preferences Cookie List

Do Not Sell My Personal Information

When you visit our website, we store cookies on your browser to collect information. The information collected might relate to you, your preferences or your device, and is mostly used to make the site work as you expect it to and to provide a more personalized web experience. However, you can choose not to allow certain types of cookies, which may impact your experience of the site and the services we are able to offer. Click on the different category headings to find out more and change our default settings according to your preference. You cannot opt-out of our First Party Strictly Necessary Cookies as they are deployed in order to ensure the proper functioning of our website (such as prompting the cookie banner and remembering your settings, to log into your account, to redirect you when you log out, etc.). For more information about the First and Third Party Cookies used please follow this link.

Allow All Cookies

Manage Consent Preferences

Strictly Necessary Cookies - Always Active

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Sale of Personal Data, Targeting & Social Media Cookies

Under the California Consumer Privacy Act, you have the right to opt-out of the sale of your personal information to third parties. These cookies collect information for analytics and to personalize your experience with targeted ads. You may exercise your right to opt out of the sale of personal information by using this toggle switch. If you opt out we will not be able to offer you personalised ads and will not hand over your personal information to any third parties. Additionally, you may contact our legal department for further clarification about your rights as a California consumer by using this Exercise My Rights link

If you have enabled privacy controls on your browser (such as a plugin), we have to take that as a valid request to opt-out. Therefore we would not be able to track your activity through the web. This may affect our ability to personalize ads according to your preferences.

Targeting cookies may be set through our site by our advertising partners. They may be used by those companies to build a profile of your interests and show you relevant adverts on other sites. They do not store directly personal information, but are based on uniquely identifying your browser and internet device. If you do not allow these cookies, you will experience less targeted advertising.

Social media cookies are set by a range of social media services that we have added to the site to enable you to share our content with your friends and networks. They are capable of tracking your browser across other sites and building up a profile of your interests. This may impact the content and messages you see on other websites you visit. If you do not allow these cookies you may not be able to use or see these sharing tools.

If you want to opt out of all of our lead reports and lists, please submit a privacy request at our Do Not Sell page.

Save Settings
Cookie Preferences Cookie List

Cookie List

A cookie is a small piece of data (text file) that a website – when visited by a user – asks your browser to store on your device in order to remember information about you, such as your language preference or login information. Those cookies are set by us and called first-party cookies. We also use third-party cookies – which are cookies from a domain different than the domain of the website you are visiting – for our advertising and marketing efforts. More specifically, we use cookies and other tracking technologies for the following purposes:

Strictly Necessary Cookies

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Functional Cookies

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Performance Cookies

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Sale of Personal Data

We also use cookies to personalize your experience on our websites, including by determining the most relevant content and advertisements to show you, and to monitor site traffic and performance, so that we may improve our websites and your experience. You may opt out of our use of such cookies (and the associated “sale” of your Personal Information) by using this toggle switch. You will still see some advertising, regardless of your selection. Because we do not track you across different devices, browsers and GEMG properties, your selection will take effect only on this browser, this device and this website.

Social Media Cookies

We also use cookies to personalize your experience on our websites, including by determining the most relevant content and advertisements to show you, and to monitor site traffic and performance, so that we may improve our websites and your experience. You may opt out of our use of such cookies (and the associated “sale” of your Personal Information) by using this toggle switch. You will still see some advertising, regardless of your selection. Because we do not track you across different devices, browsers and GEMG properties, your selection will take effect only on this browser, this device and this website.

Targeting Cookies

We also use cookies to personalize your experience on our websites, including by determining the most relevant content and advertisements to show you, and to monitor site traffic and performance, so that we may improve our websites and your experience. You may opt out of our use of such cookies (and the associated “sale” of your Personal Information) by using this toggle switch. You will still see some advertising, regardless of your selection. Because we do not track you across different devices, browsers and GEMG properties, your selection will take effect only on this browser, this device and this website.