Streams of Data
Out-of-the-box reigns at Geological Survey
- By Patricia Daukantas
- Oct 04, 2001
A Geological Survey database development group in the Midwest believes strongly in off-the-shelf software tools.
Harry House, a developer in the Wisconsin District of the Geological Survey's Water Resources Division in Madison, Wis., said his team turns to custom coding only as a last resort.
Out-of-the-box development takes no less skill than custom coding, he said, but it makes the developers more productive.
The division does basic hydrologic monitoring of streams and lakes. A Web site at wi.water.usgs.gov posts the data about the flow of groundwater, rivers and streams in Wisconsin. The Madison office also runs a sophisticated testing laboratory for mercury pollution.
House started the database application team three years ago and has seen it grow from two to nine members. Every few months, they transfer data from the National Water Information System (NWIS) to the National Water Quality Assessment Program (NAWQA) data warehouse.
NWIS is a scientific database of hydrology and water-quality information. NAWQA holds some of the same water-quality data plus biological data not in NWIS.
The 10-year-old NWIS, a legacy system built with the Ingres database manager from Computer Associates International Inc., has not been fully migrated to relational status, House said.
NWIS is much bigger than NAWQA, but unlike NAWQA it is not optimized for querying and data retrieval.
NAWQA uses the Oracle 8.1.7 database manager. House said he would like to upgrade it to the newer Oracle 9i, but 9i is not yet available for Microsoft Windows NT.
Keeping up-to-date
The team has used several tools for extraction, transfer and loading over the years, House said. They tested Informatica PowerMart 5 from Informatica Corp. of Redwood City, Calif., last year and upgraded to PowerMart 5.1 this year.
'We typically stay close to the latest release,' House said.
They use the Informatica software 'for virtually every project that has any kind of data movement from source to target,' he said. Group members like its speed at loading and its detailed error tracking.
'We're an Oracle shop completely,' House said, although his group uses other tools when necessary. He would like to migrate his servers to the open-source Linux operating system, but a change in Oracle Corp.'s Linux support delayed the project, he said.
Oracle formerly supported Red Hat Linux from Red Hat Inc. of Durham, N.C., but early this year switched to the SuSE Linux distribution from SuSE Inc. of Oakland, Calif., House said.
'We have tons of Intel equipment, and I wasn't going to throw all that away' to buy Unix platforms, House said.
So his group has delayed the Linux conversion until they could learn more about the SuSE distribution. 'We'll take another shot at it in the spring,' House said.
Small potatoes
The division's databases don't get enough hits to justify large platforms with proprietary Unix OSes, House said. 'It's not like we're the Defense Department here,' he said. The servers include eight Dell PowerEdge 1400 servers, two PowerEdge 2400s, and one each of the PowerEdge 2300 and 6300 models.
When the development group started three years ago, it had only one 200-MHz uniprocessor server to work with. 'It was just a workstation, that's all we could afford,' House said.
Besides NWIS and NAWQA, House's group works on other government databases.
They developed the underlying database application for a beach water-quality Web site for the city of Milwaukee. To start the project, Milwaukee got two years' funding from the Environmental Protection Agency's Environmental Monitoring for Public Access and Community Tracking (EMPACT) program.
After the EMPACT funding ran out, the Wisconsin Department of Natural Resources kept the project going, House said. The Milwaukee and Racine, Wis., health departments administer the site, at infotrek.er.usgs.gov/pls/beachhealth
House's team transformed the underlying database from a custom-coded PL/SQL Web interface over an Oracle database into an Oracle portal.
'We threw out all the custom code,' House said. 'The portal gives us a lot more capability.'
House said his group is now working with EPA and other agencies on a prototype of the National Environmental Methods Index, a nationwide database of analytical methods for pollutants in water samples.
Except for one portion of the Geological Survey's Gateway to the Earth initiative, which calls for creating a data warehouse of research publications, the team is restricted to working for government agencies that have taxing authority as well as data sets related to the Water Resources Division's mission. In other words, members can't design, say, mapping databases or financial management systems.
Natural resources
Another work in progress is the Wisconsin DNR's statewide fish and habitat database, which tracks fish health and distribution. The department never had a centralized database, so it contracted with House's team to build one. Registered users can check out the database, which is still in development, at infotrek.er.usgs.gov/pls/wdnr_biology_wdb
Another project, the National Environmental Methods Index for EPA and the Geological Survey, is not ready for public release yet, House said.