Mine safety's built on data
- By Wilson P. Dizard III
- Jun 12, 2002
George M. Fesak, Labor's data miner
(GCN Photo by Olivier Douliery)
The United States stands as the country with the largest, safest, most diverse and most productive array of mines in the world.
Over the career of George M. Fesak, the Labor Department's Mine Safety and Health Administration has overseen safety improvements that have accompanied gains in mine productivity.
Since October 1995, Fesak has directed MSHA's Program Evaluation and Information Resources Office. He has supervised the creation of a data warehouse and now is developing an advanced database, network and browser system for agencywide use.
From 1988 to 1995, Fesak directed the mine agency's Program Policy Evaluation Office, where he reviewed and assessed a range of programs.
A graduate of Villanova University, Fesak began at the agency as an electrical engineer in 1971 in its District 5 office in Arlington, Va. From 1974 to 1988, he worked as an electrical engineer on the staff of the agency's Coal Mine Safety and Health Safety Division.
GCN senior writer Wilson P. Dizard III interviewed Fesak at his MSHA office in Arlington.
GCN: What are the main components of the Mine Safety and Health Administration's data warehouse project?
FESAK: We operate two mainframe systems: an IBM 9672 in New Jersey that we lease time on from SunGard Data Systems Inc. of Wayne, Pa., and a Honeywell Jupiter 9000 at the Defense Enterprise Computer Center in San Antonio.
On those two hardware platforms, we have five major systems: one that tracks accident injury information at coal mines, another that tracks coal mine inspectors and enforcement actions, a system to monitor inspections and violations at mines for metals and nonmetals other than coal, an assessment system to oversee fine assessments, and one to track miners' training and certifications.
GCN: Each miner has to be qualified for specific activities in the mines?
FESAK: Right, the systems keep track of the miners' qualifications.
The problem with having these diverse systems is it makes it very difficult, nearly impossible, for users to get cross-platform data. Say if you wanted to look at data from the system to track coal mine violations and compare it to data on injuries in coal mines, the tools available are not that good.
GCN: Was middleware an option for retrieving such data from different platforms?
FESAK: Yes, middleware was an option. I believe that it would have been very difficult to implement middleware on two disparate platforms like that. Middleware was fairly readily available to get at data on the IBM system, but it wasn't as available for the Honeywell.
GCN: Do you use different operating systems?
FESAK: The Honeywell runs GCOS, and the IBM OS/390. We determined that it would be very difficult to use middleware to piece together those platforms, so we undertook the development of the data warehouse.
We began the effort in 1995. The first effort was to get the mining information and the accident and injury information to the data warehouse. The accident and injury information was on the IBM mainframe, and a lot of the mine information was on the Honeywell system, so we figured this would be a good test of our ability to pull this off.
This was the hardest phase. It was the first time we had done something like this, and we also had redundant data: information about mines in four of those five major systems.
GCN: How did you do it?
FESAK: We had to reconcile a lot of differences, and we did a lot of data cleaning. We had to come up with standard mine names. We brought the first pilot up in 1996 and made it available for a limited number of users to get a feel for whether it was useful or not.
GCN: And then you progressively fielded it?
FESAK: It was viewed as useful, so in 1997 we added the inspection and violation data to the warehouse both for coal and metal mines. We began an extensive training program for users on a wide scale.
In 1998 and 1999, we moved to the Defense computing center in San Antonio. Between that and the year 2000 date change, we had our hands full for a while, so we couldn't get back to this until after Y2K.
But in the middle of 2000, we got the metal and nonmetal health samples in the data warehouse, then in April 2001 we added assessments information. Last December, we got the coal health samples in the warehouse, so now we have virtually all the agency's data in the warehouse.
GCN: Why did you choose a warehouse server from the Teradata division of NCR Corp.?
FESAK: We are using a Teradata 4800 with four Pentium II processors and eight 36G drives. The operating system is NCR's flavor of Unix.
We selected Teradata for a number of reasons. The performance is good because of the multiple-processor architecture. The scalability is good. If you need more processing power, you add more processors, and if you need more storage, you add storage drives.
There's a failover architecture that makes it very reliable. If a processor goes bad, the system keeps running and shares the load across the other processors. As you replace the bad chip, you're in business the whole time.
We required a relational database management system, and it also has some efficient management utilities, so you don't have to do a lot of maintenance on weekends. It's been very reliable.
GCN: How do users access the data?
FESAK: Users enter via the MSHA frame relay network. The query tool is a network application, and it connects to the Teradata system through open database connectivity.
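As an illustration of the kind of cross-subject query the interview describes, comparing violations with injuries mine by mine, here is a minimal sketch. It is not MSHA's query tool or schema: an in-memory SQLite database stands in for the Teradata system, and every table name, column name and row (`mines`, `violations`, `injuries`, "Example Mine A") is a hypothetical placeholder.

```python
import sqlite3


def build_demo_warehouse():
    """Create a tiny in-memory stand-in for a warehouse (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE mines (mine_id INTEGER PRIMARY KEY, mine_name TEXT);
        CREATE TABLE violations (mine_id INTEGER, cited_on TEXT);
        CREATE TABLE injuries (mine_id INTEGER, occurred_on TEXT);
        INSERT INTO mines VALUES (1, 'Example Mine A'), (2, 'Example Mine B');
        INSERT INTO violations VALUES
            (1, '2001-03-01'), (1, '2001-06-15'), (2, '2001-04-20');
        INSERT INTO injuries VALUES (1, '2001-05-02');
    """)
    return conn


def violations_vs_injuries(conn):
    """Return (mine_name, violation_count, injury_count) for every mine."""
    query = """
        SELECT m.mine_name,
               (SELECT COUNT(*) FROM violations v
                 WHERE v.mine_id = m.mine_id) AS violation_count,
               (SELECT COUNT(*) FROM injuries i
                 WHERE i.mine_id = m.mine_id) AS injury_count
        FROM mines m
        ORDER BY m.mine_id
    """
    return conn.execute(query).fetchall()


rows = violations_vs_injuries(build_demo_warehouse())
for mine_name, violation_count, injury_count in rows:
    print(mine_name, violation_count, injury_count)
```

Each count comes from its own subquery rather than from one join across all three tables. A single join would pair every violation row with every injury row for the same mine and inflate both counts.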
The system is available at about 100 offices: major facilities in Arlington, Va., Denver, Pittsburgh, and Beckley and Triadelphia, W.Va.; 17 district offices; and about 80 field offices that can be as small as one- or two-person operations.
GCN: How do you train users and control their access to the system?
FESAK: This package is not available to all users. We actually required training before we gave users access to the tool. Because, as we say in our training class, it's a powerful tool, but you can make powerful mistakes. People really need to know the data and how to use the tool, or you can get queries that will give you results that aren't correct.
We have about 350 users now out of about 2,350 employees. So access to the system really is limited.
GCN: What did you learn in this process that you could pass on to other agencies?
FESAK: One thing that really worked well for us was to break the project into smaller pieces. We didn't try to do everything at once.
First, we did the mine information and the accident and injury information. Then we added the systems for inspections and violations, and samples and assessments. We did it one piece at a time.
Also, you really should count on cleaning the data. When you move data over to a relational database from a nonrelational system, you run into discrepancies. For example, the names of some mines were spelled differently in different systems. We had to fix that, but it improved data quality agencywide, including on the mainframe systems.
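A minimal sketch of this kind of name reconciliation follows. It does not reproduce MSHA's actual cleaning rules: the normalization steps, the system labels and the mine names are all hypothetical, chosen only to show how spelling variants from different source systems can be grouped under one comparison key for review.

```python
import re
from collections import defaultdict


def normalize_mine_name(name):
    """Reduce a mine name to a comparison key (hypothetical rules)."""
    key = name.upper()
    key = key.replace("&", " AND ")
    key = re.sub(r"[^A-Z0-9 ]", " ", key)          # drop punctuation such as '.' and '#'
    key = re.sub(r"\bNUMBER\b|\bNO\b", " ", key)   # "No. 2" and "Number 2" become "2"
    return " ".join(key.split())                   # collapse repeated spaces


def group_variants(records):
    """Group (system, name) pairs that share the same normalized key."""
    groups = defaultdict(set)
    for system, name in records:
        groups[normalize_mine_name(name)].add((system, name))
    return dict(groups)


# Invented sample rows; the first three spell one mine three different ways.
records = [
    ("accidents", "Black Creek No. 2 Mine"),
    ("inspections", "BLACK CREEK #2 MINE"),
    ("assessments", "Black Creek Number 2 Mine"),
    ("training", "Pine Ridge Mine"),
]

groups = group_variants(records)
for key, variants in groups.items():
    if len(variants) > 1:
        print(key, "has", len(variants), "spellings to reconcile")
```

Automatic grouping like this only flags candidates. Choosing the standard name for each mine, and confirming that two similar names really refer to the same mine, is still a judgment for someone who knows the data.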
Support from top management is essential. You are taking a risk when you make this data available. Under the old system, a request for data had to pass through several channels before it made its way to the user. Now this is nearly instantaneous: you can go in and conduct queries.
It required a high-level shift in management philosophy to give users easy access to the data. I have gotten really good support from my bosses.
GCN: Does the Teradata system have a firewall and multiple levels of security?
FESAK: Yes, it does. It is inside the MSHA firewall, so it is protected from external users. It is a 100 percent internal system.
GCN: How much did the warehousing project cost?
FESAK: Right now we are paying about $200,000 a year to lease the system. We developed the warehouse almost entirely internally, basically using staff resources. We have about 25 application people in Denver. The team that did this is about six or eight people. It was collateral duty.