NOAA's archivist balances preservation and access
BY PATRICIA DAUKANTAS | GCN STAFF
Weather data comes in all forms: coastal charts annotated in flowery 19th-century script, photographs from 1960s satellites, Doppler radar images and handwritten notes from backyard meteorologists.
A data center at the National Oceanic and Atmospheric Administration is organizing all that information into easily retrievable digital records.
The goal of the Climate Database Modernization Program (CDMP) is to improve data accessibility, said Kenneth D. Davidson, deputy director of NOAA's National Climatic Data Center in Asheville, N.C.
The center is part of NOAA's National Environmental Satellite, Data and Information Service, which operates government-owned weather satellites. Besides NCDC, NESDIS runs data centers for geophysical data in Boulder, Colo., and for oceanographic data in Silver Spring, Md.
Chock-full of data
NCDC's home page, at www.ncdc.noaa.gov, advertises the Asheville center as the 'world's largest archive of weather data.' The satellite, radar and conventional observation data, in both digital and hard-copy formats, probably adds up to more than 1 petabyte, or 1 million gigabytes, Davidson said.
'Everybody views the Library of Congress as this massive amount of information,' Davidson said. 'Well, it's nothing compared to what we have.'
By phone, fax and e-mail, the Asheville center handles about 3 million data requests per year. They range from the simple (local weather on someone's birth date) to the complex (two months' worth of observational, satellite and radar data for an entire state).
'Lots of our biggest customers are lawyers,' Davidson said. Certified NCDC data can be used as evidence in court without testimony from NOAA witnesses.
Other frequent customers include government agencies, industry, researchers and transportation companies.
NCDC's online store offers one-time weather data purchases, subscriptions for specific locations and free data downloads. Users can search for data by type or simply browse a list of extreme-weather facts and images.
About seven years ago, Congress gave NOAA funds for the Environmental Data Rescue Program, but the goal then was mainly to copy data from deteriorating media: paper, microfilm and microfiche.
In fiscal 2000 Congress ended that program and started CDMP with an emphasis on accessibility.
'They permit some preservation, but access is the real key,' Davidson said. 'In everything I do, I have to demonstrate that I'm improving access for the general public.
'I'm still doing some preservation, but I can't just move things from paper to paper. If I have a piece of paper with observations that somebody wrote out in 1820, I have to be able to make the information accessible.'
The Asheville center has 28 terabytes of data online. More arrives at a rate of 750 gigabytes per day, and not all of it is digital.
For example, the National Weather Service has a network of 11,000 volunteer observers who write down meteorological observations at their homes or schools and mail the forms to Asheville.
Contractors do most of the scanning and keying in of about 13,000 forms each month, Davidson said. The contractors are Doxsys Inc. of Bethesda, Md., and Image Entry Inc. of both Rocket Center, W.Va., and London, Ky. Lason Inc. of Troy, Mich., works as a subcontractor to Doxsys.
Scanning and keying are both necessary because current handwriting recognition programs are only about 60 percent to 70 percent efficient, Davidson said.
Handwriting in the 19th century was totally different in style from today's script. 'It's beautiful writing, but try to get it so a computer can understand it,' Davidson said.
NCDC's basement has almost 30,000 square feet of floor space filled with 12-foot shelves, and every shelf is packed with data, Davidson said.
NCDC warehouses the atmospheric data, but certain data on oceans and coastal zones is key to the overall climate picture. NCDC and another NOAA agency, the National Ocean Service, are converting mean shoreline height records to digital form. Some tidal charts that date back to the 1850s have been available only by visiting the Silver Spring center.
'The importance of that data is in determining how close you can build to the water in each state,' Davidson said. Digitizing those charts would be 'a major step forward for NOAA.'
Anonymous sources
Early NESDIS satellite data still exists in hard-copy pictures. Storage of digital images began in 1978 with the launch of the TIROS-N environmental data satellite.
Davidson said it's important to digitize the older data to aid climate researchers in building a longer-term climate database.
'In today's world data can be distributed pretty easily,' he said. Through the center's virtual data system, users can retrieve data from the NESDIS or NCDC Web sites without knowing where it came from.
NCDC started using large automated tape libraries from IBM Corp. in 1995 and has expanded them as needed. 'You can have access to a large volume of data within seconds,' Davidson said.
The Asheville center uses IBM Corp.'s High Performance Storage System software to organize its mass storage. An Oracle Corp. database management system runs on IBM and Sun Microsystems servers, with scratch space on a RAID array as a staging area for retrieving from the tape libraries.
About 110 million pages of historical information did not get digitized under the 1990s data rescue program. 'It costs a lot of money to place data online and maintain it online,' Davidson said.
CDMP funds pay for the contractors' work but cannot be spent directly on infrastructure improvements such as new tape libraries, he said.