Weather agency goes first-CLASS

David J. Vercelli was the original program manager for CLASS.

Henrik G. de Gyor

One of the world's largest stores of environmental data is now online at a dual-sited federal portal called CLASS.

The Comprehensive Large Array-data Stewardship System in March began serving up more than 41T (that's terabytes) worth of National Oceanic and Atmospheric Administration environmental data collected from satellite and ground observations. CLASS operates out of the National Climatic Data Center in Asheville, N.C., and the Office of Satellite Data Processing and Distribution in Suitland, Md.

Each site has two dual-processor IBM p660 eServers running AIX, plus a robotic tape library, said Charles Bryant, a computer specialist at Suitland.

Both sites share the same data catalog and are synchronized in near-real time so either site can fail over to the other if necessary. The eServers communicate over a 45-Mbps T3 channel on a 155-Mbps Synchronous Optical Network OC-3 dedicated connection.

IBM Informix Dynamic Server Enterprise Edition replication keeps the two archives synchronized 'within a few seconds of real time,' Bryant said.

'Browse images' of large satellite data sets, such as those covering oceans or continents, also stay synchronized. At class.noaa.gov, users can search and order from 28 available types of atmospheric, coastal, ocean and other data products.

Bryant said requests are filled in minutes, hours or days depending on the size of the data sets. 'We try to be flexible with our user community within the confines of our resources.'

CLASS has about 24,600 registered users: a mixture of public- and private-sector climatic researchers and weather forecasters. Eventually they will be able to order and pay for their data sets online.

Because the data sets are so large and varied, CLASS must use spatial extensions to standard relational database management system formats, NOAA IT specialist David J. Vercelli, the original CLASS program manager, told the American Meteorological Society last year.

In addition to building CLASS, NOAA is 'developing tools to extract spatial information stored in the file record headers, and to ingest that data into the geospatial databases as [satellite] orbits and scan lines,' Vercelli said. Users will immediately benefit, he said, if they can make general Structured Query Language queries as well as geospatial ones.
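Vercelli's point about pairing ordinary SQL with geospatial queries can be illustrated with a minimal sketch. It assumes a hypothetical `granules` table in an in-memory SQLite database, with each file's footprint reduced to a latitude/longitude bounding box; the table, its columns, the sample rows and `find_granules` are all invented for illustration. CLASS itself relies on spatial extensions to its database, which this plain-SQL bounding-box test only approximates.

```python
import sqlite3

# Hypothetical sample rows: (id, product, min_lat, max_lat, min_lon, max_lon)
SAMPLE_GRANULES = [
    ("g1", "POES", 30.0, 45.0, -90.0, -70.0),
    ("g2", "POES", -10.0, 5.0, 20.0, 40.0),
    ("g3", "GOES", 30.0, 45.0, -90.0, -70.0),
]


def build_demo_catalog():
    """Create an in-memory catalog table and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE granules (id TEXT PRIMARY KEY, product TEXT, "
        "min_lat REAL, max_lat REAL, min_lon REAL, max_lon REAL)"
    )
    conn.executemany("INSERT INTO granules VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_GRANULES)
    return conn


def find_granules(conn, product, lat_range, lon_range):
    """Combine an ordinary attribute filter (product) with a spatial filter:
    keep granules whose bounding box overlaps the query box."""
    (q_min_lat, q_max_lat), (q_min_lon, q_max_lon) = lat_range, lon_range
    rows = conn.execute(
        """
        SELECT id FROM granules
        WHERE product = ?
          AND min_lat <= ? AND max_lat >= ?   -- latitude intervals overlap
          AND min_lon <= ? AND max_lon >= ?   -- longitude intervals overlap
        ORDER BY id
        """,
        (product, q_max_lat, q_min_lat, q_max_lon, q_min_lon),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    catalog = build_demo_catalog()
    # A small box around Asheville, N.C. (about 35.6 N, 82.6 W)
    print(find_granules(catalog, "POES", (35.0, 36.0), (-83.0, -82.0)))  # ['g1']
```

A simple box test like this ignores footprints that cross the 180th meridian and treats each box as flat, cases a dedicated spatial extension is typically designed to handle.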

By fiscal 2011, NOAA will have spent $117 million on CLASS development, compared with 'an estimated cost of $212 million to maintain the status quo' of data dissemination, agency spokesman John Leslie said.

The 10-year total cost to operate individual archive systems would have been more than $400 million, compared with $180 million for CLASS, Leslie said.
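The comparisons Leslie cited reduce to simple subtraction. This sketch works them through in millions of dollars. Because the separate-archive figure is given as "more than" $400 million, the 10-year savings and savings share it computes are lower bounds.

```python
# Figures in millions of dollars, as cited by NOAA spokesman John Leslie.
CLASS_DEVELOPMENT_THROUGH_FY2011 = 117
STATUS_QUO_DISSEMINATION = 212
TEN_YEAR_SEPARATE_ARCHIVES = 400  # "more than" $400 million: a lower bound
TEN_YEAR_CLASS = 180


def difference(alternative, chosen):
    """How much less the chosen option costs than the alternative."""
    return alternative - chosen


development_gap = difference(STATUS_QUO_DISSEMINATION, CLASS_DEVELOPMENT_THROUGH_FY2011)
ten_year_savings = difference(TEN_YEAR_SEPARATE_ARCHIVES, TEN_YEAR_CLASS)
savings_share = ten_year_savings / TEN_YEAR_SEPARATE_ARCHIVES

print(development_gap)         # 95
print(ten_year_savings)        # 220 (at least)
print(f"{savings_share:.0%}")  # 55% (at least)
```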

Compliance

Beyond the cost savings, indirect benefits will flow from complying with National Archives and Records Administration policies for data storage and preservation ('which current systems generally do not,' Leslie said) and from the ability to handle new data sets.

The CLASS archive now holds data from polar-orbiting operational environmental satellites (POES) and geostationary operational environmental satellites (GOES), although not all of the existing GOES tape data is in the archive yet.

'We will go back and re-ingest the older data and process it for CLASS,' Bryant said. That work is part of the agency's data rescue campaign.

There are many mechanisms besides satellite observations for collecting data sets, and NOAA has to undertake 'a data campaign to try to ingest each new set,' Bryant said. Among other things, the agency has to write an algorithm to process it.

Thousands of sites provide data to NOAA: some on tape, others from remote sensors, ground radar or direct download. And down the road, CLASS will likely have new sources of environmental data to archive as well as faster connectivity through Internet2 and other high-speed networks.

The descriptive metadata for each data category in CLASS records, for example, who collected the data, when and where, what instruments were used and how they were calibrated.

'Some data types have more metadata than others,' Bryant said. 'There's no set amount. The goal is to develop a metadata capability to measure up to standards' set by the Federal Geographic Data Committee and the International Organization for Standardization. The data archive does not yet meet those standards, however.
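Descriptive metadata of the kind Bryant describes, where some data types carry more fields than others, can be modeled as a record with optional fields plus a completeness check. The `DatasetMetadata` class, its field names and `missing_fields` are hypothetical; they are not element names from the FGDC or ISO metadata standards.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class DatasetMetadata:
    """Hypothetical descriptive-metadata record: who, when, where, what, how."""
    collected_by: Optional[str] = None
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    # (min_lat, max_lat, min_lon, max_lon)
    bounding_box: Optional[Tuple[float, float, float, float]] = None
    instrument: Optional[str] = None
    calibration: Optional[str] = None


def missing_fields(record):
    """List the descriptive fields left empty, in declaration order."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


if __name__ == "__main__":
    sparse = DatasetMetadata(collected_by="NOAA", instrument="imager")
    print(missing_fields(sparse))
    # ['start_time', 'end_time', 'bounding_box', 'calibration']
```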

Vercelli told the meteorology society last year that NOAA's future metadata repository will use an Oracle Corp. relational database management system, ArcIMS mapping software from ESRI of Redlands, Calif., Extensible Markup Language and XML stylesheets, and Java access tools from Blue Angel Technologies Inc. of Valley Forge, Pa.

The main contractor for the multiyear development of CLASS was Computer Sciences Corp. Online access to the GOES data was developed by TMC Technologies Inc. and the Institute for Scientific Research, both of Fairmont, W.Va., and Fenwick Technologies Inc. of Morgantown, W.Va.

The vast storehouse now holds 6.4 million archived files totaling 41T. By 2015, CLASS will be collecting an estimated 1,200T per year from existing and next-generation satellites and other sources.

'We're in the petabyte range down the road,' Bryant said.
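The growth figures translate into petabytes with one division. This sketch assumes decimal units (1 PB = 1,000T), which the article does not specify. It also derives the average file size implied by 6.4 million files totaling 41T.

```python
TB_PER_PB = 1_000        # assumption: decimal (SI) units
MB_PER_TB = 1_000_000

archive_tb = 41
archive_files = 6_400_000
projected_tb_per_year_2015 = 1_200

average_file_mb = archive_tb * MB_PER_TB / archive_files
annual_pb_2015 = projected_tb_per_year_2015 / TB_PER_PB
growth_multiple = projected_tb_per_year_2015 / archive_tb

print(f"{average_file_mb:.1f} MB per file on average")       # 6.4
print(f"{annual_pb_2015:.1f} PB per year by 2015")           # 1.2
print(f"{growth_multiple:.1f}x today's archive, each year")  # 29.3
```

By that arithmetic, a single year of projected 2015 intake would by itself exceed a petabyte, about 29 times the size of today's archive.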
