Second-guessing Irene: Could 120-petabyte array make better predictions?

Predicting the outcome of severe storms is one of the hardest jobs there is, and it relies on complex models with many variables. The difficulty was evident with Hurricane Irene, which wound up being less severe than forecasters predicted.

Perhaps the weak point isn’t computing power or human oversight. Perhaps it’s simply storage.


The amount of data storage needed to run the simulations that help predict weather patterns is enormous. It is certainly larger by orders of magnitude than what a desktop computer application, or even basic storage for an entire network, currently needs.

IBM researchers have announced that they are now able to link more drives together than ever before – 200,000 to be precise – into one giant continuous drive. Individually, these drives are just your ordinary serial-attached small computer system interface (SCSI) drives, but when IBM puts them together they yield 120 petabytes of storage.

If you haven’t been working with long-term backup storage, you might not be familiar with that prefix. Well, you know what a gigabyte is: your desktop’s hard drive and even your key drives are likely measured in this unit. A petabyte is more than 1 million gigabytes (actually, 1,048,576 – binary, remember?). That’s right – there are as many gigabytes in a petabyte as there are kilobytes in a gigabyte.
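For readers who want to check that arithmetic, here is a minimal sketch using the binary (power-of-two) definitions the paragraph above relies on. The constant names are just illustrative; nothing here comes from IBM.

```python
# Binary storage units: each step up the scale is a factor of 2**10 (1,024).
KILOBYTE = 2**10
GIGABYTE = 2**30
PETABYTE = 2**50

# How many of the smaller unit fit in the larger one.
gigabytes_per_petabyte = PETABYTE // GIGABYTE
kilobytes_per_gigabyte = GIGABYTE // KILOBYTE

# The full 120-petabyte array expressed in gigabytes.
array_in_gigabytes = 120 * gigabytes_per_petabyte

print(f"Gigabytes in a petabyte: {gigabytes_per_petabyte:,}")  # 1,048,576
print(f"Kilobytes in a gigabyte: {kilobytes_per_gigabyte:,}")  # 1,048,576
print(f"120 petabytes in gigabytes: {array_in_gigabytes:,}")   # 125,829,120
```

Both ratios come out to 2 to the 20th power, which is why the two comparisons match.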

For some real-world perspective, imagine you are creating the most fantastically impressive slideshow presentation. Dozens of slides, graphics, a little animation, the works – you know, the kind of file that your network administrator yells at you about because it’s taking up too much storage space and no, you cannot e-mail it to anyone. Well, 120 petabytes can hold about 1.5 million of those.

IBM’s research lab in Almaden, Calif., is building the system for an unidentified client that is planning to do supercomputer simulations of real-world events, according to Technology Review.

And that kind of power also could be applied to generating more accurate weather predictions.

There are a lot of concerns about combining that many disks, not the least of which is failure rate. Even with the best equipment, every drive carries a small chance of failing on any given day, and across an array this large those small chances add up to drive failures being a daily expectation.
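To get a feel for why failure rate matters at this scale, here is a rough back-of-the-envelope sketch. The 200,000-drive count comes from IBM's announcement; the 2 percent annual failure rate per drive is purely an assumption for illustration, not a figure from IBM, and the function names are hypothetical. The sketch also assumes drives fail independently of one another, which real hardware does not strictly do.

```python
DRIVE_COUNT = 200_000               # from the article
ASSUMED_ANNUAL_FAILURE_RATE = 0.02  # hypothetical: 2% of drives fail per year


def daily_failure_probability(annual_rate: float) -> float:
    """Chance that a single drive fails on a given day, assuming a constant rate."""
    return 1 - (1 - annual_rate) ** (1 / 365)


def expected_failures_per_day(drives: int, annual_rate: float) -> float:
    """Average number of drive failures per day across the whole array."""
    return drives * daily_failure_probability(annual_rate)


def chance_of_any_failure_today(drives: int, annual_rate: float) -> float:
    """Chance that at least one drive in the array fails on a given day."""
    return 1 - (1 - daily_failure_probability(annual_rate)) ** drives


per_day = expected_failures_per_day(DRIVE_COUNT, ASSUMED_ANNUAL_FAILURE_RATE)
any_today = chance_of_any_failure_today(DRIVE_COUNT, ASSUMED_ANNUAL_FAILURE_RATE)

print(f"Expected failures per day: {per_day:.1f}")
print(f"Chance of at least one failure today: {any_today:.4%}")
```

Under those assumed numbers, the array would lose several drives on a typical day, which is the kind of pressure a storage design at this scale has to absorb.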

In such a demanding environment, a basic redundant array of independent disks (RAID) architecture simply wouldn’t cut it. Even the most secure RAID structure in common usage, RAID 6, has a fault tolerance of two disks, meaning if three drives failed before any were replaced, you’d be out of luck.

So IBM had to come up with its own brand of redundant disk array. The researchers aren’t sharing the details, of course, but the design involves lots of replacement disks already mounted and software that speeds up the restoration process when it detects additional drives failing. IBM claims this storage system wouldn’t lose any data in a million years of constant use, with no loss in performance. It’s nice to know that the machine can outlive our children’s children’s children without further maintenance.

Now this is all well and good, but the issue that people had with Hurricane Irene was not the accuracy of the predictions but the accuracy of what they were told to expect. Public storm warnings are always shaped by two pressures. The weather services never want to underplay a storm’s strength, because of the lives it could cost if they erred in that direction. And the media need to drive viewer numbers, which they sometimes do by overstating the danger.

So even if this new technology will help to more accurately predict the danger a storm presents, what the public will hear will always be worse.

About the Author

Greg Crowe is a former GCN staff writer who covered mobile technology.

