Not all clouds created equal
A major bottleneck in scientific discovery is emerging because the amount of data available is outpacing local computing capacity, according to the authors of a new paper published in PLOS ONE.
And though cloud computing gives researchers a way to match capacity and power with demand, the authors wondered which cloud configuration would best meet their needs. In the paper, “Benchmarking undedicated cloud computing providers for analysis of genomic datasets,” the authors benchmarked two cloud services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic data sets and a standard bioinformatic pipeline on a Hadoop-based platform.
They found that GCE outperformed EMR both in terms of cost and wall-clock time, though EMR was more consistent, which is an important issue in undedicated cloud computing, they wrote.
The time differences, the authors said, “could be attributed to the hardware used by the Google and Amazon for their cloud services. Amazon offers a 2.0 GHz Intel Xeon Sandy Bridge CPU, whilst Google uses a 2.6 GHz Intel Xeon Sandy Bridge CPU. This clock speed variability is considered the main contributing factor to the difference between the two undedicated platforms,” they wrote.
The authors did note that while cloud computing is an “efficient and potentially cost-effective alternative for analysis of large genomic data sets,” the initial transfer of the data into the cloud was still a challenge. One option, they suggested, would be for the data providers to directly deposit the information to a designated cloud service provider, thereby eliminating the need for the researcher to handle the data twice.
More details about the benchmarking and results are available on PLOS ONE.
Posted by GCN Staff on Oct 01, 2014 at 1:28 PM