big data

What to do when data's too big to just transfer to the cloud

As government agencies consider moving their enterprise data to the cloud, their first question might be: How does it get to the cloud? In most cases, data can be transmitted via FTP or HTTP, but for some applications, such as life sciences, sensor networks and video surveillance, the data is simply too big to fit through the pipe. What’s the best option?
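
How big is too big? A rough back-of-the-envelope calculation makes the tradeoff concrete. The Python sketch below uses illustrative link speeds and an assumed 75 percent sustained utilization (not figures from any vendor) to estimate how many days a dataset would take to move over a given connection; once the answer runs to weeks or months, shipping drives starts to look attractive.

    # Back-of-the-envelope: days needed to push a dataset through a network link.
    # Link speeds and the 75 percent utilization factor are illustrative assumptions.

    def transfer_days(terabytes, megabits_per_sec, utilization=0.75):
        """Rough transfer time in days over a single sustained link."""
        bits = terabytes * 1e12 * 8                       # decimal TB to bits
        seconds = bits / (megabits_per_sec * 1e6 * utilization)
        return seconds / 86400.0

    if __name__ == "__main__":
        for tb in (1, 10, 100):
            for mbps, label in ((45, "T-3"), (100, "100 Mbit/s"), (1000, "1 Gbit/s")):
                print("%4d TB over %-10s: %6.1f days" % (tb, label, transfer_days(tb, mbps)))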

Pack it up and ship it out.

Some major cloud vendors now offer a service that lets clients ship physical media to the data center, where the data is uploaded directly, eliminating overly long transfer times over the network. Bulk imports are especially useful when data is first moved to the cloud or for backup and offsite storage. Fees for the service vary, and some cloud providers will also download data from the cloud and return it on physical media.

AWS Import/Export accelerates the transfer of large amounts of data between the AWS cloud and portable storage devices that clients ship to Amazon. The company uses its high-speed internal network, which it says can move terabytes of data faster than a T-3 leased line, to transfer data from the physical media to Amazon S3, Amazon EBS or Amazon Glacier. Amazon charges $80 for each device handled; other costs depend on which Amazon storage service is used as well as the time it takes Amazon to upload the data or decrypt the device. For more information, see the AWS Import/Export documentation.
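
For agencies that script their workflows, import jobs can also be created and tracked through the AWS SDKs. The sketch below is a minimal, hedged example using the boto3 Python SDK's importexport client; the manifest file, which describes the device, the return shipping address and the target bucket, is a placeholder here, and its exact format is spelled out in the AWS Import/Export documentation.

    # Minimal sketch of creating an AWS Import/Export job with boto3.
    # "import-manifest.yml" and its contents are placeholders; see the AWS
    # Import/Export documentation for the required manifest fields.
    import boto3

    client = boto3.client("importexport")

    with open("import-manifest.yml") as f:
        manifest = f.read()

    # ValidateOnly=True checks the manifest without creating a billable job.
    resp = client.create_job(JobType="Import", Manifest=manifest, ValidateOnly=True)
    print(resp.get("WarningMessage") or "Manifest validated")

    # After a real job is created and the device is shipped, progress can be
    # polled by job ID:
    # print(client.get_status(JobId="ABC12"))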

Google Cloud Storage Offline Disk Import is an experimental feature currently available only in the United States. The service lets clients load data into Google Cloud Storage by sending Google physical hard drives, which Google loads into an empty Cloud Storage bucket. Google requires that the data on the drives be encrypted. Because the data is loaded directly onto Google's network, this approach can be faster or less expensive than transferring it over the Internet. Import pricing is a flat fee of $80 per hard drive, regardless of drive capacity or data size; after that, standard Google Cloud Storage fees apply for the requests, bandwidth and storage associated with the import and subsequent use of the data, according to the company.
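
Because the drives leave the client's hands, it is worth verifying what actually landed in the bucket once Google reports the import complete. Below is a minimal sketch, assuming the google-cloud-storage Python client library and a placeholder bucket name; the MD5 values it prints can be compared against checksums recorded before the drives were shipped.

    # Sketch: audit a bucket after an offline disk import. The bucket name is
    # a placeholder; authentication is assumed to be configured already.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("agency-import-bucket")

    total_bytes = 0
    for blob in bucket.list_blobs():
        total_bytes += blob.size or 0
        # md5_hash is the base64-encoded MD5 computed by Cloud Storage; compare
        # it with checksums taken before the drive left the building.
        print(blob.name, blob.size, blob.md5_hash)

    print("%.2f TB imported" % (total_bytes / 1e12))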

HP Bulk Import Service, still in private beta, lets users load data into HP Cloud Block Storage or HP Cloud Object Storage. The service, expected for general release in fall 2013, will let users send hard drives directly to HP’s data centers, where the data can be rapidly uploaded.

Rackspace’s Bulk Import to Cloud Files lets clients send Rackspace physical media to be uploaded directly at its data centers, where “migration specialists” connect the device to a workstation with a direct link to Rackspace’s Cloud Files infrastructure. Rackspace will not decrypt data, though the company plans to offer that option in the future. Rackspace charges $90 per drive for bulk imports.

For cases where the data is consistently too large to transmit and access demands won’t tolerate the latency of shipping drives, Aspera offers its Fast Adaptive Secure Protocol (FASP) data transfer technology, which the company says eliminates the shortcomings of TCP-based file transfer technologies such as FTP and HTTP. On a gigabit WAN, FASP can achieve 700-800 megabit/sec transfers with high-end PCs and 400-500 megabits/sec with commodity PCs, the company said.
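
The shortcoming Aspera is addressing is largely a property of TCP itself: a single TCP stream's steady-state throughput is roughly bounded by MSS / (RTT x sqrt(loss)), the approximation from Mathis et al., so even a small amount of packet loss on a long-haul link leaves most of a gigabit pipe idle. The sketch below illustrates that bound with assumed round-trip time and loss figures; it describes ordinary TCP behavior, not FASP itself.

    # Approximate single-stream TCP throughput ceiling (Mathis et al.), with the
    # constant factor omitted. RTT and loss values are illustrative assumptions.
    from math import sqrt

    def tcp_ceiling_mbps(mss_bytes=1460, rtt_ms=80.0, loss_rate=0.001):
        """Rough steady-state throughput of one TCP stream, in megabits/sec."""
        bytes_per_sec = mss_bytes / ((rtt_ms / 1000.0) * sqrt(loss_rate))
        return bytes_per_sec * 8 / 1e6

    # A cross-country link with 0.1 percent loss caps a single FTP/HTTP stream
    # at only a few megabits per second, far below gigabit line rate.
    print("%.1f Mbit/s" % tcp_ceiling_mbps())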

Aspera said its software is FIPS 140-2 certified, is accredited for use on SIPRnet and JWICS and has been vetted by the intelligence community for large data transfers over military networks. It is also used in the 1000 Genomes Project, which exchanges data between the National Center for Biotechnology Information and the European Bioinformatics Institute.

About the Author

Susan Miller is the executive editor of GCN. Follow her on Twitter: @sjaymiller.

Reader Comments

Wed, Sep 11, 2013 Helpful NoVA

You missed a very viable option that is gaining popularity. Select cloud vendors, like AWS, allow you to connect physical hardware directly to your resources inside their cloud. The Amazon Direct Connect product, 1 Gb or 10 Gb bandwidth options, would allow an agency to bulk upload data to and from AWS over a dedicated link... no carrier required.

Wed, Aug 7, 2013

So what is the point of delivering big data physically if that is what we tried to get away from years ago. So now the cloud can not handle big uploads so you have to physically deliver them, that is ridiculous. Created another cost of time and money.

Wed, Aug 7, 2013 Cyber H

Susan, Cloud computing is driving a new wave of innovation in the area of big data. The open source solution from HPCC Systems provides a single platform that is easy to install, manage and code. Designed by data scientists, HPCC Systems is a data intensive supercomputer that has evolved for more than a decade, with enterprise customers who need to process large volumes of data in a 24/7 environment. Its Thor Data Refinery Cluster, which is responsible for ingesting vast amounts of data, transforming, linking and indexing that data, and its Roxie Data Delivery Cluster are now offered on the Amazon Web Services (AWS) platform. Taking advantage of HPCC Systems in the cloud provides a powerful combination designed to make Big Data Analytics computing easier for developers and can be launched and configured with the click of a button through their Instant Cloud solution. More at http://hpccsystems.com
