Concierge service for big data transfers
- By Carolyn Duffy Marsan
- Mar 30, 2015
Leading-edge data centers are grappling with how to transfer massive datasets of 10 terabytes or more from one location to another. That’s a common problem in the government and research spheres, where campuses and departmental data centers frequently need to exchange large datasets.
To handle these massive transfers, data center managers are working with services such as Globus, developed by the Energy Department’s Argonne National Laboratory.
Globus is a cloud-based data transfer service that supports the sharing of large datasets in a way that carefully manages bandwidth and improves reliability, according to its developers.
“We started as a high-performance secure file transfer service,” said Vas Vasiliadis, director of products, communications and development for the Computation Institute, a joint initiative of the University of Chicago and Argonne National Laboratory.
“If you want to move terabytes or petabytes of data from a national lab back to your campus, we are a service that will act as a third-party mediator or controller to make sure the data transfer completes. We recover errors automatically and notify you when we’re done,” he said.
The Massachusetts Green High Performance Computing Center, which supports researchers in Boston from the Center’s base in western Massachusetts, is one user of the service. MGHPCC director John Goodhue said the software has an interface that’s easy for scientists to use without needing IT support. The benefit of Globus is that it optimizes the way a large file is transmitted across the network.
“Globus figures out the speediest way to get the file from here to there,” Goodhue said. “It has a set of performance monitoring tools to periodically check those paths and make sure nothing is hindering the transfer rate. You can think of it as an overlay on the Internet that is very careful about the paths it chooses and also tests those paths to make sure the transfer rates can be very high.”
Goodhue said Globus makes it “simple, fast and transparent for researchers to move big datasets from one place to another.”
The Globus service has been available for five years and counts 30 federal laboratories and universities among its customers.
One advantage of Globus is that it lets end users manage, move and share very large datasets without involving IT staff. The model has uses for enterprise data as well as scientific data, Vasiliadis said.
“We’re handling the administrative burden and letting our users take advantage of the high-performance storage systems we have in place,” Vasiliadis said. “Transferring data is really time consuming and error-prone, and it shouldn’t be that way. We give the user a simple browser tool, and they can move terabytes of files and forget about it. They don’t have to babysit the transfer.”
Argonne offers other Globus services that take advantage of the transfer technology, including a data publication and discovery service that allows researchers to share their data with others through a cloud-based platform.
“We give them the mechanism to describe their data using metadata and to assemble it and spread it across multiple systems for storage,” Vasiliadis said. “We give the data a permanent identifier, which allows the researcher or institution to curate it in a way that makes sense for them.”
Carolyn Duffy Marsan is a writer based in Milwaukee, Wisc., covering enterprise technology.