iPlant delivers free big data tech to the bioscience community
- By Paul McCloskey
- Oct 03, 2013
The National Science Foundation recently renewed a $50 million grant to a group of universities that has taken an innovative approach to delivering big data tools to plant science researchers who might not have the resources for them.
The grant to the iPlant Collaborative was made to University of Arizona’s BIO5 Institute and its partners, including the Texas Advanced Computing Center at the University of Texas, Austin; Cold Spring Harbor Laboratory in New York; and the University of North Carolina Wilmington.
The group’s aim is to create a “national cyberinfrastructure for the biological sciences” that would provide high-performance computing, including cloud services and open data sets, to smaller, individual research teams whose work would be limited or slowed down without them.
"iPlant's mission is to merge ever-evolving computational technology and shared data capabilities with collaborative human brainpower, essentially changing the way we approach life science research,” said Stephen Goff, iPlant's principal investigator and project director, in a UA release.
In particular, the project hopes to tackle one of the paradoxes of today’s big science research: the ability to generate so much data it becomes difficult or unfeasible to analyze. The technology to keep up with the processing demands of big science has historically been available only to well-funded individual research groups.
The increasingly huge data sets collected by research teams and other organizations can be too big to properly analyze and, in some cases, even too big to move. It’s something government and academia have been trying to address. Lawrence Berkeley National Laboratory, for example, recently developed a new way to search large data sets with a technique called “distributed merge trees.” And Rice University, working on an NSF grant, is building an energy-efficient optical network designed for big data.
iPlant officials say they now want to “level the playing field,” so that technical tools are provided for free or at cost “to all plant biologists, regardless of their position or means.”
"We've always had big data, but now we have the usable tools and technology to act on it,” said Nirav Merchant, iPlant's cyberinfrastructure faculty advisor at UA.
The IT systems and services available to research teams through the iPlant Collaborative include high-performance computing, data mining, simulation and bioinformatics tools, as well as open-source software developed by a distributed team of programmers.
iPlant staff also will provide maintenance services for project data, including data set hosting, back-up systems, patch management and monitoring. All data and software tools are free or made available on open-source terms.
Tools and apps particular to plant biology also are in the basket of services available to the iPlant researchers.
The “DNA Subway,” for example, simplifies DNA sequence annotations and helps students and faculty compare genomics workflows. My-Plant is a social network community for plant biologists and researchers to use for collaboration.
Developers are at work on other systems molded to the needs of the plant science community, including CoGe, an online system for simplifying retrieval of genomic information and sequence data. The system uses iPlant’s data store and hosting services to scale to thousands of genomes.
For less computer-savvy users, iPlant’s “Discovery Environment” recommends software tools that can tap supercomputer-scale applications without the need for sophisticated software expertise.
The tools help advance the work of “many different types of scientists, teachers and students who otherwise might not communicate with one another,” said Fernando Martinez, director of the UA BIO5 Institute.
“Doing so creates the kind of multidisciplinary environment necessary to crack the toughest problems in modern biology.”
Paul McCloskey is editor-in-chief of GCN. Follow him on Twitter: @Paul_GCN.