When an outbreak hits, Magellan speeds up genetic sequencing
- By Henry Kenyon
- Oct 24, 2011
For several years, scientists at the Argonne National Laboratory have been using an automated, software-based system to sequence the genomes of bacteria and microbes. But when that system recently became overwhelmed by a surge in demand, researchers turned to a cloud-based system that virtualized the process, allowing them to sequence hundreds of genomes within hours.
Argonne’s Rapid Annotation using Subsystems Technology (RAST) program was developed in 2007 to automate the laborious process of making sense of an organism’s genome.
RAST matches segments of a new bacterium’s genetic code, or of the proteins it encodes, against a catalog of previously sequenced genetic material. The system’s final result is an annotated genome with a list of the organism’s probable genes and proteins.
A human scientist must still verify this final result, but the entire process can be completed in hours rather than the months, or even years, it once took, said Ross Overbeek, an Argonne computer scientist who was involved in RAST’s design.
But the system wasn’t built to handle a sudden surge in demand. Designed to process 60 to 80 genomes a day, it was hit in June 2011 with an enormous spike in requests after an outbreak in German hospitals of an E. coli strain that was resistant to many existing treatments.
Use of RAST is free and open to any scientist, and the system was suddenly flooded with requests for up to 200 genomes an hour as researchers sought to pin down and characterize the exact strain of E. coli.
The RAST team turned to Magellan, a cloud computing project managed by the Energy Department that was designed to boost scientific research by providing additional servers and virtualization tools. The team replicated RAST on Magellan, which greatly increased the system’s capacity.
There are two versions of RAST, and the current version runs on the Magellan cloud framework, Overbeek said. RAST’s throughput has grown from 100 to 200 genomes a month to more than 1,000, with surges as high as 450 a day. Demand is elastic, he added, and usually falls in the range of 1,000 to 1,500 genomes per month.
Rapid processing is essential during those surges; otherwise, the entire system backs up. By moving to Magellan, the laboratory can handle such surges reasonably well, Overbeek said.
The changes in the technology are allowing researchers to sequence the genomes of almost every known pathogen group. The ability to sequence a genome in a matter of hours is a major change, Overbeek said. “I’m excited by the fact that you can have access to an annotated genome in hours,” he said. “This used to take a year to do back in the 1990s.”
Advances in the technology also allow scientists to process batches of related genomes, an approach Overbeek encourages scientists to pursue. “I would like people to think that sequencing is essentially free,” he said.
For microbial genomes, the technology allows researchers to sequence a pathogen’s genes almost immediately after an outbreak occurs and pass the data on to other health care research facilities around the world. “This is something that wouldn’t have happened two or three years ago,” he said.
The technology for automated gene sequencing is advancing so quickly that Overbeek foresees hospitals doing their own gene sequencing in the not-too-distant future. “The way hospitals deal with outbreaks or even bacteria is going to change over the next few years,” he said.
A second, updated version of the RAST software is now in use, and a laptop version is being beta tested. The beta version can annotate a genome within 10 minutes and is capable of running on a handheld device such as a smartphone or tablet computer.