NCI launches cloud-based cancer genomics data platform
- By Chase Gunter
- Jun 08, 2016
The National Cancer Institute announced the opening of the Genomic Data Commons, a publicly accessible database that allows researchers to access, analyze and upload genomic data to advance cancer research.
The GDC, built and managed by the University of Chicago Center for Data Intensive Science, harmonizes cancer data from 12,000 patient records and makes it available online to researchers. Its two petabytes of data come from two earlier NCI programs: The Cancer Genome Atlas and Therapeutically Applicable Research to Generate Effective Treatments.
Because the platform aggregates the records on the web, anyone can easily search, access and filter the data. NCI also hopes to jump-start the crowdsourcing of medical research by encouraging researchers to upload their own findings for analysis.
"Increasing the pool of researchers who can access data and decreasing the time it takes for them to review and find new patterns in that data is critical to speeding up development of lifesaving treatments for patients," said Vice President Joe Biden, who was on hand for the GDC opening.
In the past, downloading and navigating such a massive trove of genomic data would have been nearly impossible, Dr. Louis Staudt, director of NCI's Center for Cancer Genomics, told reporters.
"The data has been available, but was very, very cumbersome to get," Staudt said. "To download all of the data from the cancer genome atlas would take 3 weeks of continuous download [and] require $1 million of software, and a team of people to ensure privacy.… Only very well-funded and well-positioned researchers were able to access the data."
However, Staudt said that moving to a cloud-based architecture, in which the large-scale computations take place in the cloud, opens the data to the public and to researchers around the world. NCI also anticipates even greater cloud interoperability in the future.
"This is currently a private cloud operating at the University of Chicago that can interoperate with the Amazon cloud and will later interoperate with" a variety of other public cloud platforms that will include Google and Microsoft, Staudt continued. "This is just the beginning of that."
NCI wants researchers to "take advantage of the software we've built from these genomes… and share their data with the world," said Dr. Warren Kibbe, director of NCI's Center for Biomedical Informatics and Information Technology.
Kibbe told reporters that the GDC is "the basement level of a large effort" from NCI to assemble a comprehensive catalog of cancer patients' medical records "in a computable environment that researchers around the world can have access to."
The data will remain in raw form, meaning researchers will be able to analyze the information as new computational technologies and methods arise.
Kibbe said the goal is to collect data from 100,000 patients to create a substantial sample size. "GDC is one step from turning that into a reality," he continued. "It's very unlike anything we've had before… We can put their data together with all the other publicly available data that's been produced from cancer patients."
The GDC builds on the Obama administration's previous efforts to individualize health care, namely the Cancer Moonshot and the Precision Medicine Initiative (PMI). According to the National Institutes of Health, the GDC is part of the $70 million that PMI provides to NCI for cancer genomics research.
This article first ran on FCW, a sister site to GCN.
Chase Gunter is a staff writer covering civilian agencies, workforce issues, health IT, open data and innovation.
Prior to joining FCW, Gunter reported for the C-Ville Weekly in Charlottesville, Va., and served as a college sports beat writer for the South Boston (Va.) News and Record. He started at FCW as an editorial fellow before joining the team full-time as a reporter.
Gunter is a graduate of the University of Virginia, where his emphases were English, history and media studies.