Science community building standard data formats
- By Mark Pomerleau
- Jan 16, 2015
In the medical and scientific fields, data and file sharing are critical because they accelerate discoveries and breakthroughs. But research can still be frustrated by the lack of data sharing standards, as is the case in neuroscience.
Unlike fields that rely on common file formats, such as JPEG for images, neuroscience does not have standard data formats. To address this situation, the Neurodata without Borders project – a yearlong initiative to create a “unified data format for cellular-based, neurophysiology data based on representative use cases” – hosted a hackathon late last year to brainstorm ideas for standard file formatting.
In addition to the Neurodata without Borders project, other prominent laboratories are working to develop a global portal for scientists and researchers to share information and data without having to download special software.
“This issue of standardizing data formats and sharing files isn’t unique to neuroscience. Many science areas, including the global climate community, have grappled with this,” said Oliver Ruebel, computational scientist at the Lawrence Berkeley National Lab. “Sharing data allows researchers to do larger, more comprehensive studies. This in turn increases confidence in scientific results and ultimately leads to breakthroughs.”
Standards for neuroscience data have become increasingly important with efforts such as the Obama administration’s BRAIN Initiative, which challenged the neuroscience community to discover new ways to address brain diseases and trauma.
This work is expected to generate a deluge of data, according to Berkeley Lab, so before researchers can even begin taking measurements, they must first develop a standard format for labeling and organizing data, sharing files, and scaling up software to handle massive amounts of information.
To come up with those conventions, Ruebel worked closely with Berkeley Lab scientists Peter Denes and Kristofer Bouchard and UCSF neurosurgeon Edward Chang to design BrainFormat, a neuroscience data standardization framework. It uses open source Hierarchical Data Format (HDF) technologies, which have helped a variety of scientific disciplines organize and share their data.
Besides supporting data format standardization, HDF is optimized to run on supercomputers. So by building BrainFormat on this technology, Berkeley Lab said, neuroscientists will be able to use supercomputers to process and analyze their massive datasets.
Mark Pomerleau is a former editorial fellow with GCN and Defense Systems.