NIH funds new tools to crack genomic big data


As part of its Big Data to Knowledge Initiative, the National Institutes of Health recently awarded grants to the biomedical research community to develop software tools for data compression, data visualization, data provenance and data wrangling. NIH is dividing a total of $6.5 million among 15 recipient programs this fiscal year.

Timely access to genomic data is critical to health care research because tracking genomic changes helps researchers identify what predisposes patients to certain diseases and how patients respond to treatments and therapies.

Because genomic data comes from large-scale DNA sequencing, whose volume is expected to grow in the near future, it is of "paramount importance," according to NIH, to find ways to compress data efficiently, accurately and quickly, and to develop techniques for sharing, accessing, visualizing and searching genomic data stored in a variety of formats.

The awards fell into four categories:

Data compression, which becomes more important as digital imaging increases and storage and compute capabilities are pushed to the limit.

Data provenance, which tracks the creation, modification and movement of data during analysis. New provenance tools help researchers better understand the methods others used in a particular experiment and compute quality and trustworthiness scores for data.

Data visualization research, which helps researchers derive new insights by visualizing different data types from across multiple studies.

Data wrangling, or the use of automated tools to convert or map data across different forms to make it more accessible to a variety of applications.
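To make the data compression category more concrete, here is a minimal, hypothetical Python sketch of one of the simplest lossless ideas for DNA sequences. It is not one of the NIH-funded tools. Because DNA has only four bases, each base can be stored in 2 bits instead of an 8-bit text character. The sketch assumes sequences contain only A, C, G and T; real sequencing data also carries ambiguous bases, quality scores and metadata, which it ignores.

```python
# Hypothetical illustration, not an NIH-funded tool: pack DNA bases at 2 bits each.
# Assumption: input contains only uppercase A, C, G and T.

CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack up to four bases into each byte, first base in the high bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (6 - 2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the sequence; length is needed because the last byte may be partial."""
    bases = []
    for byte in data:
        for j in range(4):
            bases.append(BASES[(byte >> (6 - 2 * j)) & 0b11])
    return "".join(bases[:length])


seq = "ACGTACGTTGCA"
packed = pack(seq)
print(len(seq), "bytes as text ->", len(packed), "bytes packed")
assert unpack(packed, len(seq)) == seq  # lossless round trip
```

The round trip is lossless and roughly four times smaller than one-byte-per-base text. Specialized genomic compressors use more sophisticated methods, so treat this only as an illustration of why compact encodings reduce storage costs.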

Overall, by delivering accessible, high-performance software suites for managing genomic data, the research program will help researchers manipulate, transfer and access the massive datasets used by government and NIH-sponsored projects.

Better compression software will reduce the cost of data storage and analysis, and more effective sharing tools will improve researchers' access to complex data. Additionally, by requiring that these tools be open source, NIH said, these awards open the door to future innovations and improvements built on the initial developments.

About the Author

Amanda Ziadeh is a former reporter/producer for GCN.

