Unleash the data scientists: How government can use big data
- By Paul McCloskey
- Feb 08, 2012
At a 2009 workshop on data integration sponsored by the National Research Council, a scientist said advances in biomedical research were increasingly being driven by patterns in data rather than by a new understanding of biological processes.
More than two years later, this might seem like a quaint notion. “Of course science is data driven,” we might say today. But it does acknowledge an important shift: Scientists are now more likely to make the next game-changing medical discovery by finding connections between data than through traditional laboratory research methods.
But despite the promise of data integration, working easily with large datasets remains a difficult proposition. In scientific research circles, for instance, it's often a challenge even to confirm that data worth correlating exists.
“Data discovery all too often depends on word of mouth,” even among researchers working in the same field, the NRC said. When a health-care researcher wants to match data from another field — say, sociological data — the odds against finding relevant data are even higher.
In fact, the management of research and other types of large datasets is in its infancy. As noted in our story in this issue, only recently have tools become available to exploit big data. Innovative tools such as Apache Hadoop are making it possible to store and process very large distributed datasets economically. And schemes for improving the metadata and transparency of large datasets are maturing.
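The core idea behind Hadoop is the map/reduce pattern: split the data, transform each piece independently, then combine results by key. A minimal sketch in plain Python (the word-count task and function names here are purely illustrative; Hadoop itself distributes such jobs across a cluster of machines):

```python
from collections import defaultdict

def map_phase(records):
    """Map step: emit a (word, 1) pair for each word in each record."""
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each distinct key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# On a real cluster, Hadoop shards the input, runs mappers in
# parallel, shuffles pairs by key, and runs reducers per key group;
# here the whole pipeline runs in one process for illustration.
lines = ["big data big discoveries", "data drives discoveries"]
word_counts = reduce_phase(map_phase(lines))
print(word_counts["data"])  # 2
```

Because each map call touches only its own record and each reduce touches only one key's values, the work parallelizes naturally across cheap commodity hardware, which is what makes the approach economical at petabyte scale.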
But government should not wait for such tools to become more mature or robust before developing productive big data strategies. Without such a strategy, government agencies risk falling behind data production rates that have hit petabyte levels for many projects, especially those focused on scientific, geospatial, astronomical and climate research.
Part of that strategy should take into account the human capital requirements of the coming era of big data. Our story notes the emergence of a new breed of “data scientists,” people uniquely equipped to exploit the scientific and economic potential of well-managed big data analytics. Gartner Group’s Anne Lapkin calls them a triple threat, combining a rare set of expertise in statistical analysis, technology and business modeling. “It’s an entirely new skill set,” she said.
According to a recent survey by EMC Corp., data scientists are in fact a new breed. They are twice as likely as a traditional business intelligence analyst to apply advanced algorithms to data and 37 percent more likely to make business decisions based on that data. This is a group especially trained to nurture data-driven organizations, the firm said.
If so, it’s a group that government agencies would do well to invest in now. Although, strictly speaking, data analytics is not new to most agencies — Congress acknowledged its importance nearly 20 years ago when it passed the Government Performance and Results Act — times have changed. Today’s analytical tools are high-performance technologies, and they just might help agencies support the next medical game changer or cure a true big data monster: the federal budget deficit. In view of such challenges, data scientists should have a seat at the CIO's table.
Paul McCloskey is senior editor of GCN. A former editor-in-chief of both GCN and FCW, McCloskey was part of Federal Computer Week's founding editorial staff.