What's holding back Hadoop?
- By Troy K. Schneider, Jonathan Lutton
- Apr 24, 2015
Hadoop -- the open-source, distributed programming framework that relies on parallel processing to store and analyze both structured and unstructured data -- has been the talk of big data for several years now. And while a recent survey of IT, business intelligence and data warehousing leaders found that 60 percent will have Hadoop in production by 2016, deployment remains a daunting task.
TDWI -- which, like GCN, is owned by 1105 Media -- polled data management professionals in both the public and private sectors, who reported that inadequate staff expertise and the lack of a clear business case topped their list of barriers to implementation:
Barriers to implementation | Respondents who checked each category
Inadequate skills or difficulty of finding skilled staff | 42%
Lack of compelling business case | 31%
Lack of business sponsorship | 29%
Lack of data governance | 29%
Security for Hadoop data | 29%
Lack of metadata management | 28%
Excessive hand coding required of Hadoop | 27%
Cost of staffing Hadoop admin/development | 25%
Cost of implementing a new technology | 22%
Difficulty of architecting big data analytic system | 22%
Immature support for ANSI-standard SQL | 19%
Interoperability with existing systems or tools | 19%
Software tools are few and immature | 19%
Enterprise-class manageability | 17%
Not enough information on how to get started | 16%
Slow pace of hand-coded development | 16%
Cannot make big data usable for end users | 13%
Handling data in real time | 13%
Existing user-defined DW architecture | 12%
Poor quality of Hadoop data | 11%
Software tools need higher-level language support | 10%
Hadoop's high operational expenses | 9%
Enterprise-class availability | 9%
Other | 2%
The respondents did, however, see a wide range of uses to justify the deployment efforts, including:
HDFS applications | Respondents who checked each category
Complementary extension of a data warehouse | 46%
Data exploration and discovery | 46%
Data staging for data warehousing and data integration | 39%
Data lake | 36%
Queryable archive for non-traditional data | 36%
Computational platform and sandbox for analytics | 33%
Enterprise data hub (for both new and traditional data) | 28%
Business intelligence (reporting, dashboards) | 27%
Queryable archive for traditional enterprise data | 19%
Operational data store (ODS) | 17%
Repository for content, records management | 17%
Operational application support (apps on Hadoop data) | 11%
Don't know | 3%
Other | 1%
And just 6 percent said Hadoop deployments were not in their organization's plans at all:
[Chart: When do you expect to have HDFS in production?]
The full report, which also includes best practices and implementation trends, is available here.
About the Authors
Troy K. Schneider is editor-in-chief of FCW and GCN, as well as General Manager of Public Sector 360.
Prior to joining 1105 Media in 2012, Schneider was the New America Foundation’s Director of Media & Technology, and before that was Managing Director for Electronic Publishing at the Atlantic Media Company. The founding editor of NationalJournal.com, Schneider also helped launch the political site PoliticsNow.com in the mid-1990s, and worked on the earliest online efforts of the Los Angeles Times and Newsday. He began his career in print journalism, and has written for a wide range of publications, including The New York Times, WashingtonPost.com, Slate, Politico, National Journal, Governing, and many of the other titles listed above.
Schneider is a graduate of Indiana University, where his emphases were journalism, business and religious studies.
Click here for previous articles by Schneider, or connect with him on Twitter: @troyschneider.
Jonathan Lutton is an FCW editorial fellow.