How big data tool experience tracks with tech salaries

Big data pays big money.

That’s the conclusion of a couple of recent surveys, which found that data analysts and engineers with big data chops earn more than $120,000, compared with a reported average IT salary of $89,450. Storm and Spark users can pull down $150,000, compared with a median total salary of $98,000 for all data analysts and engineers.

"Big data made a big showing last year and we're seeing it this year too," said Shravan Goli, president of a tech job consultancy, in a statement. "Tech professionals who analyze and mine information in a way that makes an impact on overall business goals have proven to be incredibly valuable to companies. The proof is in the pay."

And while that is surely good news for data scientists in the financial and marketing sectors, government agencies are getting pinched. Like private-sector enterprises, agencies recognize that analyzing big data can yield insights and efficiencies, but they can’t compete on salary.

The median total salary of government data analysts and engineers was roughly $17,000 lower than the median for data analysts and engineers in other industries, according to a recent salary survey by O’Reilly Media, which also analyzed the tools data professionals use. Unsurprisingly, respondents who work for government vendors reported higher salaries.

Other factors that affected salary included age, gender, years in the field, employee level, degrees held and use of cloud technology. Among O’Reilly’s findings:

  • Every year of age added $1,100, with an additional $1,400 for every year of experience working in data.
  • Women earned a median of $13,000 less than men, a gap consistent with the one in the general U.S. population.
  • Those with doctorates earned $11,000 more, and every step up in position added an average of $10,000 to salary.
  • Those using cloud technology earned $13,000 more than those who didn’t.
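
Read as additive increments, the age and experience figures reduce to simple arithmetic. The sketch below is illustrative only: it assumes the two effects add independently, as they are phrased above, and the helper name salary_difference is hypothetical. Because the survey's baseline salary is not reported here, it estimates only the gap between two otherwise identical profiles, not a salary.

```python
# Illustrative sketch (assumption: age and experience effects are additive
# and independent). Figures are the per-year amounts reported by O'Reilly.
# The survey's baseline salary is not given, so this returns a difference only.

AGE_PER_YEAR = 1_100         # USD added per additional year of age
EXPERIENCE_PER_YEAR = 1_400  # USD added per additional year working in data


def salary_difference(extra_years_age: int, extra_years_experience: int) -> int:
    """Estimated salary gap (USD) between two otherwise-identical people.

    Hypothetical helper for illustration, not part of the O'Reilly report.
    """
    return extra_years_age * AGE_PER_YEAR + extra_years_experience * EXPERIENCE_PER_YEAR


if __name__ == "__main__":
    # Someone five years older who also has five more years working in data.
    print(salary_difference(5, 5))  # 12500
```

Five extra years of age contribute $5,500 and five extra years of data experience contribute $7,000, for an estimated gap of $12,500.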

High-end, high pay

Data engineers who have experience with Storm and Spark earn the highest median salaries, according to O’Reilly.

Apache Storm is a distributed, fault-tolerant, real-time computation system for processing large volumes of high-velocity data. Its speed makes it useful for real-time analytics, machine learning and continuous computation.

Apache Spark is a big data processing framework that improves on traditional Hadoop-based analytics. It uses in-memory primitives and other enhancements to outperform MapReduce, and it offers more computational options, with libraries for SQL querying, streaming data analytics, machine learning and more.

Other high-salary tools were IBM’s Netezza, Cassandra, Amazon Elastic MapReduce, Homegrown (avt), Pig, Hortonworks, Teradata and HBase, all with median salaries above $130,000.

The more tools a data professional used, the higher the salary. Those using up to 10 tools earned a median salary of $82,000, rising to $110,000 for those using 11 to 20 tools and $143,000 for those using more than 20.
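
The three tool-count brackets amount to a simple lookup. This is a minimal sketch that encodes only the brackets and medians reported above; the function name is hypothetical, and the median for a given bracket says nothing about any individual's pay.

```python
# Minimal sketch: map a count of tools used to the median salary bracket
# reported in the O'Reilly survey. The function name is hypothetical.

def median_salary_for_tool_count(n_tools: int) -> int:
    """Return the reported median salary (USD) for the bracket containing n_tools."""
    if n_tools < 0:
        raise ValueError("tool count cannot be negative")
    if n_tools <= 10:
        return 82_000   # up to 10 tools
    if n_tools <= 20:
        return 110_000  # 11 to 20 tools
    return 143_000      # more than 20 tools


if __name__ == "__main__":
    for n in (8, 15, 25):
        print(n, median_salary_for_tool_count(n))
```

Run as a script, it prints 82000, 110000 and 143000 for 8, 15 and 25 tools respectively. The jumps at 11 and 21 tools reflect the survey's bracket boundaries, not a smooth per-tool effect.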

The tools respondents used most typically fell into these categories: programming languages, databases, Hadoop distributions, visualization applications, business intelligence programs, operating systems and statistical packages.

Aside from operating systems, SQL was the most commonly used tool, followed by Excel, with R and Python close behind. More than 50 percent of respondents used these four top data tools, followed by Java and JavaScript at 32 percent and 29 percent, respectively. MySQL was the most popular database, closely followed by Microsoft SQL Server.

The study also looked at tools commonly used together and tried to determine the relationship between tool clusters and salaries.

These clusters were:

  • Cluster 1: Windows; C#; SPSS; Visual Basic, VBA; SQL; Business Objects; Oracle BI; PowerPivot; Excel; Oracle; SAS; Microstrategy; MS SQL Server.
  • Cluster 2: Linux; Java; Redis; Hive; Amazon Elastic MapReduce (EMR); MongoDB; Homegrown ML Tools; Storm; Cloudera; Apache Hadoop; Hortonworks; Spark; MapR; Cassandra; HBase; Pentaho; Mahout; Splunk; Scala; Pig.
  • Cluster 3: Python; R; Matlab; Natural Language/Text Processing; Continuum Analytics (NumPy + SciPy); Network/Social Graph; LibSVM; Weka.
  • Cluster 4: Mac OS X; JavaScript; MySQL; PostgreSQL; D3; Ruby; Google Chart Tools/Image API; SQLite.
  • Cluster 5: Unix; C++; Perl; C.

After discarding Clusters 4 and 5 because they were not significant indicators of salary, O’Reilly determined that users of Cluster 2 and Cluster 3 tools earn more: each tool from Cluster 2 contributed $1,645 to expected total salary, and each tool from Cluster 3 contributed $1,900.
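
The per-tool figures can be summed to estimate how much of a respondent's expected salary the Cluster 2 and Cluster 3 tools account for. This is a sketch under stated assumptions: it treats each tool's contribution as additive, it includes only Clusters 2 and 3 because no per-tool figure is reported here for Cluster 1, and the tool-to-cluster table is a small, hypothetical subset of the lists above.

```python
from collections import Counter

# Per-tool contribution (USD) to expected total salary, as reported for each cluster.
# Assumption: contributions are additive. Cluster 1 is omitted because no figure is
# given for it; Clusters 4 and 5 were discarded as not significant.
CLUSTER_CONTRIBUTION = {2: 1_645, 3: 1_900}

# Hypothetical, illustrative subset of the tool-to-cluster assignments listed above.
TOOL_CLUSTER = {
    "Spark": 2, "Storm": 2, "Hive": 2, "Cassandra": 2, "Scala": 2,
    "Python": 3, "R": 3, "Matlab": 3, "Weka": 3,
    "Excel": 1, "SQL": 1, "JavaScript": 4, "Perl": 5,
}


def cluster_contribution(tools: list[str]) -> int:
    """Sum per-tool contributions for Cluster 2 and Cluster 3 tools (USD).

    Duplicates are counted once; unknown tools and other clusters add nothing.
    """
    counts = Counter(TOOL_CLUSTER.get(tool) for tool in set(tools))
    return sum(counts[cluster] * amount for cluster, amount in CLUSTER_CONTRIBUTION.items())


if __name__ == "__main__":
    # Two Cluster 2 tools, two Cluster 3 tools and one Cluster 1 tool.
    print(cluster_contribution(["Spark", "Hive", "Python", "R", "Excel"]))  # 7090
```

In the example, Spark and Hive add 2 × $1,645 = $3,290 and Python and R add 2 × $1,900 = $3,800, for $7,090. Excel falls in Cluster 1, so it is left out of this estimate rather than assigned a value the report does not give.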

The report confirms trends that have been evolving for some time: Hadoop is on the rise, cloud-based data services are important and those who know how to use the advanced, recently developed tools of big data typically earn high salaries.

“For future research we would like to drill down into more detail about the actual roles, tasks, and goals of data scientists, data engineers, and other people operating in the data space. After all, an individual’s contribution – and thus his salary – is not just a function of demographics, level/position, and tool use, but also of what he actually does at his organization,” noted John King and Roger Magoulas, authors of the report.

About the Author

Kathleen Hickey is a freelance writer for GCN.

