A few parts of the federal government have used Big Data for years to analyze and solve problems, but advances in technology have opened it up to many other agencies. Three long-time players in the field explain what Big Data is, the advantages it brings for government agencies, what resources they need to employ it effectively, and what issues they will need to consider as they develop Big Data programs.
Q1: What is Big Data? What’s the difference between it and other technologies such as data warehousing, data mining and business intelligence analysis?

John Jolly
Vice President and General Manager, Cyber Systems Division, General Dynamics Advanced Information Systems

Many different assets and devices on today’s networks are collecting massive amounts of data, and traditional methods of managing, processing, and correlating data at terabyte and exabyte scales no longer work.

The key difference is that much of this data is unstructured and can’t reside in traditional databases. So, Big Data is all about how we handle this data, how we process it, and how we create information and knowledge out of it in order to drive a business.

Dale Wickizer
Chief Technology Officer, U.S. Public Sector, NetApp

Big Data is a new term describing stuff that’s been in play for a while. The difference is that today, in addition to what people have traditionally thought of as business data, you also have massive amounts of unstructured and machine-generated data from social media (e.g., Facebook and Twitter), and observational data from video surveillance and reconnaissance technologies, along with the related sensor information.

In fact, NetApp views Big Data in three broad areas: Analytics, Bandwidth and Content.

But whether you talk about Big Data in terms of volume, variety and velocity, or as transactions, interactions and observations, it is increasingly difficult for people to manage and gain benefit from the data using existing tools. Having to do this while facing constantly changing mission requirements and shrinking budgets is an even bigger challenge.

Tasso Argyros
Co-President, Teradata Aster

It’s more than just large amounts of data, though that’s definitely one component. The more interesting dimension is about the types of data. So Big Data is increasingly about more complex structures and how you go about capturing and analyzing transaction and interaction data. We also see a lot of different data sources such as image, text, voice, machine data and so on, whose structure can change depending on the analysis.

Big Data is also about big analytics. It’s about applying more sophisticated algorithms to get a deeper insight out of the data without compromising on the scale. At the end of the day, the most important thing about the data is what you can do with it. Unlike with other techniques such as BI, Big Data allows people to tap into much larger amounts of data on the fly. It’s about using the data to discover new insights.

Q2: Agency executives and CIOs are being pressed to do a lot with IT now, and for at least the foreseeable future are facing budget constraints. Is Big Data real or hype? Why should they consider it a more urgent investment than anything else that’s on their plate?

John Jolly
Vice President and General Manager, Cyber Systems Division, General Dynamics Advanced Information Systems

Big Data is very much real. We’re seeing rapid growth in both the type and quantity of data, and the need for a different class of technology to correlate it. A concern is how to keep pace with the paradigm shift to Big Data while balancing tight IT budgets. Executives who postpone this shift can likely do so because they aren’t yet stressing the limits of their current technology, but many agencies are projecting data growth that will place them in the Big Data technology space very soon. A careful understanding of one’s strategic technology roadmap should therefore parallel investments in the appropriate technologies. While Big Data solutions can introduce new challenges, the efficiencies gained are numerous and the favorable cost-of-ownership arguments are powerful.

Dale Wickizer
Chief Technology Officer, U.S. Public Sector, NetApp

It’s certainly real as far as the content is concerned. Nobody throws anything away, because they feel it might be needed for trend analysis or for compliance reasons. For years, NetApp has been focused on eliminating physical copies of the production data that naturally build up because of backup, dev/test, archive, etc. We’re also encouraging the use of orchestration and automation to help manage a lot more data with the same headcount.

What is mostly hype right now is the emerging Big Analytics market. The media hype is leading people to believe that Big Data equals Hadoop and that Hadoop solves world hunger. However, the reality is that most of the money is still being spent on traditional analytics tools. That said, new analytics tools, like NoSQL and columnar databases that leverage Hadoop, are showing great promise, but they need to mature.

Tasso Argyros
Co-President, Teradata Aster

We believe Big Data should be a growing area of importance for federal agencies, because collectively they are among the world’s biggest creators and consumers of data, and because what they do directly influences the lives of hundreds of millions of people. The opportunities for using Big Data are tremendous, because it can help agencies solve some of the key problems government faces today.

For example, health care costs are rising very quickly, and a big cause of that is the complexity of health care systems. Big Data can help optimize things in a way that humans alone cannot. It’s also uniquely positioned to help in areas such as fraud and cybersecurity, where traditional techniques of data gathering and analysis rely heavily on humans. And humans are difficult and expensive to scale.

Q3: What impact does cloud computing have on Big Data? Do agencies that are deploying cloud technologies need to take steps to address Big Data?

John Jolly
Vice President and General Manager, Cyber Systems Division, General Dynamics Advanced Information Systems

First, let’s define cloud computing to exclude storage and virtualization technologies and include computational processing technologies that can process big quantities of data in acceptable amounts of time.

Advances in storage densities and virtualization alone can enable IT consolidation and significant cost savings, but it is cloud computing that is enabling the missions of enterprises that are chartered to process and correlate vast quantities of data. Recent advances in cloud computing have made high performance computing affordable, and we’re now seeing it play a central role in numerous sectors.

Enterprises that have a reason to process Big Data need to pay attention and plan for how they are going to handle and interact with it, but note that traditional IT issues still apply (e.g., information assurance, identity management, etc.). An interesting side note is that we have found the processing and density demands of Big Data technologies can present significant challenges for existing data centers. Recognizing the need for infrastructure upgrades early in the planning process is critical.

Dale Wickizer
Chief Technology Officer, U.S. Public Sector, NetApp

Ubiquitous use of virtualization technology is driving cloud adoption, providing agencies with more options than ever before to economically manage their data and application services. If they can get beyond the political and budget challenges, they can either host their services on internal clouds or choose from a variety of external cloud providers. Big Data is really just doing that at a much greater scale, supporting a wider variety of devices and data sources. The two fit well together.

In fact, Big Analytics was born in the cloud. Yahoo! created Hadoop as an open source alternative to the Google File System. Both Google and Yahoo! may already be operating at exabyte scale; Amazon recently reported that it has more than a trillion objects under management through its cloud services.

So the cloud tools are definitely there to do Big Data content and analytics. The question now becomes one of risk, cost, and who has the ultimate responsibility for the data.

Tasso Argyros
Co-President, Teradata Aster

Big Data is a very cloud-friendly space, and some people might even describe Big Data as a cloud technology. Teradata Aster, for example, provides a lot of these cloud capabilities out of the box, and our product can run on either private or public clouds. So I would say that Big Data and the cloud are very complementary.

Q4: Tools such as Hadoop are synonymous with Big Data. How mature are they? What other technologies and tools do agencies need?

John Jolly
Vice President and General Manager, Cyber Systems Division, General Dynamics Advanced Information Systems

The maturity gauge depends on one’s perspective. Some companies are using Hadoop to drive targeted advertising; eBay is using it for search optimization; Twitter is using it to store and process Tweets. To them it is obviously mature enough to be a significant part of their enterprises. In some camps the use of open source technologies is favored, and in others commercial solutions are favored. One should remember that discussions around the use of Hadoop quickly evolve into discussions of ten to twenty other tools associated with its use, each with its own maturity level.

One should remember that at the end of the day, agencies and companies alike need qualified people who understand how to use and develop the tools and how to apply them where they are needed in their architectures to meet their goals.

Dale Wickizer
Chief Technology Officer, U.S. Public Sector, NetApp

Hadoop, like much of the overall emerging Big Data analytics market, is very immature. It has all of the excitement and the terror of an emerging market: Excitement, because of the growing variety of new products; and terror, because no one can predict how many of these companies will be around in 2-3 years.

NetApp is doing a lot of due diligence to sort through these new technologies and, where appropriate, we include them in pre-packaged rack solutions that have gone through integration and performance testing. The goal is to drive out risk and make it easy to procure and consume these newer technologies, while focusing more on the mission.

For Big Analytics, gluing Hadoop to the application layer is today’s big challenge. Right now it takes a team of PhDs to figure out exactly how to do that. It reminds me of where Unix was 20 years ago. As Hadoop matures, there will be less discussion about Hadoop itself and more about the applications that either run on top of it or have Hadoop functionality built in.

Tasso Argyros
Co-President, Teradata Aster

Hadoop is a great technology, but it is very far from being a silver bullet for all Big Data. Just like any other technology it has certain strengths and weaknesses that make it more or less appropriate for certain use cases. So agencies have to make sure they really understand the characteristics of the problem they want to address, and then use Hadoop for the parts it is most appropriate for.

You can use Hadoop as a central store and staging area for the data, for example, and perhaps to do some data refining. But if you want to do iterative analytics and discovery, if you want to allow your business analysts themselves to tap into Big Data from existing BI and reporting tools, Hadoop doesn’t provide the appropriate interfaces to do it directly on the platform, nor does it have the performance and concurrency you need.

For that you need other tools that can layer in front of Hadoop, such as SQL-H™, which we just introduced. With it, the business can directly access the data in Hadoop. They can easily join the Hadoop data with their other data, and use what they know about their business, SQL and their existing tools.
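
As a rough, hypothetical illustration of the pattern described here (business analysts joining Hadoop-staged interaction data with existing relational data through plain SQL), the sketch below uses SQLite purely as a stand-in for a SQL-on-Hadoop layer; it is not SQL-H syntax, and the table and column names are invented.

```python
# Minimal sketch, NOT actual SQL-H syntax: SQLite stands in for a SQL-on-Hadoop
# layer so the join pattern is runnable end to end. Table/column names are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# "clickstream" plays the role of raw interaction data staged in Hadoop.
cur.execute("CREATE TABLE clickstream (citizen_id INTEGER, page TEXT, ts TEXT)")
cur.executemany("INSERT INTO clickstream VALUES (?, ?, ?)", [
    (1, "/benefits/apply", "2012-06-01T10:00"),
    (2, "/benefits/status", "2012-06-01T10:05"),
    (1, "/benefits/status", "2012-06-02T09:30"),
])

# "cases" plays the role of existing structured data in the warehouse.
cur.execute("CREATE TABLE cases (citizen_id INTEGER, case_status TEXT)")
cur.executemany("INSERT INTO cases VALUES (?, ?)", [(1, "pending"), (2, "approved")])

# The analyst keeps using plain SQL: join the interaction data with business data.
cur.execute("""
    SELECT c.case_status, COUNT(*) AS page_views
    FROM clickstream s JOIN cases c ON s.citizen_id = c.citizen_id
    GROUP BY c.case_status
""")
print(cur.fetchall())   # e.g. [('approved', 1), ('pending', 2)]
```

The point is only that the analyst’s workflow stays ordinary SQL even though one of the tables originates as raw Big Data.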

Q5: To gain the most advantage from Big Data, agencies will need broad access to government-wide data. True? Given that most agencies traditionally like to keep close control over their data, does this pose a problem?

John Jolly
Vice President and General Manager, Cyber Systems Division, General Dynamics Advanced Information Systems

It’s partially true. I personally believe that various missions will produce dramatically improved results if the vast amounts of data these agencies have can be better leveraged to provide a single picture, or situational awareness.

Agencies don’t necessarily have to give up control of their data in order to share. Technologies exist, but there needs to be a willingness to use them. I’m optimistic that one of the things that’s going to happen is that agencies will be more willing to share data, because of the advantages they see they can derive from Big Data, advanced analytics and cloud computing.

Dale Wickizer
Chief Technology Officer, U.S. Public Sector, NetApp

True. The greater the access to government-wide data, the better the potential insights and new capabilities for its citizens and leaders.

More money is wasted in the federal government because of “stovepipes of excellence” than anything else, so there has to be a dramatic change there as a first step. Oftentimes what people say are security issues, for example, are really about governance or political issues. They don’t want to share, because they don’t want to lose control. The way budgets are structured reinforces bad behavior.

So, yes, there are barriers, most of which are non-technical. But the technology to securely enable sharing is available.

Tasso Argyros
Co-President, Teradata Aster

This is marginally true, but the good news is that I have seen a lot of initiatives coming from the federal government to open up the data to help build an ecosystem for people to build interesting applications. The Obama administration recently announced that agencies should make data open as a default and provide APIs so that people can use the data.

That will all create many more opportunities for government agencies to do more and be successful with Big Data. That being said, I don’t think agencies need to wait for that vision to become fully real before getting started. Use case opportunities are plentiful today, and most organizations have more than one that’s very important to them and that they can get started with.

Q6: What are the security and information assurance implications of Big Data?

John Jolly
Vice President and General Manager, Cyber Systems Division, General Dynamics Advanced Information Systems

Big Data doesn’t pose issues by itself above and beyond existing concerns, but the underlying technologies that drive it, such as cloud computing and virtualization, add complexities and elements that must be addressed to keep a security posture solid.

But there are also opportunities for improvement: aggregating the data and putting it into the cloud means you can potentially secure it more efficiently, because it resides in one location.

In fact, Big Data could be used to help companies address many of the enterprise IT security issues they face today. It can help them understand activity on their networks in near real-time, giving them the ability to make better security decisions.
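
As a toy illustration of the near real-time network awareness described above, the sketch below flags hosts that generate an unusually high number of connections within a short window. The log format, window size, and threshold are all assumptions for the example; a production pipeline would compute signals like this continuously over far larger streams.

```python
# Minimal sketch (assumed event format): flag hosts with unusually many
# connections in a sliding window, a simple signal a Big Data pipeline
# could compute continuously at much greater scale.
from collections import Counter, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
THRESHOLD = 3  # connections per window considered suspicious (illustrative only)

events = deque()   # (timestamp, source_ip) pairs inside the current window
counts = Counter()

def observe(ts, src_ip):
    """Record one connection event and report sources above the threshold."""
    events.append((ts, src_ip))
    counts[src_ip] += 1
    # Expire events that have fallen out of the window.
    while events and ts - events[0][0] > WINDOW:
        _, old_ip = events.popleft()
        counts[old_ip] -= 1
    return [ip for ip, n in counts.items() if n >= THRESHOLD]

now = datetime(2012, 6, 1, 10, 0)
for i in range(4):
    flagged = observe(now + timedelta(seconds=30 * i), "10.0.0.7")
print(flagged)  # ['10.0.0.7'] once the source crosses the threshold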

Dale Wickizer
Chief Technology Officer, U.S. Public Sector, NetApp

That’s a broad question. There’s a tendency with a lot of new technologies to simply wink at the issue of security in order to bring new capability to market faster. That’s scary. The security in Hadoop is a big question mark right now, but it will be resolved over time. On the other hand, with Big Analytics you can bring more data types together, potentially gaining greater security insight faster and improving the overall cybersecurity posture. There’s a tension or trade-off between the two. It will be interesting to see how organizations sort this out.

Tasso Argyros
Co-President, Teradata Aster

Security and privacy are very important for Big Data, which is about capturing transactions and interactions. It’s also about capturing new types of data that could hold important information that may or may not be appropriate to release. What this means is that the technology infrastructure needs to provide the right characteristics so that government organizations can decide which data to open up and which should be secure. The problems about what data should and should not be encrypted don’t go away with Big Data.

Q7: What are three illustrative use cases for Big Data, from specialized to general purpose?

John Jolly
Vice President and General Manager, Cyber Systems Division, General Dynamics Advanced Information Systems

Enterprise IT security and network defense is one, and it’s only going to build on Big Data solutions as storage and network traffic explode. The analytics run against data collected about the network are going to help agencies find targets and threats, and better understand where the vulnerabilities are. It is no longer sufficient to simply build a perimeter moat around the kingdom; you must constantly patrol the city streets as well.

Big Data analytics provide extremely powerful capabilities for ingesting, storing, exploiting and disseminating imagery and full-motion video. That’s a non-trivial task, as it involves handling very large volumes of data very quickly, and Big Data is well suited to it.

The third is fusion of disparate data sources. Social media alone produces a huge amount of data on the Web, and many agencies are looking at ways to use that data to support their missions and user base. Again, it’s all about bringing data sources together and using Big Data analytics to help agencies make better decisions and take more effective action.

Dale Wickizer
Chief Technology Officer, U.S. Public Sector, NetApp

Look at NOAA and the multiple petabytes of data it pulls down from its satellites. In addition to managing that data, NOAA manages the high-bandwidth networks required to move it where it needs to be within the agency. It augments this data through high performance computing (HPC) modeling and simulation, creating data products and services that other agencies and millions of people rely on; the National Weather Service is one example.

Another example is the Centers for Medicare and Medicaid Services (CMS) and the Social Security Administration, which today run traditional analytics on huge database systems used to pay benefits to millions of citizens. They are thinking of complementing those with Hadoop clusters to fight fraud.

Lawrence Livermore National Laboratory works on grand scientific challenges and has the top HPC environment in the world, supporting roughly 1.6 million compute cores, around 1.6 petabytes of main memory and over a terabyte/sec of data bandwidth. It’s a long-time Big Data practitioner.

Tasso Argyros
Co-President, Teradata Aster

The first is health care, which is a huge opportunity for Big Data. The data in health care is complex, and a lot of the analytics that need to happen go well beyond traditional data mining or data warehousing. More importantly, Big Data can significantly reduce costs in health care by identifying fraud, helping patients find the most cost effective and efficient care, and helping institutions understand how best to cut costs.

Fraud by itself has many different use cases. One situation we were involved in had to do with tracking how people would exchange money online, and the customer had to be able to find where money laundering was happening. That was really interesting because we had to produce a graph of the money transfers to determine how the money flowed and to detect where the fraud was happening.
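
The analysis described here is essentially graph analytics over transfer records. Below is a minimal sketch of the idea, with invented accounts and amounts, using the networkx library: build a directed graph of transfers and surface cycles, one crude signal that funds may be routed back to their origin. It is not the method used in the engagement described, just an illustration of treating money flow as a graph.

```python
# Minimal sketch with made-up data: model money transfers as a directed graph
# and surface cycles, a crude stand-in for the flow analysis described above.
import networkx as nx

transfers = [              # (from_account, to_account, amount)
    ("A", "B", 9_500),
    ("B", "C", 9_400),
    ("C", "A", 9_300),     # funds loop back to A: worth a closer look
    ("D", "E", 120),
]

G = nx.DiGraph()
for src, dst, amount in transfers:
    G.add_edge(src, dst, amount=amount)

for cycle in nx.simple_cycles(G):
    print("possible layering loop:", " -> ".join(cycle))
```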

The final use case is cybersecurity, and situations where you have a lot of endpoints, computers and other devices that communicate with each other. We can treat that as a graph problem as well; figuring out how information and messages flow is key both to preventing attacks and to analyzing them. If an endpoint is infected by a virus, for example, we’d be able to track how that endpoint was infected, who else may be infected, and whether there are any suspicious endpoints that might be part of a hidden malicious network.
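
In the same spirit, the infected-endpoint question reduces to reachability over observed communications. The sketch below, with hypothetical endpoints in pure Python, walks outward from a known-infected machine over who-talked-to-whom records to list endpoints that may also be affected.

```python
# Minimal sketch: given observed endpoint-to-endpoint communications, walk
# outward from a known-infected host to find everything it may have reached.
from collections import deque

# (sender, receiver) pairs as they might be extracted from flow logs (made up).
comms = [("pc-17", "srv-db"), ("pc-17", "pc-22"), ("pc-22", "srv-mail"),
         ("pc-40", "srv-db")]

adjacency = {}
for sender, receiver in comms:
    adjacency.setdefault(sender, set()).add(receiver)

def possibly_affected(start):
    """Breadth-first walk over outbound communications from an infected host."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return sorted(seen)

print(possibly_affected("pc-17"))  # ['pc-22', 'srv-db', 'srv-mail']
```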

Q8: What changes, if any, are needed to standards that deal with such things as metadata? Are any new or different standards needed?

John Jolly
Vice President and General Manager, Cyber Systems Division, General Dynamics Advanced Information Systems

The transformation to Big Data will yield major informational and operational benefits, but the volume and complexity involved are beyond what individuals can track manually. Greater automation of information exchange and broader governance constructs are just the first step toward robust Big Data solutions.

Standards evolve and new standards are constantly being developed. For example, there are new draft standards for the cloud that are being developed by commercial companies to provide metadata-aware solutions for Big Data. The most important thing for agencies to keep in mind with respect to the Big Data environment is to take a flexible approach to standards and consider what they’re trying to achieve within the enterprise architecture before implementation.

This means that, if current standards are sufficient for their needs, they don’t need to do anything. If they’re not, agencies will need to engage with industry communities that are working on new and different standards as they bring a unique set of dynamic and diverse environments to the landscape. Some agencies have begun to do this and we’re starting to see their requirements being addressed in a broader, holistic manner.

Dale Wickizer
Chief Technology Officer, U.S. Public Sector, NetApp

The challenge is how to economically archive data stored as objects with self-describing metadata, and how to manage distributed repositories of those objects through a unified metadata layer so that they appear to users as one big archive. There are existing standards for preservation and archiving, such as OAIS and ANSI Z39.50. Perhaps those need to be updated to factor in the new Hadoop and NoSQL technologies.

Some of the standards NetApp has been working on, such as the SNIA Cloud Data Management Interface (CDMI), make it easier to move data from cloud to cloud in an object-based manner. That’s available now. We also have a distributed object storage product called StorageGRID, which allows organizations to store their objects in distributed repositories all around the world using a unique object identifier on them. That also exists today.

The next big thing would be a fast analytics metadata layer that could reach down into objects using CDMI, pull data and metadata from wherever they live, and run analytics against them to answer questions. While Hadoop may be a part of that, analytics and other applications need to be updated to allow for both object and file access. Perhaps standards could help facilitate some of that effort.
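
For a sense of what reaching into objects through CDMI could look like, here is an illustrative sketch of a CDMI-style metadata read over HTTP. The endpoint, object path, and credentials are placeholders, and the response field names are assumptions based on the CDMI specification and should be verified against it.

```python
# Illustrative sketch only: read an object's metadata over a CDMI-style REST
# interface and hand selected fields to an analytics step. Endpoint, path and
# auth are placeholders; field names should be checked against the SNIA spec.
import requests

CDMI_ENDPOINT = "https://storage.example.gov/cdmi"       # placeholder
OBJECT_PATH = "/sensor-archive/2012/06/pass-0113.json"   # placeholder

resp = requests.get(
    CDMI_ENDPOINT + OBJECT_PATH,
    headers={
        "Accept": "application/cdmi-object",
        "X-CDMI-Specification-Version": "1.0.2",
    },
    auth=("analyst", "secret"),                          # placeholder credentials
    timeout=30,
)
resp.raise_for_status()
obj = resp.json()

# Self-describing metadata travels with the object, so an analytics layer can
# decide what to pull without first copying everything into one repository.
print(obj.get("objectID"), obj.get("metadata", {}))
```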

Tasso Argyros
Co-President, Teradata Aster

Standards are important, and metadata is one of the things that has been neglected by the Big Data open source community. But on the other hand it’s a strength of platforms like Teradata Aster that combine Big Data frameworks such as MapReduce with structure, relational stores and metadata layers.

Without a well-defined metadata layer it’s hard to give a broad set of business analysts access to Big Data. If there is no standard, you can’t easily enable the adoption of Big Data across the organization, because it is too painful for analysts to define and manage structure in the data. Data openness and the ability to properly synchronize data are also important to an agency’s ability to take full advantage of the potential of Big Data.

The final thing is that, when it comes to connecting different data and Big Data platforms and you have a portfolio of technologies to deal with, being able to integrate the metadata layers is very important. Teradata Aster is the only Big Data vendor that has a metadata integration technology with Hadoop that makes data exchange from Hadoop to Aster and back much more efficient.

Q9: Do current government regulations help or hinder agencies in implementing Big Data programs? What, if anything, needs to be changed or added?

John Jolly
Vice President and General Manager, Cyber Systems Division, General Dynamics Advanced Information Systems

I think the real issue is the need for a public-private partnership that brings industry and government together to develop and evolve standards that support and facilitate the implementation of Big Data programs.

Current government standards are sufficient for what agencies need to get out of Big Data now. What agencies need to do going forward is, first, follow and adopt these standards as it makes sense for them and, secondly, where they see gaps, to get involved with the private sector in developing new standards to address them.

Dale Wickizer
Chief Technology Officer, U.S. Public Sector, NetApp

The old policies, still in place, that govern how agencies form budgets and that perpetuate “stovepipes of excellence” need to stop. Instead, we really need to think about how to incent people to be better stewards of data and to share that data more freely.

Also, we need to remove the mandates that require agencies to use tape for storage. That’s ridiculous. This might sound a bit self-serving from a storage guy who sells disks, but the reality is that when you have massive petabyte tape libraries and every 7-10 years companies come out with new tape drives that can’t read the old format, you are caught in a loop of perpetual and very costly tape migrations.

Increasingly, people expect their data to be online all the time for quick access. Today, you can’t recover petabytes of tape data in any reasonable amount of time, let alone quickly comb through that data to support time critical analytics or trending.

Tasso Argyros
Co-President, Teradata Aster

What we’ve seen recently is the federal government moving aggressively to enable Big Data, with some $200 million in funding recently announced for fundamental research into Big Data technology. I also believe it would be very beneficial to have legislation that would make it easier, and perhaps mandatory for certain agencies, to make some or all of their data available publicly. We need some cover, both political and legal, that would make it easier and less risky for agencies to release Big Data use cases and applications, and also to create an ecosystem for non-government people to use Big Data.

Q10: What impact will Big Data initiatives have on current IT and data architectures? Will they pose a challenge to the way agencies currently store and manage data?

John Jolly
Vice President and General Manager, Cyber Systems Division, General Dynamics Advanced Information Systems

Big Data initiatives will have a substantial impact. You’ll continue to see the trend of data migrating to the cloud and agencies will continue to look at new architectures that enable them to do this. They’ll also be looking at the security they need to defend and secure the data.

It’s also going to drive changes in those organizational barriers that prevent the sharing of data both within and between organizations. Plus, it will help shift system architectures within the federal government toward more open standards and APIs. And, because many agencies still have to deal with legacy technologies, this transformation is going to help them move to more open architectures and standards in a much more cost-efficient way. I’m optimistic that Big Data initiatives will create value for agencies.

Dale Wickizer
Chief Technology Officer, U.S. Public Sector, NetApp

I think they’re already seeing the stress in the infrastructures they have now. The biggest stress point, however, is the people. We’re talking about the volumes of data growing by a factor of 50 in the next ten years. Over that same timeframe, network bandwidth will only grow by a factor of 10 and agency IT headcount by just 50 percent.

Government agencies that have been avoiding disciplined operations (e.g., ITSM) and putting off higher degrees of automation are going to have to step it up. They will soon have to manage orders of magnitude more data per headcount than they handle today, and the only way to do that is through strong, repeatable processes and low- or zero-touch automation.
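
The arithmetic behind that point is worth making explicit. Using the growth factors quoted above, the data each administrator is responsible for grows by roughly a factor of 33 over the decade:

```python
# Back-of-the-envelope math using the growth factors quoted above.
data_growth = 50       # data volume grows 50x over ten years
staff_growth = 1.5     # IT headcount grows 50 percent over the same period
bandwidth_growth = 10  # network bandwidth grows 10x

print(f"data per administrator: ~{data_growth / staff_growth:.0f}x today's level")
print(f"data per unit of bandwidth: ~{data_growth / bandwidth_growth:.0f}x today's level")
```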

Tasso Argyros
Co-President, Teradata Aster

Big Data done right is an extension of existing architectures. A big trap for organizations, not only government, is to think that with Big Data everything has to change. You could do that. You could change everything and have a completely new architecture with Big Data, but the cost would be enormous. So, the key to being successful with Big Data is to see it as an extension of the existing infrastructure.

But that does mean you have to understand how Big Data technologies work with existing technologies and skill sets, and that you are working with technology platforms that have the capability of connecting those skill sets with the existing parts of the ecosystem.

Q11: Big Data implies integrating data with different origins, formats, and standards. What technologies and techniques are in place for overcoming those irregularities and making that data usable and accessible?

John Jolly
Vice President and General Manager, Cyber Systems Division, General Dynamics Advanced Information Systems

There are a large number of technologies available, everything from enterprise services to platform-as-a-service. There are numerous cases where agencies have shared data in very different formats that have been used to create unique and valuable insights.

Hadoop is designed to cope with massive amounts of unstructured data. The very heart of the technology is built to handle irregularities, which helps make the data more usable and accessible to analytics; that would not have been the case just a few years ago, when everything had to be stored in traditional databases.

Technologies that enable the co-mingling of structured and unstructured data are the technological underpinning of numerous Big Data advances to date, and I have been astonished at how fast and how nimble the open source community has been in developing them. Innovation is very much alive and well.

Dale Wickizer
Chief Technology Officer, U.S. Public Sector, NetApp

That’s the power of Hadoop and some of the NoSQL databases. They’re not locked into dealing with just structured data, so they don’t have to go through the fairly extensive extract, transform and load (ETL) batch process used with it. They effectively do that processing as you use the data, and increasingly the focus is on near real-time analytics.
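
A minimal sketch of that “process it as you use it” idea, schema-on-read rather than up-front ETL: the records below are invented and arrive with different shapes, and only the fields needed for a given question are extracted at read time.

```python
# Minimal schema-on-read sketch with made-up records: no up-front ETL step,
# each record's structure is interpreted only when the question is asked.
import json

raw_lines = [
    '{"type": "tweet", "user": "citizen42", "text": "long wait at the DMV"}',
    '{"type": "sensor", "station": "KDCA", "temp_f": 88.4}',
    '{"type": "tweet", "user": "roadwatch", "text": "accident on I-95"}',
]

def tweets_mentioning(keyword, lines):
    """Pull only the fields needed for this one question, at read time."""
    for line in lines:
        record = json.loads(line)
        if record.get("type") == "tweet" and keyword in record.get("text", ""):
            yield record["user"], record["text"]

print(list(tweets_mentioning("DMV", raw_lines)))
```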

Over time there’s going to be a blend of structured and unstructured data that people will work with, so there will need to be a way to handle that. Companies that come from a traditional analytics background are doing a lot of work on that now. Eventually, unstructured data will dominate, comprising more than 80 percent of the data.

Tasso Argyros
Co-President, Teradata Aster

The great thing about Big Data is that it enables a paradigm in which you don’t have to spend much time imposing structure on the data when you load it. From that perspective, MapReduce is a great framework, because it can ingest unstructured data at one end and produce structured information at the other.
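
A toy version of that paradigm, written as a single-process Python script rather than an actual Hadoop job: the map step turns free-text log lines (made up for the example) into structured (key, value) pairs, and the reduce step aggregates them into structured output.

```python
# Toy MapReduce sketch (single process, not an actual Hadoop job): unstructured
# text goes in one end, a structured aggregate comes out the other.
from collections import defaultdict

log_lines = [                                   # made-up free-text input
    "2012-06-01 login failed for user alice",
    "2012-06-01 login ok for user bob",
    "2012-06-02 login failed for user alice",
]

def map_phase(line):
    """Emit (user, outcome) pairs extracted from a raw log line."""
    tokens = line.split()
    yield tokens[-1], tokens[2]                 # user, "failed"/"ok"

def reduce_phase(pairs):
    """Count outcomes per user: structured output from unstructured input."""
    counts = defaultdict(lambda: defaultdict(int))
    for user, outcome in pairs:
        counts[user][outcome] += 1
    return {user: dict(c) for user, c in counts.items()}

intermediate = [pair for line in log_lines for pair in map_phase(line)]
print(reduce_phase(intermediate))
# {'alice': {'failed': 2}, 'bob': {'ok': 1}}
```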

Teradata Aster, for example, uses MapReduce so it can store unstructured data and transform that data to have more structure, and then we also have SQL and the structured query capabilities to open up the data to a lot of business analysts and business intelligence tools. With this capability, Teradata Aster can combine Big Data with existing, traditional structured data to improve overall insights for agencies.

So you really need hybrid technologies that combine Big Data frameworks like MapReduce, which are pointed at the unstructured datasets, with more structured frameworks like SQL that allow existing skill sets and BI tools to be brought to Big Data.

Q12: To what extent will the analytic promise of Big Data be a catalyst for related agencies to link programs or seek greater shared services opportunities?

John Jolly
Vice President and General Manager, Cyber Systems Division, General Dynamics Advanced Information Systems

Big Data analytics can dramatically improve situational awareness for numerous parts of the government. I believe the cat is out of the bag on the power of Big Data, and more and more agencies will participate in the transformation over time.

While agencies won’t be required to share more because of Big Data, the analytical promise will be a strong catalyst to encourage sharing, especially when there is a commonality of need and vision.

Dale Wickizer
Chief Technology Officer, U.S. Public Sector, NetApp

I think you already are seeing that. With the announcement out of the White House of major R&D funding for Big Data initiatives, they’re trying to push that very thing right now. Various agencies in the government are coming together on Big Data because they can see the tsunami of data heading their way, and they’re looking at the reality of their budgets and realize that they can’t handle this individually.

Take CMS, for example, and what they are doing with Hadoop. That involves a measure of HPC type of behavior, which is outside of CMS’ comfort zone. So, they are collaborating with the Oak Ridge National Laboratory, which has been doing HPC for a long time, to get an education. And that’s very encouraging.

Tasso Argyros
Co-President, Teradata Aster

I think Big Data will drive a lot of that. For example, there is a lot of opportunity for tackling human services problems, to make sure the right service is delivered effectively to everyone, or to link similar services such as Social Security and Medicare. So the trend toward data openness may actually enable service areas and service collaboration that didn’t exist before.

This will be a process and it will take time, but there is already low-hanging fruit that people can get started with. A lot of health care data is already being made available by the federal government and other agencies that everyone can use to build integrated solutions for public use. There are innovative services today, for example, that help you self-diagnose a problem and then figure out what is the right medical facility to get the most appropriate care.

All of this would not be possible without Big Data. So I definitely think Big Data is driving a trend where data openness will create front-end services that will hide some of that complexity. That alone will force services to be more integrated.

Q13: If agencies hire companies to help them with Big Data programs, what should they look for in terms of what those companies need to provide?

John Jolly
Vice President and General Manager, Cyber Systems Division, General Dynamics Advanced Information Systems

I cannot emphasize enough that agencies should look for skilled analysts and developers who not only understand the technology and tools relevant to the agency’s space, but who also have an intimate understanding of the agency and its unique mission and goals. This level of understanding enables them to integrate the right capabilities and tailor the appropriate architecture for better clarity and insight. The wrong solution can be more expensive or damaging than doing nothing at all.

Second, agencies really need someone who takes a strong enterprise architecture approach to the problem, so that they’re not just chasing a shiny Big Data box and installing it on a network. They need to step back and look at the big picture, including the size of the agency and the requirements that have to be defined to meet user needs, at the beginning of the process, so they end up with a solution that fits and drives the mission.

There are many vendors already in the Big Data market with some unique and valuable products, but many of them only solve a portion of the problem. That’s the shiny object problem: buying and implementing the shiny object as part of the enterprise architecture and processes without stepping back and looking at the long-term implications. When it doesn’t do exactly what they wanted it to do, agencies have to live with it or go get the help of an experienced vendor. So why not do that at the beginning?

Dale Wickizer
Chief Technology Officer, U.S. Public Sector, NetApp

First and foremost, they need to look for a track record of execution excellence, so they know that the people they are thinking of hiring know what they’re doing. Ask for references, and check them! With Hadoop, for example, there aren’t a lot of people with the necessary expertise who know how to tie it into the analytics layer. The good news is that there’s recognition of this shortfall in data scientists, and people are trying to grow them. But so far there are only a handful of companies that are really good at Big Analytics.

The second consideration is whether the company has the ability to scale to meet demand. If I’m one of your 15 major customers, is work on my stuff going to be delayed because you don’t have the resources? Once again, the companies who are good at Big Analytics now tend to be on the small side and there aren’t that many of them. If demand ramps up quickly, they could have problems delivering on schedule.

Tasso Argyros
Co-President, Teradata Aster

First, there are a lot of point solutions in Big Data today. That’s natural because it’s a new area, but point solutions can mean agencies will spend much more money than they need to and not get the full benefit of Big Data. So they should take some time to find out who can provide the full Big Data solution rather than just point solutions.

Second, look for companies that have opened themselves up fully to the Big Data business. There are technologies today, especially in the open source space, which are great for organizations that are just comprised of engineers. But it’s not trivial for those technologies to enable the business side of an organization to directly ask Big Data questions.

And finally, look for companies that can cut through the Big Data hype, translate it into business use cases, and provide a realistic return on investment and the timelines needed to ensure the project is successful.

Q14: How different are Big Data skills from those that agencies employ for other IT needs? How can agencies compete for people with these skills, given the projected shortage of those people?

John Jolly
Vice President and General Manager, Cyber Systems Division, General Dynamics Advanced Information Systems

The skills needed are significantly different. They typically require a greater degree of mathematical expertise than is traditionally found in most IT organizations. However, the skills can be developed over time with the right training and evolution of the workforce.

Within government, these skills can be found in specialties such as high performance computing and grid computing. People with that experience can help agencies ease the transition to Big Data.

Dale Wickizer
Chief Technology Officer, U.S. Public Sector, NetApp

Big Data is old hat to those agencies that have been involved in HPC. They have had years of experience and expertise in scheduling and running jobs at scale, across a large, shared infrastructure. I would say the same is true with respect to large content repositories, like the Library of Congress, NOAA, NASA, NGA, etc.

But you’re right. Those skillsets are different from those commonly used in enterprise IT.

The skills are out there if people are humble enough to go and ask for the help, but even those resources are limited. So, don’t expect to just outsource your way out of your predicament.

For example, HPC is a relatively small market of less than $100 billion, compared to the $2 trillion enterprise market. Asking advice from these organizations is one thing, but trying to convince them to handle your Big Data for you is another matter, as they would quickly be overwhelmed.

Tasso Argyros
Co-President, Teradata Aster

It’s definitely true that, if you look at places like Google and Yahoo, they use pure MapReduce open source technology and are investing enormous amounts of money to hire and retain talent. However, I think this is a technology problem. With the right technology, which you can use alongside open source products, 90 percent of the skills you need should already exist in most organizations. The rest of the skill sets are more those of data scientists, and these are select people that agencies can attract, since the overall numbers needed are not too high.

However, if an agency tries to use purely open source technology, the pyramid is inverted. Now 90 percent of the skills for Big Data are new skills and only 10 percent are reusable skills that the agency already has. And attracting so much talent so quickly is just impossible. So it’s more about using the right Big Data architecture and the right technologies to ensure that Big Data can happen with existing skill sets.

Q15: What does the future hold for the evolution of Big Data? What do you expect the situation to be both a year and five years from now?

John Jolly
Vice President and General Manager, Cyber Systems Division, General Dynamics Advanced Information Systems

The future is very bright. What we’re seeing today is a paradigm shift from how you deal with data and data storage to how you use data as an enterprise asset. Using Big Data analytics to help identify value in that data will help accelerate that shift. So, in much the same way that agencies and businesses evolved from having folders of digital documents to enterprise databases and data warehouses, agencies will move from struggling with Big Data problems to creating actual solutions that revolve around Big Data in the next five years.

The key to that is continued maturity and ease of deployment, as well as further development of commodity solutions by the market that can be tailored to agency architectures. While agency CIOs are under continuous budget pressures, they also have much more data and they’ll be increasingly pushed to identify how they’re going to deliver efficiencies in processing and leveraging that data.

In five years, Big Data will be another enterprise asset; a tool that agencies can use to derive value out of data that will help them deliver their mission requirements.

Dale Wickizer
Chief Technology Officer, U.S. Public Sector, NetApp

In terms of capacity, we are talking about organizations today handling terabytes to petabytes of data. NetApp has many customers who manage 50-100 petabytes or more, and one customer who already manages about an exabyte of data, but those are the exceptions.

Five years from now organizations will more routinely manage in the petabyte to exabyte range. Ten years from now, if the predictions on data growth of a 50-fold increase in 10 years hold true, organizations in the multi-exabyte range will be common. What that means to organizations with respect to storage management is pretty staggering.

But the size of the data is just one of the challenges. The one that really concerns me is the investment in network bandwidth needed to move all that data around; bandwidth is only growing by a factor of 10 over that same period. There may be a tendency toward more distributed data centers, which flies in the face of the Federal Data Center Consolidation Initiative. How will agencies handle that?

With budgets shrinking, agencies are going to have to think outside the box and work smarter as well as harder. I sometimes wonder whether the number one “killer app” for Big Data analytics might not be one that turns Big Data into Little Data. That is, one that gives agencies enough visibility into the value of the data they have that they can safely weed out and throw away the 90 percent that’s of little value to their main mission.

I know what you’re thinking: Impossible! But, hey, I can dream can’t I?

Tasso Argyros
Co-President, Teradata Aster

I think a year from now the market will have a more mature perspective about Big Data. Instead of having open source technologies being overhyped, you will see people coming to the realization that it takes a portfolio of Big Data technologies to be successful.

Five years from now I expect the technologies themselves will have matured. I think people will have moved on from thinking about technology and more about the business use cases and total cost of ownership. I also expect Big Data to be very, very widespread with the majority of government organizations relying on Big Data to drive a lot of their cost cutting, service enhancements and anti-fraud initiatives.

I also think Big Data will become more specialized. Cloud computing was a very generic term a few years ago; now we have public and private clouds, and we’re talking about how virtualization fits into all of that. Similarly, I think Big Data will break into subcategories that people can choose from depending on their needs, such as technologies that store data in certain ways and maybe refine or preprocess it, or that deliver certain levels of performance.