
Big data = big exposure. What can you do about it?

Agencies are looking to “big data” to help solve some of the most pressing challenges facing government today. But big data often brings its own challenges in the form of IT and information security concerns. Too often, agencies approach big data as if it were merely an expansion of their database capability. Yet the term “big data” means much more than just a large database; it encompasses new tools, technologies, and deployment and operational methods. It is usually an inextricable part of cloud computing and virtualization strategies. From an information security perspective, big data can mean “big exposure” to risk if approached solely from a traditional IT perspective.

Similar to traditional IT approaches

While an authoritative definition of big data is debatable, the following, proposed by Forrester’s Mike Gualtieri, is one that IT security professionals can easily grasp, given that their mission has traditionally been focused on how data is processed, stored and transmitted:

“Big data is the frontier of a firm’s ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.”

Certain aspects of big data include traditional IT approaches with traditional challenges that do not require an entirely new perspective. In fact, many agencies already have the foundation laid for developing an approach to big data security. That foundation includes mature processes for cloud computing, continuous monitoring and Federal Information Security Management Act compliance. For example, as agencies optimize their continuous monitoring capabilities, they can utilize existing tools that support big data, including vulnerability management and patching services. While these capabilities are all necessary first steps to approaching big data security, a new perspective is in fact required when considering the differences between big data and the large data processing and storage of the past.

The differences

Big data consists of many new technologies, tools and practices (Hadoop, NoSQL, Pig, Hive, HBase, etc.) as well as data warehousing strategies, many of which are new to the security professional and create a complex operating environment.

The following examples represent some of the complexities that are non-traditional causes for concern from both a security perspective and an IT governance perspective:

Database structure. Although most traditional database vendors support big data, their products are SQL-based or use another type of relational structure. Hadoop and other next-generation databases are designed for unstructured data.

Scalability. While most structured database systems are designed to “scale up” based on the size of the host machine, next-generation technologies are often designed to “scale out,” or cluster. Instead of having a single large database server, an agency may have 500 smaller systems operating together as a cluster. Some of these systems could be virtual, some physical, and some in the cloud.

Configuration management. Traditionally, FISMA (through FIPS 200) has required agencies to develop robust configuration management plans, establish configuration and change management boards, and ensure that security impact analysis is performed as part of system changes. When working with big data, mature and robust configuration and change management is a must.

Cost. Since new nodes could be spun up in almost any cloud provider’s environment, or even on additional desktops within an agency, tight control over IT resources and spending must be in place.

Operations. Who is responsible for patching? Who is responsible for vulnerability scanning? What happens if the software has a vulnerability and there is no vendor to contact for support? Ensuring even basic maintenance of operations and allocating additional resources merit rigor in the decision-making process. With many big data platforms capable of utilizing cloud services out of the box, the security team must be aware of any changes being performed as part of the system lifecycle.
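The scalability, cost and operations questions above come down to knowing what is in the cluster and who owns it. The following Python sketch is illustrative only: the `Node` record, its fields, the sample node names and ticket numbers, and the `find_unmanaged_nodes` helper are all hypothetical and not part of any big data platform or agency tool. It assumes a cluster inventory is available as a simple list and flags nodes that have no patching owner or no approved change request behind them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One member of a scale-out cluster (hypothetical inventory record)."""
    name: str
    location: str                  # e.g. "physical", "virtual", "cloud"
    patch_owner: Optional[str]     # team responsible for patching, if any
    change_ticket: Optional[str]   # approved change request, if any


def find_unmanaged_nodes(inventory: List[Node]) -> List[str]:
    """Return names of nodes lacking a patch owner or an approved change."""
    return [
        node.name
        for node in inventory
        if not node.patch_owner or not node.change_ticket
    ]


# Sample data, invented for illustration.
inventory = [
    Node("node-001", "physical", "ops-team", "CR-1001"),
    Node("node-002", "cloud", None, "CR-1002"),        # nobody patches it
    Node("node-003", "virtual", "ops-team", None),     # no change request
]

print(find_unmanaged_nodes(inventory))  # ['node-002', 'node-003']
```

A check like this could feed a change management board's review, though a real inventory would more likely come from a configuration management database or a cloud provider's API than from a hand-built list.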

Big data still relies on the same IT infrastructure as systems did in the past but can greatly expand and complicate it. New software, such as Hadoop, lacks mature security models, assessment techniques and automated tools. This means security teams will need to rely largely on an array of operational and managerial techniques — including segmentation and robust, auditable access controls — to help ensure big data does not become “big exposure.” Security teams must look at big data from a holistic perspective of protecting the infrastructure and operating system, applying as much automation and existing policy as possible.
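As a rough illustration of what an auditable access control means in practice, the Python sketch below records every access decision, whether granted or denied, before returning it. Everything here is an assumption made for the example: the `GRANTS` table, the role and dataset names, and the `can_read` function are hypothetical and do not reflect the security model of Hadoop or any other product.

```python
from typing import Dict, List, Set

# Hypothetical mapping of roles to the datasets they may read.
GRANTS: Dict[str, Set[str]] = {
    "analyst": {"public_stats"},
    "data_scientist": {"public_stats", "case_records"},
}

# In-memory audit trail; a real deployment would write to protected storage.
audit_log: List[str] = []


def can_read(user: str, role: str, dataset: str) -> bool:
    """Decide whether a role may read a dataset and record the decision."""
    allowed = dataset in GRANTS.get(role, set())
    audit_log.append(
        f"user={user} role={role} dataset={dataset} allowed={allowed}"
    )
    return allowed


can_read("alice", "analyst", "case_records")         # denied, still logged
can_read("bob", "data_scientist", "case_records")    # granted and logged

for entry in audit_log:
    print(entry)
```

Logging denials as well as grants is the design choice that matters here: it lets a security team reconstruct who attempted to reach which data, not only who succeeded.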

By applying the existing approaches under FISMA with mature change and configuration management processes, agencies can begin to securely leverage the power of big data. Security teams will need to become more integrated and involved in the work of data scientists and business units to understand how they are operating and where they need support. While big data is new to many agencies, the principles of protecting information and bringing mature management to an operation often are not. Agencies should leverage their existing operational and managerial controls to protect new technologies while automated tools are developed to add further rigor, maturity and automation.

