When data grows old: An agency model to optimize aging data
- By Greg Gardner
- Dec 04, 2015
In the public sector, we often talk about the impact of aging IT infrastructure. Replacing legacy systems is costly and time consuming. As a result, CIOs are constantly challenged to identify and implement the optimal path to modernization.
Lost in that discussion, however, is the fact that data ages differently than the infrastructure upon which it is stored. Attention is often given to the volume, source and type of data, but less consideration goes to when the data was created and how long agencies will require ready access to it.
Data is not ageless. This fact has significant implications for how agencies secure and access their data. Taking a life cycle approach to data management and embracing several key tenets opens the door to a much more efficient approach that leverages a variety of widely accepted secure storage technologies.
A data life cycle management model
Let’s consider a major government function, perhaps a personnel management or logistics application. Life cycle data that underpins that application, from its creation to the point at which it can either be deleted or placed in long-term storage, can be illustrated with the simple model below.
At point 1, during the development and testing stage, sample data is best stored in inexpensive infrastructures like those offered by cloud service providers. Once the application goes live at point 2, the organization must invest significantly in the availability, veracity and security of relevant data to support its key applications and analytics. This capability can be provided internally/on-premises or purchased as a service. Over time, however, data ages to point 3, where monitoring tools identify data that has not been touched in months or years. The prudent manager then makes the call to move that aged data to secure but less expensive, less accessible long-term storage, point 4, thus freeing rapidly responsive but more expensive storage for younger, still-viable data.
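The point-3 decision can be reduced to a simple rule over last-access metadata. The sketch below, in Python, assumes a monitoring tool has already reported when each dataset was last touched; the file names and the 180-day threshold are hypothetical, not drawn from any specific agency policy.

```python
import time

# Hypothetical threshold -- a real cutoff would come from agency policy.
COLD_AFTER_DAYS = 180  # untouched this long -> candidate for long-term storage

def find_cold_data(records, now=None):
    """Return names of records whose last access predates the cold cutoff.

    `records` is an iterable of (name, last_access_epoch_seconds) pairs,
    standing in for the metadata a monitoring tool would report at point 3.
    """
    now = time.time() if now is None else now
    cutoff = now - COLD_AFTER_DAYS * 86400
    return [name for name, last_access in records if last_access < cutoff]

# Example: one dataset touched yesterday, one untouched for a year.
now = 1_700_000_000
records = [
    ("personnel_2023.db", now - 1 * 86400),
    ("logistics_2015.db", now - 365 * 86400),
]
print(find_cold_data(records, now=now))  # ['logistics_2015.db']
```

The rule itself is trivial; the hard part, as the model suggests, is having trustworthy last-access data in the first place.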
This is a simple model, but its effective execution is based on three key tenets: appropriate centralization through a shared-service approach; transparent visibility into both the data and the physical/virtual infrastructure that makes it available to applications; and the ability to deduplicate, compress and secure data that is moved into low-cost, long-term storage.
Centralization and shared services
Over the past 40 years, the IT field has struggled with the centralization of IT resources. In the 1970s and early 1980s the common approach was to centralize. Eventually, the inability of those centralized systems to meet user needs led to the decentralized approach that emerged in the 1990s. The challenge for today’s CIOs is to appropriately balance those two extremes. The key is convincing subordinate organizations to adopt a strong enterprise shared-service environment.
Richard McKinney, CIO of the Department of Transportation, testified to Congress on this subject in early November 2015. Noting DOT’s unsustainable course of decreasing modernization investment and increasing operations costs, he asserted, “A shared service model should manage the 60-70 percent of the current IT landscape that is commodity IT, and services should be provided as a utility to each business unit.”
“Striking the balance for centralization will not only strengthen operational IT as an enterprise service but it will also drive down the costs of providing commodity IT services,” he added. As McKinney has seen at DOT, it is almost impossible to effectively understand and manage a decentralized, stovepiped IT infrastructure.
Even centralized shared service data infrastructures consist of heterogeneous physical and virtual servers, multiprotocol networks and storage systems. Management of these infrastructures is most often accomplished using vendor-specific element managers operated by distinct technology teams responsible for servers, storage, networks, applications and so on. Technology services delivered to the end user, however, depend on the coordination of physical and logical configurations controlled by these different technology teams.
To support their user communities more effectively, IT organizations must use enhanced management tools to help them visualize and better understand how disparate infrastructure elements work together to support key applications and services. Aligning operational management with infrastructure service delivery lets IT managers deliver more reliable, more effective operations to the businesses they support at a lower overall cost. Specific tools include configuration management, monitoring and alerting, capacity management and performance management. Without these applications, IT managers simply cannot make informed decisions like those required at point 3 on the data life cycle management model.
Appropriate management software also gives IT managers detailed infrastructure visibility. It enables effective data management by providing a complete suite of tools that lets managers control, automate and analyze their data infrastructure, while a rich application programming interface layer enables integration into the wider IT service delivery chain. Device management, problem detection, standardization, data aging and the ability to closely monitor and report on environmental statistics provide valuable insight and enable precise control. Additionally, automation capabilities feature policy-based workflows, automated error identification and application-consistent backup and cloning capabilities. Analytical functions include service management, data capacity planning and service assurance monitoring in a multivendor IT landscape.
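Policy-based workflows of the kind described above can be pictured as a table of condition/action rules evaluated against each dataset's metadata. This is only an illustrative sketch; the rule names, fields and thresholds are invented for the example, not taken from any particular management product.

```python
# Hypothetical policy table: each rule pairs a condition on dataset
# metadata with the action an automation engine would take.
POLICIES = [
    {"name": "archive-cold",
     "condition": lambda d: d["days_idle"] > 180,
     "action": "move-to-archive"},
    {"name": "alert-full",
     "condition": lambda d: d["pct_used"] > 90,
     "action": "raise-capacity-alert"},
]

def evaluate(dataset):
    """Return the list of actions triggered for one dataset's metadata."""
    return [p["action"] for p in POLICIES if p["condition"](dataset)]

# A long-idle dataset on a nearly full volume triggers both rules.
print(evaluate({"days_idle": 400, "pct_used": 95}))
# ['move-to-archive', 'raise-capacity-alert']
```

Keeping the policies as data rather than code is what lets administrators adjust thresholds without touching the automation engine itself.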
Securing data as it ages
The management of aging, unused data often corresponds to how much security and control agencies want to exert over it. The fact is that securing aging data at point 4 in the data lifecycle is far from black and white. An agency might need to move sensitive data off its operational systems and into a secure environment where it will likely remain untouched but must be accessible for the rare occasions when it is needed.
One approach to securing aging, untouched data or provisioning backup/recovery stores involves using physical or virtual appliances to compress, deduplicate, encrypt and then seamlessly and securely back up data and workloads to the cloud. Agency storage teams, for example, can now quickly spin up a cloud-based appliance with a cloud service provider and move their data to that cloud with enterprise-class speed and security. They would pay only for what they use, and they retain the encryption keys. This approach is increasingly popular among government organizations.
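To make the deduplicate-then-compress idea concrete, here is a minimal sketch in Python using only standard-library hashing and compression. It fingerprints fixed-size chunks so identical chunks are stored once, then compresses each unique chunk. Encryption with agency-held keys would follow compression (compressing after encryption is ineffective, since encrypted data does not compress), but it is omitted here because key management is appliance-specific; the function names and 4 KB chunk size are assumptions for the example.

```python
import hashlib
import zlib

def chunk_fingerprints(data, chunk_size=4096):
    """Split data into fixed-size chunks, fingerprinting each with SHA-256.
    Identical chunks produce identical digests, enabling deduplication."""
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        yield hashlib.sha256(chunk).hexdigest(), chunk

def backup(data, store):
    """Deduplicate, then compress each unique chunk before storing it.
    `store` stands in for the cloud target; the returned manifest lists
    the chunk digests needed to reassemble the original data."""
    manifest = []
    for digest, chunk in chunk_fingerprints(data):
        if digest not in store:          # dedupe: skip chunks already stored
            store[digest] = zlib.compress(chunk)
        manifest.append(digest)
    return manifest

store = {}
data = b"A" * 8192 + b"B" * 4096         # two identical 'A' chunks, one 'B'
manifest = backup(data, store)
print(len(manifest), len(store))          # 3 chunks referenced, 2 stored
```

Even this toy version shows why the ordering matters: deduplication and compression shrink what crosses the wire, and the agency-held encryption key is applied last, so the provider never sees plaintext.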
There are other key aspects to managing how and where data moves as it ages. First, IT managers must account for data movement between storage technologies, such as flash, spinning disk, magnetic tape and optical discs. For example, inexpensive spinning disk storage may be preferable to less-accessible magnetic tape for long-term storage at point 4.
Second, agencies must factor in data movement between different cloud environments (hybrid, public, private). To do so requires a “data fabric” that allows them to control, integrate, move and consistently manage their data across different cloud environments and multiple cloud vendors. As data ages, the ability to move this data to less expensive storage -- but still recover it when necessary -- becomes paramount.
The data life cycle management model offers IT managers a simple, effective tool for understanding and expressing the value of intelligent data management. It clearly demonstrates the value of investing heavily in keeping oft-used data immediately available but then moving, managing and securing unaccessed, aging data to ensure maximum storage efficiency. The need for efficient data storage is compounded as agencies become more centralized and embrace a shared service approach.
Agencies can leverage the data life cycle management model to articulate the optimal “state” for data (hot/warm or cold/frozen) based on its age. Mission-critical data that must be accessed regularly (between points 2 and 3 of the model) demands a hot/warm state, but as that data “cools” to the point where it is no longer regularly accessed, it is appropriate to move it to a secure but less expensive state -- archived, but still accessible. It is neither feasible nor fiscally prudent for IT managers to keep all their agency’s data in a hot state; they must be highly efficient in making their data only as accessible as it needs to be.
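The hot/warm/cold vocabulary maps naturally onto age thresholds. A minimal sketch, assuming days-since-last-access is the deciding metric; the 30- and 180-day cut-offs are illustrative placeholders, since real thresholds would be set per application by the agency:

```python
def storage_state(days_since_access):
    """Map a dataset's age since last access to a storage state.
    Thresholds are hypothetical; agencies would tune them per application."""
    if days_since_access <= 30:
        return "hot"    # points 2-3: regularly accessed, fast, costly storage
    if days_since_access <= 180:
        return "warm"   # cooling: still online, cheaper tier
    return "cold"       # point 4: archived but still recoverable

print([storage_state(d) for d in (1, 90, 400)])  # ['hot', 'warm', 'cold']
```

The point of encoding the policy this way is that "only as accessible as it needs to be" becomes an auditable rule rather than an ad hoc judgment.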
Ultimately, optimizing data as it ages depends on establishing a flexible, hybrid environment that permits the data to reside in the most optimal state for storage efficiency.