Data governance for each step in data’s journey
By Amy O’Connor
July 7, 2016
Data governance is a critical yet challenging business process in a world where data volumes, types and uses are rapidly expanding. Government agencies must employ comprehensive governance strategies to ensure data is properly collected, managed and used. At the same time, they must protect individuals’ privacy and make sure they can use data quickly and effectively to accomplish their missions, which is the reason for collecting it in the first place.
Information governance, which covers the rules for handling and retaining documents, emails, social media and other information, has become a common topic of discussion among government agencies, particularly in light of the many mandates they must follow. Data governance, the set of policies for managing, using and securing data, is discussed less often, but it is just as important. As the foundation of information governance, data governance involves assessing data quality, protecting data to ensure appropriate use, providing auditing and lineage transparency, and enforcing governance policies, among other things.
Data is a living asset. It constantly evolves and moves, taking many different paths as various users and systems access and change it on its way from collection to delivery. Consequently, agencies should consider data governance at every stage of that journey, as described below.
Data quality assessment when collecting data. Moving data from its sources into a big data system is an important step in the data journey, and the most important governance consideration at this stage is quality assessment. The value of the information derived from data depends on the quality of the data itself: garbage in, garbage out. As agencies put in place the processes for extracting data from source systems, loading it into a big data system and transforming it (commonly called ELT), they must assess data quality and maintain tight feedback loops with each data owner or creator so that quality improves continually over time.
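To make the quality-assessment step more concrete, here is a minimal Python sketch, assuming records arrive as dictionaries during the load step. The field names (`case_id`, `zip`, `amount`), the rules, the 95% target and the `assess` and `feedback_for_owner` helpers are all hypothetical illustrations, not features of any particular big data platform. The sketch counts rule failures per source and turns low pass rates into messages that could be routed back to the data owner, which is one simple way to close the feedback loop.

```python
from dataclasses import dataclass, field
from typing import Callable

# A quality rule is a predicate over a single record.
Rule = Callable[[dict], bool]

# Hypothetical rules and field names, for illustration only.
RULES: dict[str, Rule] = {
    "has_case_id": lambda r: bool(r.get("case_id")),
    "valid_zip": lambda r: (
        isinstance(r.get("zip"), str) and len(r["zip"]) == 5 and r["zip"].isdigit()
    ),
    "non_negative_amount": lambda r: (
        isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0
    ),
}


@dataclass
class QualityReport:
    source: str
    total: int = 0
    failures: dict[str, int] = field(default_factory=dict)

    def pass_rate(self, rule: str) -> float:
        """Fraction of records that passed `rule` (1.0 for an empty batch)."""
        if self.total == 0:
            return 1.0
        return 1 - self.failures.get(rule, 0) / self.total


def assess(source: str, records: list[dict], rules: dict[str, Rule] = RULES) -> QualityReport:
    """Run every rule against every record and count failures per rule."""
    report = QualityReport(source=source, total=len(records))
    for record in records:
        for name, check in rules.items():
            if not check(record):
                report.failures[name] = report.failures.get(name, 0) + 1
    return report


def feedback_for_owner(report: QualityReport, target: float = 0.95) -> list[str]:
    """Build messages for the data owner about rules that fell below the target."""
    messages = []
    for rule in sorted(report.failures):
        rate = report.pass_rate(rule)
        if rate < target:
            messages.append(
                f"{report.source}: '{rule}' passed {rate:.0%} of {report.total} records "
                f"(target {target:.0%})"
            )
    return messages
```

Because each report is tied to a named source, the messages can go straight to the owner of that source rather than to a general queue.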
Data protection when enabling exploration and analysis. Access control is a key data governance consideration at this stage. Small groups of people, such as data scientists and engineers, need fairly broad access to data, but they shouldn’t, for example, be able to publish changes to the data for broader consumption without going through a governance process. Conversely, smaller sets of data that have passed through a production governance publication process need to be made available to large groups of people. The best policy is to set access control at the data level, so that appropriate use is enforced no matter which application or person is accessing the data. In addition, authentication controls should verify that those accessing data are who they say they are, and all data should be encrypted to keep it secure for the remainder of the data journey.
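Here is a minimal Python sketch of access control set at the data level, assuming policies are keyed by dataset rather than by application, so every tool that reads a dataset is held to the same rules. The dataset names, roles, `User` type and `is_allowed` function are hypothetical. The sketch denies by default: unauthenticated users, unknown datasets and unlisted actions are all refused. Encryption is outside its scope.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    READ = "read"
    WRITE = "write"
    PUBLISH = "publish"


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset[str]
    authenticated: bool  # assumed to be set by an upstream identity check


# Hypothetical policies attached to the data itself, keyed by dataset name.
POLICIES: dict[str, dict[Action, set[str]]] = {
    # Exploratory data: broad access for a small group, publishing reserved for governance.
    "raw_claims": {
        Action.READ: {"data_scientist", "data_engineer"},
        Action.WRITE: {"data_engineer"},
        Action.PUBLISH: {"governance_board"},
    },
    # Published data: readable by a much larger group.
    "published_claims_summary": {
        Action.READ: {"staff", "data_scientist", "data_engineer"},
        Action.PUBLISH: {"governance_board"},
    },
}


def is_allowed(user: User, dataset: str, action: Action) -> bool:
    """Return True only if an authenticated user holds a role the dataset's policy grants."""
    if not user.authenticated:
        return False
    allowed_roles = POLICIES.get(dataset, {}).get(action, set())
    return bool(user.roles & allowed_roles)
```

Keeping the policy table next to the data, instead of inside each application, is what lets the same check apply whether the request comes from a notebook, a dashboard or an interagency feed.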
Lineage tracking and auditing when delivering data. When analysts, engineers and developers deliver information to consumers, lineage tracking is important. Consumers in an agency environment range from department directors to the general public, and the data may be delivered through dashboards and other business intelligence tools or through mobile, web and interagency applications. Understanding how the data has been modeled and processed provides the visibility needed to confirm that the insights delivered are correct and to trace and fix errors. To satisfy auditing requirements, systems should also document who touched what data, when, why and how.
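The auditing and lineage requirements can be sketched as an append-only log of events that each capture who, what, when, why and how, plus the upstream datasets a change consumed. This is a minimal Python sketch; `LineageLog`, `AuditEvent` and the dataset names are hypothetical, and a real deployment would keep events in durable, tamper-evident storage rather than in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    who: str                 # person or service that made the change
    what: str                # dataset that was written
    when: datetime           # UTC timestamp
    why: str                 # stated purpose
    how: str                 # job, script or tool used
    inputs: tuple[str, ...]  # upstream datasets read to produce `what`


class LineageLog:
    """In-memory, append-only log (hypothetical); real systems need durable storage."""

    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    def record(self, who: str, what: str, why: str, how: str, inputs=()) -> AuditEvent:
        event = AuditEvent(who, what, datetime.now(timezone.utc), why, how, tuple(inputs))
        self._events.append(event)
        return event

    def history(self, dataset: str) -> list[AuditEvent]:
        """Every recorded change to `dataset`, in the order it was logged."""
        return [e for e in self._events if e.what == dataset]

    def upstream(self, dataset: str) -> set[str]:
        """All datasets that fed `dataset`, directly or indirectly."""
        seen: set[str] = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for event in self.history(current):
                for source in event.inputs:
                    if source not in seen:
                        seen.add(source)
                        stack.append(source)
        return seen
```

When a dashboard figure looks wrong, `upstream` narrows the search to the datasets that could have caused it, and `history` shows who changed each one, when, why and how.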
Governance enforcement throughout the data lifecycle
Agencywide data governance planning should cover the entire data journey. Organizations should create an executive data governance council composed of people who understand the business value of data. The data management team, which understands the intricacies of the data, should make policy recommendations to the council. That same team is then responsible for implementing the data protection and other governance rules the council approves.
Good data governance guides the appropriate use of data, which leads to better insights, better information and, ultimately, innovation. Service to citizens, the ultimate mission of every government agency, warrants good data governance at every step of the data journey.
Amy O’Connor is a big data evangelist at Cloudera.