A search and discovery environment for data and code can show how an agency's information is connected and grouped within projects or departments.
If you're old enough, you probably remember visiting the library and thumbing through drawers of index cards that explained how to find a book in the stacks and included information about its author and publication. That piece of furniture was the card catalog, and when it comes to government data, we can learn a lot from that analog dinosaur.
While there is a push in government for centralization and consolidation of everything from servers to IT departments, organizational data remains scattered. It’s on servers and in the cloud; it’s on desktops, laptops and cell phones; it’s in stacks of paper in file rooms; it’s in databases, spreadsheets and PDFs. Agencies have little idea what data they own, where it lives, what’s in it, who has access, when or if it’s been changed and how it’s being used in programs or reports.
Have you ever been told, “We’re being audited. I need you to find every piece of data in the agency that deals with tax refunds,” or “I need you to find every database that has Social Security numbers in it”? If your job depended on it, could you do that within minutes? Likely not.
It doesn’t need to be that way.
The capability now exists to build a digital card catalog for all agency data assets. Such a catalog provides a search and discovery environment for data and code across the enterprise and shows how those assets are connected and grouped within projects or departments. With a few clicks, users can run a Google-style search to find all data related to “tax refund” or “Social Security number.” The data catalog returns a full list of every data source, software program or report that meets the search criteria.
Like the index cards in the library's card catalog, the digital version also provides metadata -- every piece of information known about a data asset. Information such as the organizational owner, technology/format, location, security controls, structure, and when and how the asset was last updated can be displayed to authorized users. The catalog even provides descriptive statistics about each asset’s content, such as how the data in the file or database is formatted, and shows where the asset is being used in the agency. It lets requestors check out data through a predefined workflow, work with it in an on-demand temporary analytical environment and then check it back in when they’re done, just like a library.
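To make the idea concrete, here is a minimal sketch of a catalog entry and a keyword search over it. Everything in it is hypothetical: the `CatalogEntry` fields, the `search` function and the sample records are illustrative assumptions, not the schema or API of any particular catalog product, and a real catalog would also enforce access controls before returning results.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One 'index card' describing a data asset (hypothetical fields)."""
    name: str
    owner: str
    data_format: str
    location: str
    description: str = ""
    tags: list[str] = field(default_factory=list)

    def searchable_text(self) -> str:
        # Combine every metadata field into one lowercase string.
        parts = [self.name, self.owner, self.data_format,
                 self.location, self.description, *self.tags]
        return " ".join(parts).lower()


def search(catalog: list[CatalogEntry], query: str) -> list[CatalogEntry]:
    """Return entries whose metadata contains every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    return [entry for entry in catalog
            if all(term in entry.searchable_text() for term in terms)]


# Made-up sample records, for illustration only.
CATALOG = [
    CatalogEntry("refund_claims", "Revenue Division", "PostgreSQL table",
                 "db://revenue/refund_claims",
                 "Tax refund claims by filing year", ["tax refund", "pii"]),
    CatalogEntry("employee_roster", "Human Resources", "Excel workbook",
                 "share://hr/roster.xlsx",
                 "Staff list with Social Security number column",
                 ["social security number", "pii"]),
    CatalogEntry("refund_summary_report", "Budget Office", "PDF report",
                 "share://budget/refund_summary.pdf",
                 "Quarterly summary of tax refund totals", ["tax refund"]),
]

if __name__ == "__main__":
    for entry in search(CATALOG, "tax refund"):
        print(entry.name, "|", entry.owner, "|", entry.location)
```

The search here is a plain substring match on metadata. A production catalog would more likely use a full-text index and could also scan the contents of the data, but the principle is the same: the query runs against the cards, not against every file in the agency.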
Organizations like the American Heart Association have created web catalogs with the goal of encouraging academia and the private sector to use their data for research. They built a website where researchers can find the data they want and check it out in an on-demand environment. Once data is organized and cataloged, new capabilities emerge, such as:
- A recommender system that suggests other datasets a user could integrate to enrich their own data, and provides the code to merge them.
- A data modernization strategy that allows IT to change the location and storage format of data while minimizing the impact on the business.
- Identification of duplicate copies of data based on statistical characteristics of the data itself rather than just filenames and sizes.
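The last capability, spotting duplicates from the data itself, can be sketched as follows. This is an assumed approach for illustration, not the method of any specific product: each table is reduced to a "fingerprint" of per-column statistics that ignores file names and column names, and tables with identical fingerprints are flagged as candidates. The function names and sample tables are hypothetical.

```python
import statistics
from collections import defaultdict


def column_profile(values: list) -> str:
    """Summarize one column by its statistics, ignoring its name."""
    non_null = [v for v in values if v is not None]
    profile = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        profile["mean"] = round(statistics.fmean(non_null), 6)
        profile["min"] = float(min(non_null))
        profile["max"] = float(max(non_null))
    else:
        # For text columns, use the average string length instead.
        lengths = [len(str(v)) for v in non_null]
        profile["mean_len"] = round(statistics.fmean(lengths), 6) if lengths else 0.0
    return repr(sorted(profile.items()))


def fingerprint(table: dict[str, list]) -> tuple:
    """Order-independent fingerprint built from all column profiles."""
    return tuple(sorted(column_profile(col) for col in table.values()))


def find_duplicates(tables: dict[str, dict[str, list]]) -> list[list[str]]:
    """Group table names that share a fingerprint (candidate duplicates)."""
    groups = defaultdict(list)
    for name, table in tables.items():
        groups[fingerprint(table)].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


# Made-up tables: the second is a renamed copy with renamed, reordered columns.
TABLES = {
    "refunds_2023.csv": {"id": [1, 2, 3], "amount": [10.0, 20.5, 30.0]},
    "copy_of_final_v2.xlsx": {"amt": [10.0, 20.5, 30.0], "record": [1, 2, 3]},
    "payroll.csv": {"id": [1, 2, 3], "amount": [99.0, 20.5, 30.0]},
}

if __name__ == "__main__":
    print(find_duplicates(TABLES))
```

Matching fingerprints only suggest that two assets are copies. Different data can share the same summary statistics, so a flagged pair should be confirmed with a row-level comparison before anything is retired.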
Moving forward with cataloging data
To get started, ask your business users how they currently find existing and new datasets, how existing data is used, who has access to it and where it is documented. Check whether any data is being “held hostage,” kept out of reach of the analysts who need it. Next, identify the benefits that could be realized by organizing the data and code, including through cloud and/or container deployment options. Assess how much time could be saved by having quick and easy access to existing and new datasets, both inside and outside the organization. Determine how much security risk could be mitigated by greater transparency around all data assets, and whether such a solution could resolve negative audit findings.
The old library card catalog approach can help get government agency data under control, making it truly usable to support better decisions.