The devil is in the data grooming

Migrating data from existing databases to new systems raises challenges, whether organizations are moving data to on-premises systems or the cloud. For the Census Bureau, the move provided an opportunity to clean up data and get rid of duplication.

A data migration to the cloud also gives agencies a chance to validate their data, making sure they extract the right data and that it is free of errors, said Willette Allen, the bureau's chief of partnership and data services.


In fact, loading the data into the system was one of the easier tasks, said Niki Clayton, director of the public sector at Acumen Solutions. “The real challenge with migration was data cleanup.”

Systems often have duplicates, such as organizations with the same name or organizations with variants of the same name or acronyms. For the Census Bureau, one of the main challenges was identifying what was unique and what was a duplicate.

To accomplish that task, Acumen used Oracle’s Fuzzy Logic software, which let the IT team look for terms that are similar but not the same. The systems integrator also used extract, transform and load tools provided by to move legacy data from databases into the Integrated Partner Contract Database. 
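Oracle's tool itself is not shown here, but the idea of flagging names that are similar without being identical can be sketched with Python's standard `difflib` module as a stand-in. This is a minimal sketch under stated assumptions: the filler-word list, the acronym check and the 0.85 similarity threshold are all hypothetical choices, and pairs it flags are candidates for human review, not automatic merges.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical filler words that rarely distinguish one organization from another.
STOP_WORDS = {"the", "of", "and", "inc", "llc", "corp", "co"}


def normalize(name: str) -> str:
    """Lowercase the name, strip punctuation and drop filler words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(w for w in words if w not in STOP_WORDS)


def initials(name: str) -> str:
    """Build an acronym from the normalized name, e.g. 'league women voters' -> 'lwv'."""
    return "".join(w[0] for w in normalize(name).split())


def likely_duplicates(a: str, b: str, threshold: float = 0.85) -> bool:
    """Return True if two names look like variants of the same organization."""
    na, nb = normalize(a), normalize(b)
    if na == nb:
        return True
    # One name may be an acronym of the other.
    if na.replace(" ", "") == initials(b) or nb.replace(" ", "") == initials(a):
        return True
    # Otherwise fall back to a character-level similarity ratio (0.0 to 1.0).
    return SequenceMatcher(None, na, nb).ratio() >= threshold


def find_candidate_pairs(names, threshold=0.85):
    """Compare every pair of names and return the pairs flagged as likely duplicates."""
    return [(a, b) for a, b in combinations(names, 2) if likely_duplicates(a, b, threshold)]
```

For example, `find_candidate_pairs(["League of Women Voters", "LWV", "Red Cross"])` flags the first two as a pair and leaves "Red Cross" alone. Comparing every pair grows quadratically, so a real cleanup of a large partner database would typically group records first (by state or first letter, say) before comparing.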

Those tools allow developers to upload data from flat files, Microsoft Excel spreadsheets or SQL databases and then map the fields in those files to the corresponding fields in the new database, Clayton said. 
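The vendor's ETL tools are not described in detail, so the following is only an illustration of the field-mapping step, using Python's built-in `csv` module on a flat file. The legacy column names and target field names in `FIELD_MAP` are hypothetical, and Excel and SQL sources are not covered.

```python
import csv
import io

# Hypothetical mapping from legacy column names to fields in the new database.
FIELD_MAP = {
    "ORG_NAME": "organization_name",
    "CONTACT": "primary_contact",
    "PHONE_NO": "phone",
}


def map_rows(csv_text: str, field_map: dict) -> list:
    """Read a legacy flat file and rename its columns to the target schema.

    Columns not listed in field_map are dropped. A mapped column that is
    missing from the file comes through as None, so gaps stay visible
    during cleanup instead of being silently filled.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    mapped = []
    for row in reader:
        record = {}
        for source, target in field_map.items():
            value = row.get(source)
            # Trim stray whitespace, a common defect in legacy exports.
            record[target] = value.strip() if value is not None else None
        mapped.append(record)
    return mapped
```

Keeping the mapping in a plain dictionary, separate from the loading code, makes it easy to review with the people who know the legacy data before anything is loaded.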

“As long as you know the source data is clean, you are good to go,” Clayton said. “In our case, we needed to do that additional data cleansing.”

The bureau kept the legacy systems offline, Clayton said, noting that migrating data from one system to another is harder if people keep adding data to the existing system.

Getting data out of the cloud is just as important as getting it in, Allen said. Agencies must have a clearly defined process and a schedule of deliverables for extracting data from a cloud system once the contract or services are no longer needed.

About the Author

Rutrell Yasin is a freelance technology writer for GCN.

