The devil is in the data grooming
- By Rutrell Yasin
- Aug 16, 2011
Migrating data from existing databases to new systems raises challenges, whether organizations are moving data to on-premises systems or the cloud. For the Census Bureau, the move provided an opportunity to clean up data and get rid of duplication.
A data migration to the cloud also gives agencies an opportunity to validate their data and make sure that they extract the right data and that the data is free of errors, said Willette Allen, the bureau's chief of partnership and data services.
In fact, loading the data into the system was one of the easier tasks, said Niki Clayton, director of the public sector at Acumen Solutions. “The real challenge with migration was data cleanup.”
Systems often contain duplicates, such as organizations entered more than once under the same name, under variants of that name or under an acronym. For the Census Bureau, one of the main challenges was identifying which records were unique and which were duplicates.
To accomplish that task, Acumen used Oracle’s Fuzzy Logic software, which let the IT team look for terms that are similar but not the same. The systems integrator also used extract, transform and load tools provided by Salesforce.com to move legacy data from databases into the Integrated Partner Contract Database.
Those tools allow developers to upload data from flat files, Microsoft Excel spreadsheets or SQL databases and then map the fields in those files to the corresponding fields in the new database, Clayton said.
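The field-mapping step can be pictured with a short Python sketch. It is an illustration only and does not use the Salesforce.com tools themselves; the column names, target field names and the `map_rows` helper are all hypothetical.

```python
import csv
import io

# Hypothetical mapping from legacy column names to new-system field names.
FIELD_MAP = {
    "ORG_NM": "organization_name",
    "CNTCT_EMAIL": "contact_email",
    "ST": "state",
}

def map_rows(csv_text, field_map):
    """Read flat-file (CSV) text and rename its columns to the target fields."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Fail early if the source lacks a column the mapping expects.
    missing = set(field_map) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"source is missing columns: {sorted(missing)}")
    # Keep only mapped columns and trim stray whitespace from each value.
    return [
        {target: (row[source] or "").strip() for source, target in field_map.items()}
        for row in reader
    ]

# Made-up sample record; LEGACY_ID is unmapped and is therefore dropped.
sample = "ORG_NM,CNTCT_EMAIL,ST,LEGACY_ID\n Example Org ,info@example.org,VA,17\n"
print(map_rows(sample, FIELD_MAP))
```

Checking for missing columns before loading anything reflects the point that a load goes smoothly only when the source data is known to be in good shape.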
“As long as you know the source data is clean, you are good to go,” Clayton said. “In our case, we needed to do that additional data cleansing.”
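The duplicate hunt described above, looking for names that are similar but not the same, can be sketched with Python's standard library. This is not the Oracle software the team used. It is a minimal illustration that assumes a simple character-similarity score plus an acronym check, and the organization names, the `likely_duplicates` helper and the 0.85 threshold are all hypothetical.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

def normalize(name):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())

def acronym(name):
    """Build an acronym from the first letter of each normalized word."""
    return "".join(word[0] for word in normalize(name).split())

def likely_duplicates(names, threshold=0.85):
    """Return pairs of names that look like the same organization."""
    pairs = []
    for a, b in combinations(names, 2):
        na, nb = normalize(a), normalize(b)
        ratio = SequenceMatcher(None, na, nb).ratio()
        # Also flag a name that matches the other name's acronym.
        is_acronym = na == acronym(b) or nb == acronym(a)
        if ratio >= threshold or is_acronym:
            pairs.append((a, b))
    return pairs

# Made-up partner names for illustration.
names = [
    "Springfield Community Council",
    "Springfield Community Council.",
    "SCC",
    "Riverside Food Bank",
]
for pair in likely_duplicates(names):
    print(pair)
```

A sketch like this only proposes candidate pairs. Deciding which record to keep would still need human review, because two distinct organizations can share an acronym.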
The bureau maintained the legacy systems off-line, Clayton said, noting that migrating data from one system to another is more challenging if people are continuing to add data to the existing system.
Just as important as getting the data in is getting the data out of the cloud, Allen said. Agencies must have a clearly defined process and deliverable schedule for extracting data from a cloud system once the contract and/or services are no longer needed, she said.
Rutrell Yasin is senior editor for GCN covering cloud computing.