A Texas-sized cloud storage challenge
- By Amanda Ziadeh
- Jun 01, 2016
The Texas State Library and Archives Commission had a Texas-sized job on its hands when it was tasked with securely preserving all the records produced during former Gov. Rick Perry’s administration. Those materials totaled some seven terabytes of data and 4,000 cubic feet of paper. On top of that, there were 26 terabytes of other electronic records. According to Jelain Chubb, the state archivist and director of the Archives and Information Services Division at TSLAC, the data included everything from digitized audio cassettes from the Texas State Senate to video files, emails, databases, maps, photo collections, postcards and more.
Those sources became the foundation of the Texas Digital Archive, a searchable online repository designed to manage, preserve and provide access to the electronic records collections held at the Texas State Archives.
“We needed a way to preserve and provide access to those records just like we do with the records that we created in paper base form,” Chubb told GCN. Most of those 26 terabytes of files were still sitting on servers and drives and needed a permanent home. A state mandate also required that certain essential government records remain permanently accessible.
To find a solution, TSLAC issued a comprehensive request for proposals that outlined its requirements for a standards-based digital archival preservation system that would conform to the Open Archival Information System (OAIS) standard and meet TSLAC’s encryption requirements.
TSLAC was looking for a cloud-based solution so it could easily and securely provide access to archived records with minimal IT overhead. As it was, the cost of server space in the state data center was high and unsustainable.
The agency wanted its data on Amazon Web Services’ GovCloud because that solution addresses specific regulatory requirements of federal, state and local agencies and is built to host sensitive data and workloads. The Texas Department of Information Resources also had concerns about where the records would reside in the cloud and wanted the most secure storage possible.
“Some of the records created and received by state governments have restrictions, and we need to make sure that those restrictions were followed both during the transfer of the records and then while they’re at rest in our system,” Chubb said.
In the end, TSLAC chose the Preservica digital preservation system, running on AWS GovCloud.
While Preservica had clients using the cloud-based version of its data preservation suite, it had not yet worked with the AWS GovCloud service. “Our requirement led Preservica to speed up their partnership with the Amazon Government Cloud,” making TSLAC the first Preservica client to use AWS GovCloud, Chubb said.
Preservica’s software handles storage, content management and the updates needed to preserve data as file formats evolve. Several copies of each record are saved in different places in the cloud so there is always both an original and a backup, and each record is accompanied by descriptive metadata so it can easily be identified and accessed in the future.
To handle its records, the agency purchased two tiers of cloud storage. It bought 12 terabytes of Amazon S3 for frequently accessed digital data and 25 terabytes of lower-cost storage space in Amazon Glacier for large, infrequently accessed digital master files. “That’s one aspect that Texas has made good use of,” Preservica CEO Jonathan Tilbury said. “They’re quite clever about their storage to keep the price down.”
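The two-tier split can be sketched in a few lines of Python. This is only an illustration: the capacities come from the figures above, but the routing rule, the `choose_tier` and `monthly_cost` helpers, and the unit prices are hypothetical placeholders, not TSLAC's actual policy or AWS pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    capacity_tb: float


# Capacities as reported in the article.
S3 = Tier("Amazon S3", 12)
GLACIER = Tier("Amazon Glacier", 25)


def choose_tier(frequently_accessed: bool) -> Tier:
    """Hypothetical routing rule: hot data goes to S3, large masters to Glacier."""
    return S3 if frequently_accessed else GLACIER


def monthly_cost(hot_price_per_tb: float, cold_price_per_tb: float) -> float:
    """Blended cost of the two tiers for caller-supplied unit prices."""
    return S3.capacity_tb * hot_price_per_tb + GLACIER.capacity_tb * cold_price_per_tb


total_tb = S3.capacity_tb + GLACIER.capacity_tb  # 37 TB purchased in total

# Placeholder prices in arbitrary units (NOT real AWS rates), used only to
# show why moving infrequently accessed masters to a cheaper tier lowers the bill.
tiered = monthly_cost(hot_price_per_tb=10.0, cold_price_per_tb=1.0)
all_hot = total_tb * 10.0
```

With those placeholder prices, the tiered layout comes to 145 units against 370 if all 37 terabytes sat in the frequently accessed tier. The real difference depends entirely on the rates in effect.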
Preservica also provided technical assistance and online training to both the TSLAC staff and a user group of archivists so they could share best practices. TSLAC’s archivists were in charge of making sure data was correctly structured and tagged, as well as migrating and transferring the records into the Preservica system. Chubb said it took several months of continuous work to upload the information and descriptive metadata to the cloud, and the whole project took a year to complete.
Preservica also powers the Texas Digital Archive, which is an online portal for public access to state records. The portal is integrated with the Preservica product, and as soon as data is loaded and security tags are in place, records become visible on the website.
Because the site provides access to the electronic records collections of the Texas State Library and Archives Commission, including those transferred by state agencies or digitized by the State Archives, citizens and researchers don’t have to shift from agency to agency to find or request records -- and agencies can retire legacy preservation processes. “We think it’s overall going to be a savings to the state as we transfer in older legacy records that agencies are still having to maintain,” Chubb said.
By tracking the use of its digital archive site with Preservica’s user analytics, TSLAC can see the number of individual users, page views and item requests. From March through May, the portal saw 23,377 total page views and 1,844 users, 68.7 percent of whom were new visitors and 31.3 percent returning visitors, according to Chubb.
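For readers who want head counts rather than percentages, the reported figures can be worked through in a short Python sketch. The derived numbers are rounded estimates computed from the published totals, not values supplied by TSLAC, and the `split_users` helper is a hypothetical name.

```python
def split_users(total_users: int, new_share_pct: float) -> tuple[int, int]:
    """Estimate new and returning head counts from a total and a percentage."""
    new_users = round(total_users * new_share_pct / 100)
    return new_users, total_users - new_users


# Figures reported for March through May.
total_page_views = 23_377
total_users = 1_844

new_users, returning_users = split_users(total_users, 68.7)
views_per_user = total_page_views / total_users

print(f"new: {new_users}, returning: {returning_users}")
print(f"page views per user: {views_per_user:.1f}")
```

That works out to roughly 1,267 new and 577 returning visitors, or about 12.7 page views per user over the three months.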
For agencies looking for a similar solution, Chubb said Texas is sharing its RFP with other states, and both Chubb and Tilbury encouraged states to host digitized archival records on the cloud.
“Doing things on premise can take a long time. Cloud gives the opportunity to get started within days,” Tilbury said. “And you get updates instantly, the cost is lower, so it’s a better service and it’s cheaper. What’s not to like?”
Amanda Ziadeh is a former reporter/producer for GCN.