LOC hybrid hosting delivers scalability, access
- By Stephanie Kanowitz
- Mar 17, 2020
The Library of Congress is moving from an on-premises data center to a three-part hybrid hosting environment to increase its scalability and ability to provide fast, reliable access to digital collections.
Funded by Congress in 2017, the initiative will wrap up by Sept. 30, when the library vacates its legacy data center. Meanwhile, library staff members have been assessing which data and applications are best suited for each of the three new environments: the library’s own cloud, a vendor-supplied software-as-a-service (SaaS) environment or a new data center.
“Because of the different type of mission we have at the Library of Congress and because of the different type of data that we have, we really had to go for a hybrid hosting environment,” said library Deputy CIO Judith Conklin.
She categorized the library’s data into four sets: public, congressional, copyright and business processes.
Most public data is ideal for the cloud, Conklin said, especially as employees follow the librarian of Congress’s directive to “throw open the treasure chest” by digitizing many of the library’s analog collections.
“We’re doing that in two ways: We’re digitizing our analog collections and we’re starting to receive born-digital, meaning we never get it in an analog format … we just receive it digitally,” Conklin said. “A cloud environment can help us do that. We can present more and more with the capacity of the cloud by [posting] more and more digital collections.”
Conklin breaks the digital collections into two types: preservation and presentation. Presentation data is what visitors to the website can view and is stored in the cloud. Preservation data, on the other hand, is the “museum part” of the library, she said, and is required by a CIO directive to remain on-premises.
Because the library’s mission is to serve Congress, it’s the keeper of legislative and congressional data, some of which needs to stay on-premises, she said. Congress.gov data is public and can therefore be in the cloud, but other data is more sensitive and must remain on-prem.
Similarly, anything that is still under copyright must be stored in the on-prem data center and treated as sensitive data. Financial, contractual and human resources data, which fuel the library’s business processes, can exist in a SaaS environment or the library’s cloud, depending on the level of sensitivity, Conklin said.
“SaaS to me is the vendor. A vendor has said, ‘We will do the entire IT stack for you,’” she said. “That’s preferred. If it is financially feasible, if it’s not too expensive -- the option that they’re giving. We prefer that to get out of the hardware business,” she added. “If it is not being offered in a SaaS or it’s not financially feasible, then our next preference is a platform-as-a-service up there in the cloud and [to] have a cloud [Federal Risk and Authorization Management Program] vendor take some responsibility for parts of the IT stack, with us owning the responsibility of the application and the data.”
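The placement preferences Conklin describes can be read as a short decision procedure. The Python sketch below is an illustration only, not the library's actual tooling: the `Workload` fields, the `choose_hosting` function and the example workload names are all hypothetical. It assumes a single flag, `must_stay_on_prem`, stands in for the cases the article says remain on-premises (preservation data, material still under copyright and sensitive congressional data). The article does not say how sensitivity splits business data between SaaS and the library's cloud, so the sketch does not model that.

```python
from dataclasses import dataclass
from enum import Enum


class Hosting(Enum):
    ON_PREM = "new data center"
    SAAS = "vendor-supplied SaaS"
    CLOUD = "library cloud (platform-as-a-service)"


@dataclass
class Workload:
    # All fields are hypothetical names chosen for this illustration.
    name: str
    must_stay_on_prem: bool = False  # e.g. preservation data, works under copyright
    saas_offered: bool = False       # a vendor offers the full IT stack as a service
    saas_affordable: bool = False    # that offering is financially feasible


def choose_hosting(w: Workload) -> Hosting:
    """Apply the stated order of preference to one workload."""
    # Data required to remain on-premises goes to the new data center.
    if w.must_stay_on_prem:
        return Hosting.ON_PREM
    # First preference: SaaS, when it is offered and not too expensive.
    if w.saas_offered and w.saas_affordable:
        return Hosting.SAAS
    # Next preference: platform-as-a-service in the cloud.
    return Hosting.CLOUD


if __name__ == "__main__":
    examples = [
        Workload("preservation copies", must_stay_on_prem=True),
        Workload("HR system", saas_offered=True, saas_affordable=True),
        Workload("public presentation collections"),
    ]
    for w in examples:
        print(f"{w.name}: {choose_hosting(w).value}")
```

A workload with a SaaS offering that is too expensive falls through to the cloud option, which matches the "next preference" in the quote above.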
At the end of the month, the library will go live with a limited pilot of Recordation, a new copyright application that resides in the cloud. It is also readying the National Library Service for the Blind and Print Disabled collection of downloadable braille and audio books and magazines for a move to the cloud soon.
The transformation project grew out of a need to revamp or replace the existing data center, which fell short of the library’s standards and requirements for heating, ventilation and air conditioning. On the scale of data center service performance, “we were barely making a Tier 1,” the lowest ranking, Conklin said. “It was going to be more money to fix that problem and stay here on Capitol Hill than go somewhere to a new data center. So, Congress decided to fund us for a new data center and hybrid hosting instead of giving us funding for fixing our tier problem.”
For the new data center, the library contracted with a facility that provides Tier 3 services and built its racks in that facility. In August 2018, the library awarded Accenture a $27.3 million contract to migrate its data center to new hosting environments.
Challenges that have come up in the modernization process include cleaning up technical debt and reducing shadow IT.
“I wish we could go faster and just be done with it, but it is very, very hard [after working] in a data center for so long and with so much storage and so much data. It’s time-consuming and takes a lot of energy and funding and resources,” Conklin said.
The effort will be worth it, Conklin said.
“One of the benefits is we get out of -- I don’t want to say 100% -- the hardware business, but the percentage goes significantly down, and hardware is expensive,” Conklin said. “Another benefit is we get to choose the best environment for the type of data. Another one is we can really scale when we need to. If we have a big public announcement that everyone wants to see … our cloud vendor can help us do that…. This gives us so many more options than just being solely on-prem.”
Stephanie Kanowitz is a freelance writer based in northern Virginia.