Site renovation: How THOMAS transformed into Congress.gov
- By Stephanie Kanowitz
- Jul 23, 2014
After hosting more than 100 million visits in its lifetime, the Library of Congress’s venerable THOMAS online database of U.S. legislative information is nearing retirement. In September, the new, more technically advanced Congress.gov is set to take its place.
“There was nothing fundamentally wrong with THOMAS,” said Jim Karamanis, chief of Web services at LOC. “It was just a 20-year-old application that was way ahead of its time when it was launched.”
The modernization of THOMAS had two goals, said Mike Nibeck, information technology specialist at the library. The first was to update the technical infrastructure it was sitting on so that it could scale more readily.
“We also wanted to modernize the user interface … and how we present complex information on legislative data, so we spent a lot of time early on focusing not only on the technical infrastructure but the information architecture as well,” Nibeck said.
“Our hope was that the new system would not only perform better, scale better and be able to grow with new technology, [but that] it would also be fundamentally easier to use and provide more information to the customers.”
Those requirements come together in Congress.gov’s new interface, where users will notice several immediate differences.
First, the screen and data are mobile-friendly, scaling correctly on a variety of differently sized mobile platforms. Congressional members’ data has been added and search functions enhanced to allow faceting and other advanced options. (Faceted navigation uses metadata to provide users with easier-to-use options for clarifying and refining queries on mobile devices.)
“In the older THOMAS system, all the data was siloed, either by Congress or [according to] whether it was legislative data,” Nibeck said. “Now we can search across all of the data in the system in a single search.”
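The combination of a single search across formerly siloed data and faceted refinement can be illustrated with a minimal Python sketch. It is not the Congress.gov implementation, which the article says relies on Solr; the records, field names and helper functions below are hypothetical and exist only to show how metadata counts give users options for narrowing a query.

```python
from collections import Counter

# Hypothetical records pooled from what used to be separate silos.
# Titles, field names and values are illustrative, not real data.
RECORDS = [
    {"title": "A bill on water infrastructure", "congress": 113, "type": "bill", "chamber": "House"},
    {"title": "A resolution on water rights", "congress": 113, "type": "resolution", "chamber": "Senate"},
    {"title": "A bill on water quality", "congress": 112, "type": "bill", "chamber": "Senate"},
    {"title": "A bill on highway funding", "congress": 113, "type": "bill", "chamber": "House"},
]


def search(records, term, filters=None):
    """Return records whose title contains `term` and that match every filter."""
    filters = filters or {}
    term = term.lower()
    return [
        r for r in records
        if term in r["title"].lower()
        and all(r.get(field) == value for field, value in filters.items())
    ]


def facet_counts(hits, fields):
    """Count how many hits fall under each value of each metadata field."""
    return {field: Counter(r[field] for r in hits) for field in fields}


hits = search(RECORDS, "water")
facets = facet_counts(hits, ["congress", "type", "chamber"])

# A user who picks the "Senate" facet narrows the same query further.
senate_hits = search(RECORDS, "water", {"chamber": "Senate"})
```

Here one query runs over every record regardless of Congress or document type, and the facet counts show the user which refinements are available before they choose one.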
The Congress.gov team started work at the server layer by building a virtualized server infrastructure. PHP and Python, languages supported by the open-source community, were chosen for system development. Solr, also an open-source tool, was picked for the search work.
Using open industry tools saved money and increased the pool of resources, Nibeck said. “I needed to be able to find various skilled developers to do this thing, and it’s much easier to find PHP and Python developers than it is to find proprietary Web content management … resources,” he said.
Layering and redundancy represent another notable enhancement of Congress.gov, project leaders say.
“The legacy system was a simple thread on a really big, behemoth legacy server,” Karamanis said. With Congress.gov, “there’s multiple levels to prevent any sort of failure from being seen by the end users. We have multiple caching levels, multiple Web servers – everything is redundant.”
Overall, the decision to revamp the LOC system was driven less by newly available technology than by its developers’ sensitivity to a more natural technology progression. “We really took a step back and said, ‘What is modern Web development doing? What are the techniques? What are the technologies? What are the processes when you do Web development now?’” Nibeck added.
“We pretty much adopted all of the standard best practices, and in some cases we were maybe a little ahead of the curve on technology.”
Since Congress.gov began running in beta in 2012, THOMAS has continued to do its job. That means that data flowing into the THOMAS database also needs to flow into Congress.gov.
For the most part, incoming data remains the same for both systems. “We worked with all of our data partners, and our goal was to not force them to make any changes,” Nibeck said. “We did our best to accept whatever format they had, and we’ll do the transformation.”
In some cases, that meant that the team wrote specialized data components to pull data out of the THOMAS database and convert it to the Congress.gov format.
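The approach of accepting each partner's existing format and transforming it on the receiving side can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the library's code: the two input formats, the field names and the `normalize` helper are all hypothetical, and the article does not describe the actual Congress.gov data format.

```python
import json
import xml.etree.ElementTree as ET


def from_json(payload):
    """Convert a hypothetical JSON record into the common internal shape."""
    data = json.loads(payload)
    return {"id": data["bill_id"], "title": data["title"].strip()}


def from_xml(payload):
    """Convert a hypothetical XML record into the same internal shape."""
    root = ET.fromstring(payload)
    return {"id": root.get("id"), "title": root.findtext("title", default="").strip()}


# One converter per incoming format; partners keep sending what they already send.
CONVERTERS = {"json": from_json, "xml": from_xml}


def normalize(payload, fmt):
    """Dispatch to the converter for `fmt`, rejecting formats that are not registered."""
    try:
        converter = CONVERTERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return converter(payload)


json_record = normalize('{"bill_id": "hr1", "title": " Example Act "}', "json")
xml_record = normalize('<bill id="s2"><title>Sample Act</title></bill>', "xml")
```

Supporting another source in this sketch means registering one more converter, while the senders change nothing on their end.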
Often they were able to rewrite feeds or use new ones from sources that didn’t exist when THOMAS was written. For example, Congress.gov has a newly written data feed from the Government Printing Office’s Federal Digital System (FDsys), which postdates THOMAS.
Several lessons have emerged from this project, according to Nibeck and Karamanis.
“For a project this large you have to have buy-in from senior management,” Karamanis said. “Secondly, collaboration has to take place across all the different units in the library. So having that buy-in from senior management … and across all the units is really what has enabled us to make this system and roll it out.”
Having a small decision-making team was also crucial, Nibeck said, especially because the IT department was trying a new approach to rolling out the project. Instead of the traditional large-scale development approach, it used agile methodologies, releasing small pieces of functionality quickly.
People were nervous about that tactic at first, he said, but they quickly came around when they saw how fast the team moved through iterations. “That iterative release cycle was critical to success,” Nibeck said. “We were able to deliver a very complicated system in a very short period of time with, for the most part, a very positive response.”