DOE search site is a big hit
Energy puts PubScience through stress test to verify its heavy-duty capacity
By William Jackson
The Energy Department's new PubScience Web site for searching scientific journal citations has had a surprising volume of visitors since its October debut.
Energy's PubScience site draws enough queries to support use of a high-capacity server cluster that can handle 50,000 searches a day at measured performance levels.
Demand is not overwhelming for titles such as 'Semiconductor laser with regrown-lens-train unstable resonator: Theory and practice,' but Energy's Office of Scientific and Technical Information made sure the PubScience site could handle heavy traffic.
'Performance is a big issue for us,' said Vince Dattoria, senior systems engineer at OSTI. 'We didn't know what to expect, to be honest.'
PubScience, which focuses on the physical sciences, was modeled on the National Institutes of Health's PubMed site, which gets up to 500,000 search requests a day for about 500 journals. PubScience started with about 1,000 journals, and Dattoria thought 50,000 searches a day would not be unreasonable at the start.
'A resource like this usually has slow going until word gets around,' he said.
OSTI hired Neal Nelson and Associates of Chicago to stress-test the site's hardware. It turned out that the two servers OSTI started with were not up to the task.
'Neither of the original machines would have the capacity as standalone servers to support the expected number of users,' said Neal Nelson, the company president.
Now serving
OSTI then reconfigured the hardware, putting a Sun Microsystems Ultra HPC 10000 server in front as a gateway and load balancer for two other Sun servers. An Ultra HPC 3000 was upgraded with four CPUs, 5G of RAM and 100G of online disk storage. An Ultra HPC 3500 with two 400-MHz CPUs, 8G of RAM and additional online storage joined the array.
Dattoria said that although he had no solid figures yet, traffic on the site was growing faster than he expected. The page has had visitors from 50 countries, and OSTI estimated that PubScience traffic quickly outstripped the traffic on Energy's main home page.
PubScience not only gives Energy scientists easy access to scientific literature, it also helps fulfill OSTI's mandate to disseminate departmental research. It is hosted by OSTI at pubsci.osti.gov and by the Government Printing Office at its GPO Access site, www.access.gpo.gov.
'We are here to serve the department,' Dattoria said. 'GPO serves the public.'
The search engine was built with Open Text Search SDK from Open Text Corp. of Waterloo, Ontario. The 21 publishers participating in PubScience, or their intermediaries, send citations directly by File Transfer Protocol. 'We have scripts we developed based on the format for each publication,' Dattoria said, and the scripts tag the citations with Hypertext Markup Language links to the full text on the publishers' Web sites.
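A tagging script of the kind Dattoria describes might look roughly like the sketch below. The pipe-delimited record layout, the field names and the example URL are all assumptions made for illustration; the article does not describe OSTI's actual formats or scripts.

```python
import html


def parse_citation(record: str) -> dict:
    """Split one hypothetical pipe-delimited record into named fields."""
    title, journal, url = (field.strip() for field in record.split("|"))
    return {"title": title, "journal": journal, "url": url}


def tag_citation(citation: dict) -> str:
    """Wrap the title in an HTML link to the publisher's full text."""
    # Escape every field so characters such as & or < cannot break the markup.
    href = html.escape(citation["url"], quote=True)
    title = html.escape(citation["title"])
    journal = html.escape(citation["journal"])
    return f'<a href="{href}">{title}</a>, <i>{journal}</i>'


# Made-up record in the assumed format; not a real citation or publisher URL.
record = "Lasers & resonators | Example Journal | http://publisher.example/a?id=1&v=2"
print(tag_citation(parse_citation(record)))
```

In practice each publisher's format would need its own parser, which is presumably why OSTI keeps one script per publication; only the tagging step would be shared.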
Most publishers require a subscription or fee to access full text of articles.
Nelson's Chicago laboratory did the stress-testing remotely with a bank of Pentium computers running scripts simulating 25, 50, 75 and 100 simultaneous queries. The remote terminal emulation tests applied the same workload over and over so that system elements could be reconfigured and bottlenecks pinpointed.
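The idea of replaying one fixed workload at several concurrency levels can be sketched in a few lines. This is not Neal Nelson's tooling: the function names are hypothetical, and the search call is a stand-in that sleeps instead of contacting a server, so the numbers it prints say nothing about PubScience.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_search(query: str) -> str:
    """Stand-in for a real search request; sleeps instead of using the network."""
    time.sleep(0.01)
    return f"results for {query}"


def timed_call(search, query: str) -> float:
    """Run one query and return its elapsed time in seconds."""
    start = time.perf_counter()
    search(query)
    return time.perf_counter() - start


def run_load_level(search, queries, concurrency: int) -> dict:
    """Replay the same queries with a fixed number of simultaneous workers."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        waits = list(pool.map(lambda q: timed_call(search, q), queries))
    return {
        "concurrency": concurrency,
        "count": len(waits),
        "avg_wait": sum(waits) / len(waits),
    }


# The workload is identical at every level, so results stay comparable
# after each change to the system under test.
WORKLOAD = [f"query {i}" for i in range(100)]

for level in (25, 50, 75, 100):
    result = run_load_level(fake_search, WORKLOAD, level)
    print(f"{result['concurrency']:>3} simultaneous: "
          f"{result['count']} queries, avg wait {result['avg_wait']:.3f} s")
```

Keeping the workload constant is the point of the exercise: when the average wait changes between runs, the change can be attributed to the reconfiguration rather than to a different mix of queries.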
The reconfigured cluster can handle up to 8,400 searches an hour, more than double the original rate, and the average wait during 50 simultaneous searches has dropped from 48 seconds to five.
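A quick back-of-the-envelope check puts those figures next to the 50,000 searches a day Dattoria expected. It assumes, unrealistically, that demand is spread evenly around the clock and that the peak hourly rate could be sustained all day, so treat the headroom figure as a rough upper bound rather than a measured result.

```python
PEAK_PER_HOUR = 8_400        # tested capacity, from the article
EXPECTED_PER_DAY = 50_000    # Dattoria's starting estimate, from the article
WAIT_BEFORE_S, WAIT_AFTER_S = 48, 5  # average wait at 50 simultaneous searches

peak_per_second = PEAK_PER_HOUR / 3600
ceiling_per_day = PEAK_PER_HOUR * 24            # only if the peak held all day
avg_demand_per_hour = EXPECTED_PER_DAY / 24     # only if traffic were perfectly even
headroom = PEAK_PER_HOUR / avg_demand_per_hour
wait_improvement = WAIT_BEFORE_S / WAIT_AFTER_S

print(f"Peak rate: {peak_per_second:.2f} searches per second")
print(f"Theoretical daily ceiling: {ceiling_per_day:,} searches")
print(f"Average expected demand: {avg_demand_per_hour:,.0f} searches per hour")
print(f"Headroom over average demand: {headroom:.1f}x")
print(f"Average wait improved by a factor of {wait_improvement:.1f}")
```

Real traffic bunches up during working hours, so the practical margin over the expected load would be smaller than the even-spread figure suggests.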
Nelson said OSTI was wise to test system capacity before going online.
'Many people will put up a site and wait for the problems,' he said. Doing so with PubScience probably would not have been a disaster, Nelson said, 'but I think we accelerated things.'