NIST math project expands the horizons of Web publishing
- By William Jackson
- Oct 20, 2011
Mathematical truths are eternal, but the comprehensive "Handbook of Mathematical Functions," a best-seller of the National Institute of Standards and Technology since its publication in 1964, had become badly out of date by the end of the 20th century.
“Mathematics does not stand still,” said Daniel Lozier, Mathematical Software Group leader in NIST’s Applied and Computational Mathematics Division.
Going beyond PDF for the Web
New properties are still being found for formulas that have been known for centuries, and new formulas and functions continue to be discovered, so NIST set out in 1999 to update the handbook. This became an 11-year effort requiring a team of 50 international experts who vetted more than 10,000 equations and created hundreds of illustrations. It also required the development of new tools for math-oriented authoring and searching on the Web.
“We wanted to take advantage of that piece of the Internet to bring this information to users,” said Lozier, who was general editor for the project.
Computers had changed the scientific landscape since the publication of the original handbook.
“In 1964, people were doing calculations with hand calculators or with pencil and paper,” he said. About half of the original 1,046-page book consisted of tables for doing those calculations. “Nowadays, most people compute functions with computer programs. Nobody was buying the 1964 handbook for the tables by 1995. They were buying it for the formulas.”
Web graphs and visualizations
The new edition of the handbook, published in May 2010 by Cambridge University Press, is about the same size as the original, but with the obsolete tables removed it contains twice as many formulas. Its online counterpart, the "Digital Library of Mathematical Functions," evolved even further.
“We didn’t do much thinking about it at the beginning,” Bonita Saunders, mathematician in the Mathematical Software Group, said of the Web format. “It became a big part of the project.”
“There are over 600 graphs and visualizations on the website,” said Saunders, who managed the production of the images. Many of the images are interactive 3-D visualizations that can be panned, rotated and zoomed in on. “We did state-of-the-art graphics on the Web.”
The Web production was a challenge because the Web itself evolved over the 11 years of the project.
“You make predictions about where things appear to be going,” said Bruce Miller, NIST physicist and information architect for DLMF. “You make your best guesses about software, and you are continually changing course because the Web is continually changing.”
The original plan was to present pages in HTML, the traditional markup language for Web design, with the mathematical equations fixed as images. Over time more advanced markup languages became available that could make the equations dynamic objects. One of these was MathML, a markup language for mathematical expressions for which browser plug-ins and applications were available, but which had not been widely used.
“It seemed like the right thing to do, at the time,” said Miller, who is on the World Wide Web Consortium standards committee for MathML. “It ultimately turned out that it was a pretty good prediction.”
Dynamic math objects
But it was not perfect. Authoring Web content in MathML is not easy, and mathematicians were more comfortable writing in LaTeX, a document preparation system for technical and scientific documents that was used as the input language for the DLMF. A way was needed to translate LaTeX to MathML as well as to HTML with images, and no tool for this existed. So NIST invented the LaTeXML translator. Making the equations dynamic objects allowed the authors to embed and link to a wealth of metadata to make a rich Web interface.
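To illustrate why authors preferred LaTeX, the short Python sketch below builds the MathML markup for a single superscript by hand and compares it with the LaTeX source for the same expression. It is only an illustration of the two notations: it is not LaTeXML and does not reflect how that translator works, and the function name superscript_to_mathml is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C for MathML elements.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def superscript_to_mathml(base: str, exponent: str) -> str:
    """Build presentation MathML for base^exponent (hypothetical helper)."""
    math = ET.Element("math", {"xmlns": MATHML_NS})
    msup = ET.SubElement(math, "msup")           # superscript container
    ET.SubElement(msup, "mi").text = base        # identifier, e.g. x
    ET.SubElement(msup, "mn").text = exponent    # number, e.g. 2
    return ET.tostring(math, encoding="unicode")


latex_source = "x^{2}"                           # what an author types in LaTeX
mathml_output = superscript_to_mathml("x", "2")  # the equivalent MathML markup

print(latex_source)
print(mathml_output)
```

Even for an expression this small, the MathML is several times longer than the LaTeX, which suggests why a translator was preferable to writing MathML directly.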
The next challenge was to enable online searching of the "Digital Library of Mathematical Functions," which was complicated by the fact that the bulk of the publication consists of mathematical formulas rather than text, said Abdou Youssef, professor and chairman of the Department of Computer Science at George Washington University.
“We wanted something that was as effective as Google is for text searches,” Youssef said, but Google was completely inadequate. “The non-linear structure was completely foreign to text search.” Standard search engines do not understand the symbols or the logic used in formulas.
Searching mathematical expressions required breaking new ground, and Youssef headed the development of a math-oriented search engine. The DLMF engine is based on Lucene, the open-source search library maintained by the Apache project, augmented with additional data and processing layers and with relevance tools for ranking results. Symbols are converted to text and put into a linear form, and normalization techniques were developed to handle alternate ways of expressing a formula.
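The general idea of turning symbols into text and normalizing alternate spellings can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the DLMF engine: it does not use Lucene, the symbol-to-word table SYMBOL_WORDS and the function linearize are hypothetical, and the only normalization shown is ignoring optional braces and spacing in LaTeX-style input.

```python
import re

# Hypothetical mapping from math symbols to plain words a text index can store.
SYMBOL_WORDS = {
    "+": "plus", "-": "minus", "*": "times", "/": "over",
    "^": "power", "=": "equals", "(": "lparen", ")": "rparen",
}

# A token is a LaTeX command, a run of letters, a run of digits,
# or any other single non-space character.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|[A-Za-z]+|\d+|\S")


def linearize(formula: str) -> list[str]:
    """Turn a LaTeX-style formula into a flat list of searchable words."""
    words = []
    for tok in TOKEN_RE.findall(formula):
        if tok in ("{", "}"):
            continue                      # grouping braces carry no search meaning here
        if tok.startswith("\\"):
            words.append(tok[1:])         # \sin -> sin
        else:
            words.append(SYMBOL_WORDS.get(tok, tok))
    return words


# Two ways of writing the same identity reduce to the same word sequence.
first = linearize(r"\sin^{2} x + \cos^2 x = 1")
second = linearize(r"\sin^2x+\cos^{2}x=1")
print(" ".join(first))
print(first == second)
```

Once formulas are reduced to word sequences like this, an ordinary text index can store and match them. A real system would need far more normalization than this sketch attempts.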
One of the hardest jobs was ranking search results in terms of relevance to the user.
A Google for formulas
“Google does a very good job of figuring out what the user means and what his priorities are,” Youssef said. “But Google techniques don’t work with math.” One problem was that when searching a self-contained database, the popularity of a site does not help with ranking. But “we had one advantage that Google didn’t have: We had complete control over the contents” of the database being searched and could use the metadata to help determine the relevance.
“We have made quite a bit of headway,” Youssef said. “I can’t say we’ve closed the book; there is a lot more to be done. But overall I think we’re pretty happy with the search capability.”
The DLMF is hosted on an Apache server. Because not all browsers support MathML, it is offered both in HTML with the equations as images and in the richer MathML format.
Because it is digital, the online version of the handbook does not have to wait 46 years to be refreshed. To keep it in step with the printed version, substantial additions to the website will not be made until a new print edition is published. But it is updated as needed to clarify and make corrections. The third update was made in August to clarify an equation and add several bibliographic citations.
The DLMF is likely to prove as popular as its print predecessor, having generated more than 268,000 page views in its first three weeks online, and NIST officials believe it will become the agency’s new best-seller.
“Math has always been an essential part of modern science,” Lozier said. “These functions have proved their worth in scientific applications.”
William Jackson is a Maryland-based freelance writer.