4 Mathematical Search

In traditional books and handbooks, and even in well-structured Web sites, an index and a table of contents are often adequate for finding information. In the DLMF Handbook and Web site, however, the contents contain so many formulas and other mathematical constructs that indexes and tables of contents alone fall far short. Instead, a special search system has to be provided so that users can quickly locate what they are looking for.

Furthermore, although text search and retrieval is a mature technology and many text search systems are available, math search presents new demands and issues that the text search community never had to face. To identify the major math search issues, it helps to consider the reasons why conventional search systems are inadequate for math search. We recognize three major reasons.

The first is that mathematical content often involves non-alphabetical symbols that current search systems do not understand, or at least do not interpret correctly. Terms like Gamma(1/2), P_n(x), x**5, or d^2y/dx^2-x=0 are either meaningless to current systems or improperly read and processed by them.
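As a rough illustration of this first problem, the sketch below runs the terms above through a deliberately simple tokenizer of the kind a text indexer might use. The function name `naive_tokens` and the choice to split on every non-alphanumeric character are assumptions made for illustration; they do not describe any particular search system.

```python
import re

def naive_tokens(text):
    """Hypothetical text-indexer tokenizer: keep runs of letters and digits,
    drop every other character, and lowercase the result."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

for term in ["Gamma(1/2)", "P_n(x)", "x**5", "d^2y/dx^2-x=0"]:
    print(term, "->", naive_tokens(term))

# The operators are discarded, so distinct expressions index identically.
print(naive_tokens("x**5") == naive_tokens("x+5") == naive_tokens("x/5"))
```

Under this tokenizer the power, sum, and quotient of x and 5 all reduce to the same two tokens, so the index cannot tell them apart.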

The second and more challenging reason is that formulas and equations, as well as other mathematical constructs, have rich structures that convey much meaning. Current search engines are not "aware" of those structures, do not capture or index them, and are thus unable to search for information that involves structural relationships and patterns. To a current search system, a query like sin(x + log x) is no different from sin x + log x. Similarly, x (y + z) is misinterpreted as x y + z, if interpreted at all.
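The loss of structure can be made concrete with a small sketch. It compares a bag-of-tokens view of the two pairs of expressions above with their parse trees, using Python's standard `ast` module as a stand-in for a mathematical parser. The expressions are rewritten in Python syntax (for example, log(x) with explicit parentheses), and the helper names `bag_of_tokens` and `structure` are hypothetical.

```python
import ast
import re

def bag_of_tokens(expr):
    """What a structure-unaware index sees: the sorted alphanumeric tokens."""
    return sorted(re.findall(r"[A-Za-z0-9]+", expr))

def structure(expr):
    """A textual dump of the expression's parse tree."""
    return ast.dump(ast.parse(expr, mode="eval").body)

pairs = [
    ("sin(x + log(x))", "sin(x) + log(x)"),
    ("x*(y + z)", "x*y + z"),
]
for a, b in pairs:
    same_tokens = bag_of_tokens(a) == bag_of_tokens(b)
    same_tree = structure(a) == structure(b)
    print(f"{a!r} vs {b!r}: same tokens={same_tokens}, same tree={same_tree}")
```

In both pairs the token sets coincide while the trees differ, which is exactly the information a structure-aware index would need to retain.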

The third and most challenging reason is that the many equivalent ways in which mathematical terms can be expressed, which correspond to synonyms in text search, are often much more complex than textual synonyms, and thus cannot be fully captured in a thesaurus. A sum or a product of two or more terms can be written in many equivalent ways because of the commutativity and associativity laws. Numbers can be represented in multiple forms (e.g., 1/2 vs. 0.5 vs. 2^{-1}). Polynomials can be expressed in many factored and unfactored forms. Trigonometric terms can easily be replaced by other, equivalent trigonometric terms. Indeed, it can be argued that a large part of Mathematics is about the different and equivalent ways of expressing a concept or a quantity. Current search systems are not equipped to recognize those equivalences and take them into account when searching; in fact, the problem is not solvable in general.
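A minimal sketch of how the simplest of these equivalences might be handled is given below. It is not a method described in this paper: it assumes expressions written in Python syntax, normalizes numbers with the standard `fractions` module, and flattens and sorts the operands of sums and products. The names `canonical` and `canon` are hypothetical.

```python
import ast
from fractions import Fraction

def canonical(node):
    """Return a comparable canonical form: flatten nested + and *
    (associativity) and sort their operands (commutativity)."""
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mult)):
        op = type(node.op).__name__
        operands = []
        for child in (node.left, node.right):
            c = canonical(child)
            if isinstance(c, tuple) and c[0] == op:
                operands.extend(c[1])   # merge a nested use of the same operator
            else:
                operands.append(c)
        # repr() gives a consistent ordering across names, numbers, and tuples.
        return (op, tuple(sorted(operands, key=repr)))
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return Fraction(str(node.value))  # exact value for ints and decimals
    raise ValueError("construct not handled in this sketch")

def canon(expr):
    return canonical(ast.parse(expr, mode="eval").body)

# Commutativity and associativity are absorbed by the canonical form.
print(canon("a + b + c") == canon("c + (b + a)"))
print(canon("x*y + 0.5") == canon("0.5 + y*x"))

# Different spellings of one number compare equal once made exact.
print(Fraction("1/2") == Fraction("0.5") == Fraction(2) ** -1)

# Equivalent polynomials in factored and expanded form are NOT recognized.
print(canon("(x + 1)*(x + 1)") == canon("x*x + 2*x + 1"))
```

The last comparison fails by design: sorting operands captures only the commutative and associative rewrites, and says nothing about factoring or trigonometric identities, in keeping with the observation that the general problem is not solvable.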

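Therefore, the major math search issues can be summarized as follows: recognizing mathematical symbols and notation, capturing and indexing the structure of mathematical constructs, and accounting for the equivalent forms in which an expression can be written.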

The next subsection will discuss some of the approaches that are being taken to address those issues.



Technical Aspects of the Digital Library of Mathematical Functions
Bruce R. Miller - Abdou Youssef
Translated by Bruce R Miller on 2002-12-17