4.1 Approaches to Math Search

Broadly speaking, two general approaches suggest themselves. The first approach is an evolutionary one, which augments existing text search systems with math-appropriate searching capabilities. The second approach is to create a math search system wholly from scratch to fully capture and index mathematical content.

The augmentation approach leverages the power and maturity of text search engines, and involves less work. Although it has inherent limitations as to how much mathematical structure it can capture, it has taken us a long way toward effective equation-based searching. It is the approach currently adopted by the DLMF, and will be discussed in more detail in the next subsection.

The second approach is much more demanding in time and effort. It will require not only the careful use and integration of various computer algebra and symbolic processing techniques, but also the development of novel indexing and search techniques for which research has barely begun. Certainly, parsing techniques for mathematical expressions have been developed for compilers and computer algebra systems [2,13]. Those techniques can and should be adopted by this approach as well as by the augmentation approach.

The indexing of mathematical structures is a largely new subject. Some work has been done on searching for integrals [4]. However, to the authors' knowledge, no work has been published on indexing full mathematical structures in such a way that users can search for structures of parts thereof. For example, a user can search for a whole matrix or a submatrix, a fraction or a denominator, a function with its argument or just the argument, a summation or just the summand, and so on. This ability to search with queries that specify the patterns to be found in named parts of a structure is possible with the mathematical indexing approach but cannot be fully achieved through the augmentation approach.

As for incorporating mathematical synonyms and equivalences, some aspects of automated theorem proving [14] and symbolic processing in computer algebra can be drawn upon. Specifically, a thesaurus-like dictionary of abstract equivalence rules will have to be used by the search system to detect concrete instances of equivalences between the query form of an expression and the contents' form(s) of the same expression. Applying abstract inference (equivalence) rules in this context is tantamount to matching (i.e., instantiating) them to concrete instances, which is a task that automatic theorem provers generally carry out. Sophisticated equivalence detection, however, is not a mature technology, and is probably too slow for online searching. Nevertheless, for certain kinds of equivalences, equivalence detection is feasible and will be addressed in our future work.

Technical Aspects of the Digital Library of Mathematical Functions 1
Bruce R. Miller - Abdou Youssef
Translated by Bruce R Miller on 2002-12-17
Comments? DLMF_feedback@nist.gov
Digital Library of Mathematical Functions