For the augmentation strategy to work, we need to perform the following fundamental tasks:
^
as known
mathematical operators. The textualizing process merely involves substituting
non-alphanumerical characters with pre-defined alphanumerical counterparts.
The main task of flattening is to identify and scope the parts of a formula such as numerators, denominators, exponents and arguments, and to surround them with pairs of matching markup tags. Table 1 below illustrates the concept.
The main approach, which performs both textualization and flattening, is
as
follows. First, each formula is parsed into a tree where each node is labeled with the
corresponding entity in the formula (e.g., ``+'', ``2'', ``x'',
and ``^''). Second, the parse tree
is traversed and converted into a textualized sentence with
all the proper matching pairs
surrounding formula parts. The granularity of the parts to be scoped depends on
how
much specificity we want to allow the users to have when formulating queries.
| ORIGINAL | TEXTUALIZED & FLATTENED FORM |
| 1-12-2 | |
x^{t-2} = 1 |
x BeginExponent t minus 2 EndExponent Equal 1 |
f(x) or f@(x) |
f ApplyAt BeginArgument x EndArgument |
(x+1)/(x-1) |
fraction (x plus 1)(x minus 1) |
| -alternatively- | |
| fraction BeginNumerator x plus 1 EndNumerator | |
| BeginDenominator x minus 1 EndDenominator |
Flattening deserves special attention, not only because it is inherent to text
search,
but also because it creates a new problem stemming from the multiplicity of
linear
forms that an expression can be mapped to. For example,
can be flattened in
two ways, depending on whether we put the subscript before or after the superscript.
If the order in the contents is different from the order in the query (after
flattening), then an undesirable mismatch results. Hence the need for
the next task.
The normal form is a data model that should be generic enough to model all expressions/formulas. We define a new data model called the sorted parse tree normal form. It is a parse tree where the children of any nodes are sorted from left to right whenever changing the order of the children does not alter the mathematical meaning, such as when the node is an associative and commutative operator, and the children are its operands. The ordering is based on any numerical or alphabetical associated key. For example, we can sort the children based on the number of descendent nodes that each child has in the parse tree. If two nodes have the same number of descendants, then we sort them alphanumerically based on the entity labels of the nodes. The normalization algorithm is an algorithm that sorts the parse tree.
| OPERATOR | MEANING |
| 1-12-2 | |
+, -, /, * |
the standard arithmetic operators |
^, ** |
Superscript or power |
_ |
subscript |
@ |
apply at as in f(x) or f@(x) |
=>, <=>, <== |
imply, equivalent, if |
!=, ~=, not= |
not equal |
| =- | equivalence ( |
=~ |
congruence ( |
| =. | dot equal ( |
-~ |
similar-to or equal ( |
~ |
similar-to ( |
~~ |
approximate ( |
| determinant, magnitude |
| QUERY | MATCHED ELEMENT |
| 1-12-2 | |
x^2 |
|
(x OR t)^3 |
|
$^3 |
any single-character cubed |
sqrt(x^2+1) |
|
| Gamma(1/3) | |
^(x+2) |
(x+2) in an exponent part |
/(x+2) |
(x+2) as a denominator |
(... x+2 ...)/ |
(x+2) embedded in a numerator |
We built an augmented search system for the DLMF. Its principal modules are described next.
An example of a generic math entry is that of the term derivative,
which includes among its synonyms the term differential or diff
for short. An example of a DLMF-specific entry is:
Those modules have been designed and implemented, and a working math search system is in place. The system has math search capabilities that far exceed those of text search engines. Still, the system is work in progress. Additional refinements are being made, and new capabilities are being added.
Among those new capabilities are (1) the proper handling of customary names and symbols that are widely used in the area of special functions, and (2) the support of basic equivalences, including commutativity and associativity, among others.
In the future, we will explore searching based on new and emerging encodings such as MathML and OpenMath. The precise direction that will be taken will be determined according to which of the new encoding standards will be adopted and become widely used and technology-supported.
Also, an all-math search engine, created from scratch for that purpose, may be developed, if and when the necessary research in indexing of mathematical structures yields the necessary understanding and techniques. Experience gained from the mathematical and scientific user community regarding math search, especially in reference to query languages, modes of search, and what kinds of math information users usually need to search for, will be an extremely valuable guide into building future math search systems.
|
![]() |