Ch.5 Mathematics

§ 5.1. Math Details

LaTeXML processes mathematical material by proceeding through several stages:

  • Basic processing of macros, primitives and constructors, resulting in an XML document; the math is primarily represented by a sequence of tokens (XMTok), structured items (XMApp, XMDual) and hints (XMHint, which are ignored by the parser).

  • Document tree rewriting, where rules are applied to modify the document tree. User-supplied rules can be used here to clarify the intent of markup used in the document.

  • Math parsing: a grammar-based parser is applied, depth first, to each level of the math; that is, to the top level of each math expression as well as to each subexpression within structured items (these will have been contained in an XMArg or XMWrap element). This results in an expression tree that will hopefully be an accurate representation of the expression's structure, but may be ambiguous in specifics (e.g. what the meaning of a superscript is). The parsing is driven almost entirely by the grammatical role assigned to each item.

  • Not yet implemented: a subsequent stage must be developed to resolve the semantic ambiguities by analyzing and augmenting the expression tree.

  • Target conversion: from the internal XM* representation to MathML or OpenMath.
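The depth-first parsing stage can be pictured with a small Python sketch. It is not LaTeXML's implementation (which is written in Perl); it only illustrates one reading of "depth first", in which every XMArg or XMWrap level is handed to a parse routine before the level that contains it. The XML fragment is a hand-written, hypothetical approximation of \frac{a+b}{c}=x with the LaTeXML namespace omitted, and `parse_levels`, `record` and `SUBLEVELS` are illustrative names only.

```python
import xml.etree.ElementTree as ET

# Hand-written approximation of \frac{a+b}{c}=x (namespace omitted, hypothetical).
XMATH = """
<XMath>
  <XMApp>
    <XMTok name="divide"/>
    <XMArg><XMTok role="UNKNOWN">a</XMTok><XMTok role="ADDOP">+</XMTok><XMTok role="UNKNOWN">b</XMTok></XMArg>
    <XMArg><XMTok role="UNKNOWN">c</XMTok></XMArg>
  </XMApp>
  <XMTok role="RELOP">=</XMTok>
  <XMTok role="UNKNOWN">x</XMTok>
</XMath>
"""

# Elements whose contents form their own parsing level (besides XMath itself).
SUBLEVELS = {"XMArg", "XMWrap"}

def parse_levels(node, parse):
    """Post-order walk: nested levels are passed to `parse` before their container."""
    for child in node:
        parse_levels(child, parse)
    if node.tag == "XMath" or node.tag in SUBLEVELS:
        parse(node)

order = []

def record(level):
    """Stand-in for a real parser: note each direct child's role (or tag if it has none)."""
    order.append((level.tag, [child.get("role", child.tag) for child in level]))

parse_levels(ET.fromstring(XMATH.strip()), record)
for tag, items in order:
    print(tag, items)
```

Both XMArg levels are visited before the top-level XMath, whose item sequence then contains the already-structured XMApp as a single item.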

The Math element is a top-level container for any math-mode material. It serves as the container for various representations of the math, including images (through the attributes mathimage, width and height), textual forms (through the attributes tex, content-tex and text), MathML, and the internal representation itself. The mode attribute specifies whether the math should be in display or inline mode.
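As a rough sketch of the Math element acting as a container for several representations, the hypothetical Python helper below (built on the standard xml.etree.ElementTree module) creates a Math element carrying the mode, tex and text attributes, optionally the image attributes, plus an empty XMath child for the internal representation. The helper name `make_math`, the attribute values and the image file name are illustrative assumptions; the namespace is omitted.

```python
import xml.etree.ElementTree as ET

MODES = ("inline", "display")

def make_math(tex, text, mode="inline", image=None):
    """Build a Math container with textual forms and an empty XMath child.

    `image`, if given, is a hypothetical (file, width, height) triple.
    """
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, not {mode!r}")
    math = ET.Element("Math", {"mode": mode, "tex": tex, "text": text})
    if image is not None:
        src, width, height = image
        math.set("mathimage", src)
        math.set("width", str(width))
        math.set("height", str(height))
    ET.SubElement(math, "XMath")  # container for the internal XM* representation
    return math

m = make_math("x^{2}", "x^2", mode="display")
print(ET.tostring(m, encoding="unicode"))
```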

§ 5.1.1. Internal Math Representation

The XMath element is the container for the internal representation.

The following attributes can appear on all XM* elements:

role

the grammatical role that this element plays

open, close

parentheses or delimiters that were used to wrap the expression represented by this element.

argopen, argclose, separators

delimiters on a function or operator (the first element of an XMApp) that were used to delimit the arguments of the function. The separators attribute is a string of the punctuation characters used to separate the arguments.

xml:id

a unique identifier to allow reference (XMRef) to this element.
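To illustrate why argopen, argclose and separators are recorded, the sketch below builds an XMApp for f(x,y) and regenerates its linear form using only those attributes, plus open and close on the whole expression. It assumes each separator is a single character and that the attributes sit on the operator token, as described above; `delimited_app` and `render` are hypothetical helpers, not part of LaTeXML.

```python
import xml.etree.ElementTree as ET

def delimited_app(op_text, op_role, args, argopen, argclose, separators):
    """Build an XMApp whose operator token records how its arguments were delimited.

    `separators` lists, in order, the single punctuation character found
    between each pair of adjacent arguments (an assumption of this sketch).
    """
    if len(separators) != max(len(args) - 1, 0) or any(len(s) != 1 for s in separators):
        raise ValueError("need one single-character separator between adjacent arguments")
    app = ET.Element("XMApp")
    op = ET.SubElement(app, "XMTok", {"role": op_role, "argopen": argopen,
                                      "argclose": argclose,
                                      "separators": "".join(separators)})
    op.text = op_text
    for arg in args:
        ET.SubElement(app, "XMTok", {"role": "ID"}).text = arg
    return app

def render(app):
    """Regenerate linear text from the recorded delimiter attributes."""
    op, *args = list(app)
    seps = op.get("separators", "")
    body = op.text + op.get("argopen", "")
    for i, arg in enumerate(args):
        if i:
            body += seps[i - 1]
        body += arg.text
    body += op.get("argclose", "")
    # open/close on the XMApp itself record delimiters around the whole expression
    return app.get("open", "") + body + app.get("close", "")

f = delimited_app("f", "FUNCTION", ["x", "y"], "(", ")", [","])
print(render(f))  # f(x,y)
```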

¶ Math Tags

The following tags are used for the intermediate math representation:

XMTok

represents a math token. It may contain text for presentation. Additional attributes are:

name

the name that represents the "meaning" of the token; this overrides the content for identifying the token.

omcd

the OpenMath content dictionary that the name belongs to.

font

the font to be used for presenting the content.

style

?

size

?

stackscripts

whether scripts should be stacked above/below the item, instead of the usual script position.

XMApp

represents the generalized application of some function or operator to arguments. The first child element is the operator; the remaining elements are the arguments. Additional attributes:

name

the name that represents the meaning of the construct as a whole.

stackscripts

?

XMDual

combines representations of the content (the first child) and presentation (the second child), useful when the two structures are not easily related.

XMHint

represents spacing or other apparently purely presentational material.

name

names the effect that the hint was intended to achieve.

style

?

XMWrap

serves to assert the expected type or role of a subexpression that may otherwise be difficult to interpret; the parser is more forgiving about these.

name

?

style

?

XMArg

serves to wrap individual arguments or subexpressions created by structured markup, such as \frac. These subexpressions can be parsed individually.

rule

the grammar rule that this subexpression should match.

XMRef

refers to another subexpression. This is used, for example, to avoid duplicating arguments when constructing an XMDual to represent a function application: the arguments are placed in the content branch (wrapped in an XMArg), while XMRef elements are placed in the presentation branch.

idref

the identifier of the referenced math subexpression.
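The XMDual/XMRef arrangement described above can be sketched as follows: the argument x appears once, inside an XMArg in the content branch, and the presentation branch points at it through an XMRef. The fragment is hand-written for illustration (namespace omitted, the id value `arg1` made up, XMWrap chosen arbitrarily for the presentation side), and `index_ids` and `presentation_text` are hypothetical helpers. Following idref recovers the presentation without duplicating the argument.

```python
import xml.etree.ElementTree as ET

# xml:id is parsed into the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DUAL = """
<XMDual>
  <XMApp>
    <XMTok name="f" role="FUNCTION"/>
    <XMArg xml:id="arg1"><XMTok role="ID">x</XMTok></XMArg>
  </XMApp>
  <XMWrap>
    <XMTok role="FUNCTION">f</XMTok>
    <XMTok role="OPEN">(</XMTok>
    <XMRef idref="arg1"/>
    <XMTok role="CLOSE">)</XMTok>
  </XMWrap>
</XMDual>
"""

def index_ids(root):
    """Map every xml:id in the tree to its element."""
    return {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID) is not None}

def presentation_text(node, ids):
    """Flatten a branch to text, following XMRef to the shared argument."""
    if node.tag == "XMRef":
        return presentation_text(ids[node.get("idref")], ids)
    return (node.text or "").strip() + "".join(presentation_text(c, ids) for c in node)

dual = ET.fromstring(DUAL.strip())
content, pres = list(dual)  # first child: content; second child: presentation
ids = index_ids(dual)
print(presentation_text(pres, ids))  # f(x)
```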

§ 5.1.2. Grammatical Roles

The role attempts to capture the syntactic nature of each item. This is used primarily to drive the parsing; the grammar rules are keyed on the role, rather than content, of the nodes. The role is also used to drive the conversion to presentation markup, especially Presentation MathML, and in fact some values of role are only used that way, never appearing explicitly in the grammar.
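Because the grammar is keyed on role rather than content, a parser can decide structure without ever inspecting the symbols themselves. The toy precedence-climbing parser below illustrates this with three of the roles described below (RELOP, ADDOP, MULOP), ordered loosely as the text describes them. It is a deliberately simplified stand-in, not LaTeXML's grammar, which has many special-case productions and no rigid precedence order; for instance, this toy simply nests chained relations. `BINDING` and `parse` are hypothetical names.

```python
# Hypothetical binding strengths; any role not listed is treated as an operand.
BINDING = {"RELOP": 1, "ADDOP": 2, "MULOP": 3}

def parse(tokens):
    """Precedence climbing over (role, text) pairs.

    Only the role decides the structure; the text is merely carried along.
    """
    pos = 0

    def expr(min_bp):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        role, text = tokens[pos]
        if role in BINDING:
            raise SyntaxError(f"expected operand, got {role} {text!r}")
        left = text
        pos += 1
        while pos < len(tokens):
            role, op = tokens[pos]
            bp = BINDING.get(role)
            if bp is None or bp < min_bp:
                break
            pos += 1
            left = (op, left, expr(bp + 1))  # left-associative
        return left

    tree = expr(1)
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree

toks = [("ID", "a"), ("ADDOP", "+"), ("ID", "b"), ("MULOP", "*"),
        ("ID", "c"), ("RELOP", "="), ("ID", "d")]
print(parse(toks))  # ('=', ('+', 'a', ('*', 'b', 'c')), 'd')
```

Relabelling the "*" token as ADDOP, without touching its text, changes the tree to ('*', ('+', 'a', 'b'), 'c'), which is the sense in which structure follows role rather than content.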

The following grammatical roles are recognized by the math parser. These values can be specified in the role attribute during the initial document construction or by rewrite rules. Although the precedence of operators is loosely described in the following, since the grammar contains various special case productions, no rigidly ordered precedence is given.
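A user-supplied rewrite that sets role attributes can be approximated as a tree walk. The hypothetical `assign_roles` function below gives a role to XMTok elements that have none (or only the default UNKNOWN), keyed on their text. It is only a Python sketch of the idea, not LaTeXML's rewrite-rule mechanism or syntax; declaring f as a FUNCTION here is the kind of clarification that avoids the warning issued when an UNKNOWN token seems to be used as a function.

```python
import xml.etree.ElementTree as ET

def assign_roles(xmath, role_by_text):
    """Set a role on tokens lacking one (or still UNKNOWN) whose text is in the map.

    Returns the number of tokens changed.
    """
    changed = 0
    for tok in xmath.iter("XMTok"):
        text = (tok.text or "").strip()
        if tok.get("role", "UNKNOWN") == "UNKNOWN" and text in role_by_text:
            tok.set("role", role_by_text[text])
            changed += 1
    return changed

xmath = ET.fromstring('<XMath><XMTok>f</XMTok><XMTok role="OPEN">(</XMTok>'
                      '<XMTok>x</XMTok><XMTok role="CLOSE">)</XMTok></XMath>')
n = assign_roles(xmath, {"f": "FUNCTION", "x": "ID"})
print(n, [t.get("role") for t in xmath])  # 2 ['FUNCTION', 'OPEN', 'ID', 'CLOSE']
```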

ATOM

a general atomic subexpression.

ID

a variable-like token, whether scalar or otherwise.

PUNCT

punctuation.

APPLYOP

an explicit infix application operator (high precedence).

RELOP

a relational operator, loosely binding.

ARROW

an arrow operator (with little semantic significance), treated equivalently to RELOP.

METARELOP

an operator used for relations between relations, with lower precedence.

ADDOP

an addition operator, precedence between relational and multiplicative operators.

MULOP

a multiplicative operator, high precedence.

SUPOP

An operator appearing in a superscript, such as a collection of primes.

OPEN

an open delimiter.

CLOSE

a close delimiter.

MIDDLE

a middle operator used to group items between an OPEN, CLOSE pair.

OPERATOR

a general operator; higher precedence than function application. For example, for an operator A and a function F, AFx would be interpreted as A(F)(x).

SUMOP

a summation/union operator.

INTOP

an integral operator.

LIMITOP

a limiting operator.

DIFFOP

a differential operator.

BIGOP

a general operator, but with lower precedence, such as a P preceding an integral to denote the principal value. Note that SUMOP, INTOP, LIMITOP, DIFFOP and BIGOP are treated equivalently by the grammar, but are distinguished to facilitate (eventually!) analyzing the argument structure (e.g. bound variables and differentials within an integral). Note: are SUMOP and LIMITOP significantly different in this sense?

VERTBAR

FUNCTION

a function which (may) apply to following arguments with higher precedence than addition and multiplication, or parenthesized arguments.

NUMBER

a number.

POSTSUPERSCRIPT

the usual superscript, where the script is treated as an argument, but the base will be determined by parsing. Note that this is not necessarily assumed to be a power. Very high precedence.

POSTSUBSCRIPT

Similar to POSTSUPERSCRIPT for subscripts.

FLOATINGSUPERSCRIPT

A special case for a superscript on an empty base, i.e. {}^{x}. It is often used to place a pre-superscript, or for non-math uses (e.g. 10${}^{th}$).

FLOATINGSUBSCRIPT

Similar to FLOATINGSUPERSCRIPT, for subscripts.

POSTFIX

for a postfix operator.

UNKNOWN

an unknown expression. This is the default for token elements, and generates a warning if the unknown seems to be used as a function.
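The equivalences noted above (ARROW treated like RELOP; SUMOP, INTOP, LIMITOP, DIFFOP and BIGOP treated alike) amount to collapsing several roles into one grammar class, while each element keeps its original, more specific role for later analysis. The sketch below encodes exactly those equivalences over the roles listed above; choosing BIGOP as the label for the shared class, and the names `EQUIVALENT`, `GRAMMAR_ROLES` and `grammar_class`, are assumptions made for illustration.

```python
# Roles the grammar treats identically, mapped to one (arbitrarily named) class.
EQUIVALENT = {
    "ARROW": "RELOP",
    **{r: "BIGOP" for r in ("SUMOP", "INTOP", "LIMITOP", "DIFFOP", "BIGOP")},
}

# The grammatical roles listed above; presentation-only roles are excluded.
GRAMMAR_ROLES = {
    "ATOM", "ID", "PUNCT", "APPLYOP", "RELOP", "ARROW", "METARELOP", "ADDOP",
    "MULOP", "SUPOP", "OPEN", "CLOSE", "MIDDLE", "OPERATOR", "SUMOP", "INTOP",
    "LIMITOP", "DIFFOP", "BIGOP", "VERTBAR", "FUNCTION", "NUMBER",
    "POSTSUPERSCRIPT", "POSTSUBSCRIPT", "FLOATINGSUPERSCRIPT",
    "FLOATINGSUBSCRIPT", "POSTFIX", "UNKNOWN",
}

def grammar_class(role):
    """Return the class the grammar would key on for this role."""
    if role not in GRAMMAR_ROLES:
        raise ValueError(f"{role} is not a grammatical role")
    return EQUIVALENT.get(role, role)

print([grammar_class(r) for r in ("INTOP", "SUMOP", "ARROW", "MULOP")])
# ['BIGOP', 'BIGOP', 'RELOP', 'MULOP']
```

A presentation-only role such as STACKED is rejected, since it never appears in the grammar.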

The following roles are not used in the grammar, but are used to capture the presentation style:

STACKED

corresponds to stacked structures, such as \atop, and the presentation of binomial coefficients.