LaTeXML The Manual

Chapter 3. Architecture

Like TeX, LaTeXML is data-driven: the text and executable control sequences (i.e., macros and primitives) in the source file, and in any packages it loads, direct the processing. The user controls and customizes the conversion in three ways: by providing alternative, LaTeXML-specific implementations of control sequences and packages, by declaring properties of the desired document structure, and by defining rewrite rules to be applied to the constructed document tree.

Figure 3.1. Flow of data through LaTeXML's digestive tract.

The top-level class, LaTeXML, manages the processing. It provides several methods for converting a TeX document or string into an XML document, with varying degrees of postprocessing, and optionally writes the document to a file. A LaTeXML::State object maintains the current state of processing, including the current definitions of control sequences, and emulates TeX's scoping rules. The processing is divided into the following stages:

Digestion

the TeX-like digestion phase, which converts the input into boxes.

Construction

converts the resulting boxes into an XML DOM.

Rewriting

applies rewrite rules to modify the DOM.

Math Parsing

parses the tokenized mathematics.

See Figure 3.1 for an illustration. The first three stages are discussed in the following sections; the parsing of mathematics is covered in detail in Chapter 5.
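As a rough illustration of how the four stages chain together, the following toy model in Python passes a string through digestion, construction, rewriting, and math parsing in that order. It is only a sketch of the data flow and is not LaTeXML's implementation (which is written in Perl); every name in it (Box, Node, digest, construct, rewrite, parse_math, convert) is hypothetical, and each stage is reduced to a few lines.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Box:
    """Toy stand-in for a digested box: plain text or a math fragment."""
    kind: str      # "text" or "math"
    content: str


@dataclass
class Node:
    """Toy stand-in for an XML DOM element."""
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


def digest(source, macros):
    """Digestion: expand macros (naively), then split the input into boxes."""
    for name, body in macros.items():
        source = source.replace(name, body)
    boxes = []
    # re.split with a capturing group alternates outside/inside pieces,
    # so odd-numbered pieces are the $...$ math fragments.
    for i, piece in enumerate(re.split(r"\$(.*?)\$", source)):
        if piece:
            boxes.append(Box("math" if i % 2 else "text", piece))
    return boxes


def construct(boxes):
    """Construction: turn the boxes into a (flat) document tree."""
    doc = Node("document")
    for box in boxes:
        tag = "Math" if box.kind == "math" else "text"
        doc.children.append(Node(tag, box.content))
    return doc


def rewrite(doc, rules):
    """Rewriting: apply (predicate, action) rules to the constructed tree."""
    for node in doc.children:
        for matches, action in rules:
            if matches(node):
                action(node)
    return doc


def parse_math(doc):
    """Math parsing: give each math node a trivially tokenized structure."""
    for node in doc.children:
        if node.tag == "Math":
            tokens = re.findall(r"[A-Za-z]+|\d+|\S", node.text)
            node.children = [Node("token", t) for t in tokens]
    return doc


def convert(source, macros, rules):
    """Run the four stages in the order listed above."""
    return parse_math(rewrite(construct(digest(source, macros)), rules))


# Example: one user-supplied macro and one rewrite rule that trims text nodes.
strip_text = (lambda n: n.tag == "text",
              lambda n: setattr(n, "text", n.text.strip()))
doc = convert(r"Say \hi: $a+b$ done", {r"\hi": "Hello"}, [strip_text])
print([(child.tag, child.text) for child in doc.children])
```

The macro table and the rule list play the role of the user's customizations: changing either alters the result without touching the stages themselves.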

During processing, the LaTeXML object binds the variables $STATE, $GULLET, $STOMACH, and $MODEL to the corresponding active objects.
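The two ideas attached to the state object, TeX-like scoping of definitions and binding a variable only while processing is under way, can be sketched in Python as follows. This is a toy model under stated assumptions, not LaTeXML's code: the State class here, its methods, the module-level STATE variable, and the processing helper are all hypothetical stand-ins, and the manual does not say how the binding is actually performed (in Perl, dynamic scoping with `local` would be a natural fit).

```python
from contextlib import contextmanager


class State:
    """Toy table of definitions with TeX-like grouping (hypothetical)."""

    def __init__(self):
        self.frames = [{}]            # frames[0] is the global level

    def begin_group(self):
        """Open a group, as '{' or \\begingroup would."""
        self.frames.append({})

    def end_group(self):
        """Close the innermost group, discarding its local assignments."""
        if len(self.frames) == 1:
            raise RuntimeError("no open group to close")
        self.frames.pop()

    def assign(self, name, value, globally=False):
        if globally:
            # Like \global: remove shadowing local values so the new
            # value is visible now and survives every enclosing group.
            for frame in self.frames:
                frame.pop(name, None)
            self.frames[0][name] = value
        else:
            self.frames[-1][name] = value

    def lookup(self, name):
        """Return the innermost visible value, or None if undefined."""
        for frame in reversed(self.frames):
            if name in frame:
                return frame[name]
        return None


STATE = None  # hypothetical stand-in for the $STATE variable


def current_state():
    """Return whatever state object is bound at the moment."""
    return STATE


@contextmanager
def processing(state):
    """Bind STATE for the duration of a conversion, then restore it."""
    global STATE
    saved, STATE = STATE, state
    try:
        yield state
    finally:
        STATE = saved


with processing(State()) as state:
    state.assign(r"\x", "outer")
    state.begin_group()
    state.assign(r"\x", "inner")           # local to the group
    state.assign(r"\y", "kept", globally=True)
    state.end_group()
    print(state.lookup(r"\x"), state.lookup(r"\y"))
```

Keeping one dictionary per open group makes closing a group a single pop, which is the property that matters for emulating TeX: local assignments disappear when the group ends, while global ones persist.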

Contents