2 Using LaTeXML

2.1 Basic XML Conversion

The command

latexml {options} --destination=doc.xml doc

converts the TeX document doc.tex (or standard input, if - is used in place of the filename) to XML. It loads any required definition bindings (see below), then reads, tokenizes, expands and digests the document, creating an XML structure. It then performs some document rewriting, parses the mathematical content and writes the result, in this case to doc.xml; if no --destination is supplied, it writes the result to standard output. See Chapter 3 for details on the processing, and Chapter 5 for more information about math parsing.
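As a rough sketch of how the command above might be assembled from a script, the following builds the argument list without running anything. The helper name latexml_command is hypothetical; only the --destination option and the use of - for standard input, both described above, are assumed.

```python
def latexml_command(source, destination=None):
    """Build the argument list for a basic latexml conversion.

    source      -- document name, or "-" to read standard input
    destination -- output file; None means latexml writes to standard output
    """
    argv = ["latexml"]
    if destination is not None:
        argv.append(f"--destination={destination}")
    argv.append(source)
    return argv


# Mirrors the command shown above (with no extra options).
print(" ".join(latexml_command("doc", destination="doc.xml")))
```

The returned list can be handed to subprocess.run on a system where latexml is installed.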

BibTeX processing

If the source file has an explicit extension of .bib, or if the --bibtex option is used, the source will be treated as a BibTeX database. See 2.2 for how BibTeX files are included in the final output.

Note that the timing is different than with BibTeX and LaTeX. Normally, BibTeX simply selects and formats a subset of the bibliographic entries according to the .aux file; all TeX expansion and processing is carried out only when the result is included in the main LaTeX document. In contrast, latexml processes and expands the entire bibliography, including any TeX markup within it, when it is converted to XML; the selection of entries is done during postprocessing. One implication is that latexml does not know about packages included in the main document; if the bibliography uses macros defined in such packages, the packages must be explicitly specified using the --preload option.
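The two rules above (when --bibtex is needed, and when packages must be preloaded) can be sketched as a small command builder. The function name bibliography_command and the example names refs.bib and mymacros.sty are hypothetical; the exact form of the value given to --preload for a package is an assumption, modelled on the binding file names described under Loading Bindings.

```python
def bibliography_command(bib_source, destination, preloads=()):
    """Build a latexml command line for converting a BibTeX database.

    --bibtex is added only when the source lacks an explicit .bib
    extension; each entry in `preloads` becomes one --preload option,
    passed through unchanged.
    """
    argv = ["latexml"]
    if not bib_source.endswith(".bib"):
        argv.append("--bibtex")
    for binding in preloads:
        argv.append(f"--preload={binding}")
    argv.append(f"--destination={destination}")
    argv.append(bib_source)
    return argv


# Hypothetical bibliography whose entries use macros from a package
# that the main document would normally have loaded.
print(" ".join(bibliography_command("refs.bib", "refs.xml", ["mymacros.sty"])))
```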

Useful Options

The number and detail of progress and debugging messages printed during processing can be controlled using

--verbose or --quiet

Either option can be repeated to get even more or even less detail.

Directories to search (in addition to the working directory) for various files can be specified using

--path={directory}

This option can be repeated.

Whenever multiple sources are being used (including multiple bibliographies), the option

--documentid=id

should be used to provide a unique ID for the document root element. This ID is used as the base for the IDs of the child elements within the document, so that they are unique as well.

See the documentation for the command latexml for less common options.
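To illustrate how the repeatable options combine, here is a minimal sketch that turns a few settings into option strings. The helper option_list and its signed verbosity parameter are conventions invented for this example and are not part of latexml; only the options described above are emitted.

```python
def option_list(verbosity=0, paths=(), document_id=None):
    """Return latexml options for the given settings.

    verbosity   -- positive: repeat --verbose that many times;
                   negative: repeat --quiet that many times
    paths       -- extra search directories, one --path option each
    document_id -- unique ID for the document root element, if any
    """
    options = []
    flag = "--verbose" if verbosity > 0 else "--quiet"
    options.extend([flag] * abs(verbosity))
    options.extend(f"--path={directory}" for directory in paths)
    if document_id is not None:
        options.append(f"--documentid={document_id}")
    return options


# Two levels of extra detail, two search directories, and a root ID.
print(option_list(verbosity=2, paths=["styles", "bib"], document_id="main"))
```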

Loading Bindings

Although LaTeXML is reasonably adept at processing TeX macros, it generally benefits from having its own implementation of the macros, primitives, environments and other control sequences appearing in a document, because these are what define the mapping into XML. We call the LaTeXML analogue of a style or class file a LaTeXML-binding file, or binding for short; these files have an additional extension .ltxml.

In fact, since style files often bypass structurally or semantically meaningful macros by directly invoking macros internal to LaTeX, LaTeXML actually avoids processing style files when a binding is unavailable. The option

--includestyles

can be used to override this behaviour and allow LaTeXML to (attempt to) process raw style files. [A more selective, per-file, option may be developed in the future, if there is sufficient demand — please provide use cases.]

LaTeXML always starts with the TeX.pool binding loaded and, if LaTeX-specific commands are recognized, LaTeX.pool as well. Any input directive within the source loads the appropriate binding. For example, \documentclass{article} or \usepackage{graphicx} will load the bindings article.cls.ltxml or graphicx.sty.ltxml, respectively; the obsolete directive \documentstyle is also recognized. An \input directive will search for files with both .tex and .sty extensions; it will prefer a binding file if one is found, but will load and digest a .tex file if no binding is found. An \include directive (and related ones) searches only for a .tex file, which is processed and digested as usual.
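The naming convention for class and package bindings can be sketched as follows. This is only an illustration of the file names involved: it scans the source with a regular expression (ignoring comments and conditionals), which is not how LaTeXML itself processes directives, and the function name expected_bindings is hypothetical. The handling of \documentstyle, \input and \include is deliberately left out.

```python
import re

# Extension inserted before ".ltxml" for each directive.
_SUFFIX = {"documentclass": "cls", "usepackage": "sty"}

# Matches \documentclass[opts]{name} and \usepackage[opts]{a,b,...}.
_DIRECTIVE = re.compile(r"\\(documentclass|usepackage)(?:\[[^\]]*\])?\{([^}]*)\}")


def expected_bindings(tex_source):
    """List the binding file names implied by the directives in tex_source."""
    bindings = []
    for directive, names in _DIRECTIVE.findall(tex_source):
        for name in names.split(","):
            name = name.strip()
            if name:
                bindings.append(f"{name}.{_SUFFIX[directive]}.ltxml")
    return bindings


sample = r"""
\documentclass{article}
\usepackage{graphicx}
"""
print(expected_bindings(sample))
```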

There are two mechanisms for customization: a document-specific binding file, doc.latexml, will be loaded if present; and the option

--preload=binding

will load the binding file binding.ltxml. The --preload option can be repeated; both kinds of preloaded binding are loaded before document processing begins, and are processed in order.
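A small sketch of which customization files these two mechanisms would pick up for a given source. It assumes that the document-specific file sits next to the source document, which the text above does not state; the function name customization_bindings and the binding name mymacros are hypothetical. The two kinds are reported separately because their relative order is not specified here.

```python
import tempfile
from pathlib import Path


def customization_bindings(source, preloads=()):
    """Report the customization bindings expected for `source`.

    Assumes the document-specific file doc.latexml lives beside the
    source; each --preload=binding corresponds to binding.ltxml.
    """
    doc_specific = Path(source).with_suffix(".latexml")
    return {
        "document_specific": doc_specific.name if doc_specific.is_file() else None,
        "preloaded": [f"{name}.ltxml" for name in preloads],
    }


# Demonstration in a scratch directory holding a doc.latexml file.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "doc.tex"
    source.with_suffix(".latexml").write_text("# document-specific declarations\n")
    print(customization_bindings(source, preloads=["mymacros"]))
```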

See Chapter 4 for details about what can go in these bindings, and Appendix B for a list of bindings currently included in the distribution.