2 Using LaTeXML

§2.2 Basic Postprocessing

In the simplest situation, you have a single TeX source document from which you want to generate a single output document. The command

latexmlpost options --destination=doc.html doc

or similarly with --destination=doc.html4 or --destination=doc.xhtml, will carry out a set of appropriate transformations in sequence:

  • scanning of labels and ids;

  • filling in the index and bibliography (if needed);

  • cross-referencing;

  • conversion of math;

  • conversion of graphics and picture environments to web format (png);

  • applying an XSLT stylesheet.

The output format determines the defaults for each step, in particular which XSLT stylesheet is used. The format is inferred from the file extension of --destination, or set explicitly by the option

--format=(html|html5|html4|xhtml|xml)

which overrides the extension used in the destination. The recognized formats are:

html or html5

math is converted to Presentation MathML, some ‘vector’ style graphics are converted to SVG, other graphics are converted to images; LaTeXML-html5.xslt is used. The file extension html generates html5.

html4

both math and graphics are converted to png images; LaTeXML-html4.xslt is used.

xhtml

math is converted to Presentation MathML, graphics are converted to images; LaTeXML-xhtml.xslt is used.

xml

no math, graphics or XSLT conversion is carried out.

Of course, all of these conversions can be controlled or overridden by explicit options described below. For more details about less common options, see the command documentation latexmlpost, as well as Appendix H.
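As a minimal sketch of the selection rule described above, the following shell function mirrors how the output format follows from the destination's extension unless --format is given. The function name guess_format is hypothetical and not part of LaTeXML. It assumes that an extension equal to a format name selects that format, and it reports anything else as unknown.

```shell
# Hypothetical helper (not part of LaTeXML): mirrors the format selection
# rule described in this section.
# Usage: guess_format DESTINATION [FORMAT]
guess_format() {
  dest=$1
  format=${2:-}
  if [ -z "$format" ]; then
    # No --format given: fall back to the destination's file extension.
    case "$dest" in
      *.*) format=${dest##*.} ;;
    esac
  fi
  case "$format" in
    html|html5) echo html5 ;;    # "html" is treated as html5
    html4)      echo html4 ;;
    xhtml)      echo xhtml ;;
    xml)        echo xml ;;      # assumption: .xml selects the xml format
    *)          echo unknown ;;  # not covered by this sketch
  esac
}

guess_format doc.html          # prints: html5
guess_format doc.html html4    # prints: html4
```

The second call shows the override: the destination still ends in .html, but the explicit format wins.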

Scanning

The scanning step collects information about all labels, ids, indexing commands, cross-references and so on, to be used in the following postprocessing stages.

Indexing

An index is built from \index markup, if makeidx’s \printindex command has been used, but this can be disabled by

--noindex

The index entries can be permuted with the option

--permutedindex

Thus \index{term a!term b} also shows up as \index{term b!term a}. This leads to a more complete, but possibly rather silly, index, depending on how the terms have been written.

Bibliography

When a document contains a request for bibliographies, typically due to the \bibliography{..} command, the postprocessor will look for the named bibliographies. It first looks for preconverted bibliographies with the extension .bib.xml; otherwise it will look for .bib and convert it internally (the latter is a somewhat experimental feature).

If you want to override that search, for example using a bibliography with a different name, you can supply that filename using the option

--bibliography=bibfile.bib.xml

Note that the internal bibliography list will then be ignored. The bibliography would have typically been produced by running

latexml --dest=bibfile.bib.xml bibfile.bib

Note that the XML file, bibfile.bib.xml, is not used directly to produce an HTML-formatted bibliography; rather, it is used to fill in the \bibliography{..} within a TeX document.
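The two steps above can be sketched as a small script: convert the BibTeX file once, then pass the result to the postprocessor. The names refs and doc are placeholders, and the run wrapper is a hypothetical convenience that only prints each command unless DRY_RUN=0 is set, so the sketch can be inspected without LaTeXML installed.

```shell
# Placeholder names; substitute your own bibliography and document.
BIB=refs
DOC=doc

# Hypothetical wrapper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

build_with_bibliography() {
  # Step 1: preconvert the BibTeX file to LaTeXML's XML form.
  run latexml --dest="$BIB.bib.xml" "$BIB.bib"
  # Step 2: postprocess, overriding the bibliography search.
  run latexmlpost --bibliography="$BIB.bib.xml" --destination="$DOC.html" "$DOC"
}

build_with_bibliography
```

Running the script with DRY_RUN=0 in the environment would execute both commands instead of printing them.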

Cross-Referencing

In this stage, the scanned information is used to fill in the text and links of cross-references within the document. The option

--urlstyle=(server|negotiated|file)

can control the format of urls within the document.

server

formats urls appropriate for use from a web server. In particular, a trailing index.html is omitted. (default)

negotiated

formats urls appropriate for use by a server that implements content negotiation. File extensions for html and xhtml are omitted. This enables you to set up a server that serves the appropriate format depending on the browser being used.

file

formats urls explicitly, with full filename and extension. This allows the files to be browsed from the local filesystem.
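To make the three styles concrete, the sketch below approximates their effect on a relative link. It illustrates the descriptions above and is not LaTeXML's implementation: the function name url_for and the sample paths are hypothetical, and whether the negotiated style also drops a trailing index.html is left open here.

```shell
# Hypothetical illustration of --urlstyle; not LaTeXML's own code.
# Usage: url_for STYLE PATH
url_for() {
  style=$1
  path=$2
  case "$style" in
    server)
      # Omit a trailing index.html.
      case "$path" in
        index.html|*/index.html) path=${path%index.html} ;;
      esac
      ;;
    negotiated)
      # Omit the html or xhtml file extension.
      case "$path" in
        *.html)  path=${path%.html} ;;
        *.xhtml) path=${path%.xhtml} ;;
      esac
      ;;
    file)
      # Keep the full filename and extension.
      ;;
  esac
  echo "$path"
}

url_for server ch1/index.html       # prints: ch1/
url_for negotiated ch1/sec2.xhtml   # prints: ch1/sec2
url_for file ch1/index.html         # prints: ch1/index.html
```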

Math Conversion

Specific conversions of the mathematics can be requested using the options

--mathimages                   # converts math to png images
--presentationmathml or --pmml # creates Presentation MathML
--contentmathml or --cmml      # creates Content MathML
--openmath or --om             # creates OpenMath
--keepXMath                    # preserves LaTeXML's XMath

(Each of these options can also be negated if needed, eg. --nomathimages.) Note that the Content MathML and OpenMath conversions are currently rather experimental.

If more than one of these conversions is requested, parallel math markup will be generated, with the first format being the primary one and the additional ones added as secondary formats. The secondary format is incorporated using whatever means the primary format uses; eg. MathML combines formats using m:semantics and m:annotation-xml.
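As a small sketch of requesting parallel markup, the command below asks for Presentation MathML and Content MathML together. It assumes that "first" refers to the order of the options on the command line; the variable names and the destination doc.xhtml are placeholders.

```shell
# Presentation MathML is requested first, so it would be the primary format;
# Content MathML is added as a secondary format.
MATH_OPTS="--pmml --cmml"
CMD="latexmlpost $MATH_OPTS --destination=doc.xhtml doc"

# Print the command for inspection; run it with: sh -c "$CMD"
echo "$CMD"
```

Since the primary format here is MathML, the secondary format would be carried inside m:semantics using m:annotation-xml.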

Given the state of current browsers, when generating MathML it may be useful to conditionally include the MathJax library for rendering MathML in browsers that don’t support it natively. The following option will load MathJax into such browsers:

--javascript=LaTeXML-maybeMathJax.js

Graphics processing

Conversion of graphics (eg. from the graphic(s|x) packages’ \includegraphics) can be enabled or disabled using

--graphicsimages or --nographicsimages

Similarly, the conversion of picture environments can be controlled with

--pictureimages or --nopictureimages

An experimental capability for converting the latter to SVG can be controlled by

--svg or --nosvg

Stylesheets and Javascript

If you wish to provide your own XSLT or CSS stylesheets, or javascript programs, the options

--stylesheet=stylesheet.xsl
--css=stylesheet.css
--nodefaultcss
--javascript=program.js

can be used. The --css and --javascript options provide CSS stylesheets and javascript programs respectively; they can be repeated to include multiple files. In both cases, if a local file is referenced, it will be copied to the destination directory, but otherwise urls are accepted.

The core CSS stylesheet, LaTeXML.css, helps match the basic styling of LaTeX to HTML; certain bindings, such as amsart, automatically include additional stylesheets to better match the desired style. You can also request the inclusion of your own stylesheets from the commandline using the --css option. Some sample CSS enhancements are included with the distribution:

LaTeXML-navbar-left.css

Places a navigation bar on the left.

LaTeXML-navbar-right.css

Places a navigation bar on the right.

LaTeXML-blue.css

Colors various features in a soft blue.

In cases where you wish to completely manage the CSS, the option --nodefaultcss causes only explicitly requested CSS files to be included.
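The CSS options can be combined as in the following sketch, which contrasts adding sample stylesheets to the defaults with managing the CSS entirely yourself. The file mystyle.css and the variable names are placeholders; the commands are only printed so they can be checked before use.

```shell
# Default CSS plus two of the sample stylesheets from the distribution;
# --css may be repeated.
WITH_SAMPLES="latexmlpost --css=LaTeXML-navbar-left.css --css=LaTeXML-blue.css --destination=doc.html doc"

# Only the explicitly requested stylesheet; mystyle.css is a placeholder.
CUSTOM_ONLY="latexmlpost --nodefaultcss --css=mystyle.css --destination=doc.html doc"

# Print both commands for inspection before running them.
printf '%s\n' "$WITH_SAMPLES" "$CUSTOM_ONLY"
```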

Javascript files are included in the generated HTML by using the --javascript option. The distribution includes a sample LaTeXML-maybeMathjax.js which is useful for supporting MathML: it invokes MathJax (http://mathjax.org) to render the mathematics in browsers without native support for MathML. Alternatively, you can invoke MathJax unconditionally, from the ‘cloud’ by using:

latexmlpost --format=html5 \
   --javascript='http://cdn.mathjax.org/mathjax/latest/MathJax.js' \
   --destination=somewhere/doc.html doc

See Section 4.2.2 for more information on developing your own stylesheets. To develop CSS and XSLT stylesheets, a knowledge of the LaTeXML document type is also necessary; see Appendix I.