In the process of developing the Digital Library of Mathematical Functions, we needed a means of transforming the LaTeX sources of our material into XML, which would then be used for further manipulation, rearrangement and construction of the web site. In particular, a true ‘Digital Library’ should focus on the semantics of the material, and so we should convert the mathematical material into both content and presentation MathML. At the time, we found no software suited to our needs, so we began developing LaTeXML in-house.
In brief, latexml is a program, written in Perl, that attempts to faithfully mimic TeX’s behaviour but produces XML instead of DVI. The document model of the target XML makes explicit the model implied by LaTeX. Both the processing and the model are extensible: you can define the mapping between TeX constructs and the XML fragments to be created. A postprocessor, latexmlpost, converts this XML into other formats such as HTML or XHTML, with options to convert the math into MathML (currently only presentation MathML) or into images.
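To make the idea of a user-definable mapping from TeX constructs to XML fragments concrete, here is a deliberately tiny Python sketch. It is not LaTeXML’s code or its Perl binding API: the `MAPPING` table, the element names and the `convert` function are hypothetical, and only one-argument control sequences such as `\emph{...}` are handled. It does show the general shape of the approach: each recognized control sequence is looked up in a table, and its argument is converted recursively into the content of the corresponding XML element.

```python
import xml.etree.ElementTree as ET

# Hypothetical table mapping one-argument TeX control sequences to an
# XML tag plus attributes. Real LaTeXML bindings are far richer than this.
MAPPING = {
    "emph": ("emph", {}),
    "textbf": ("text", {"font": "bold"}),
}


def _append_text(parent: ET.Element, ch: str) -> None:
    """Add a character to parent's text, or to the tail of its last child."""
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + ch
    else:
        parent.text = (parent.text or "") + ch


def _parse(src: str, pos: int, parent: ET.Element) -> int:
    """Convert src[pos:] into children of parent.

    Stops at an unmatched '}' or at end of input; returns that position.
    """
    while pos < len(src):
        ch = src[pos]
        if ch == "}":
            return pos
        if ch == "\\":
            # Read the control-sequence name (letters only, as in TeX).
            end = pos + 1
            while end < len(src) and src[end].isalpha():
                end += 1
            name = src[pos + 1:end]
            if name not in MAPPING or end >= len(src) or src[end] != "{":
                raise ValueError(f"unsupported construct \\{name}")
            tag, attrs = MAPPING[name]
            child = ET.SubElement(parent, tag, dict(attrs))
            # Recursively convert the braced argument into the new element.
            pos = _parse(src, end + 1, child)
            if pos >= len(src):
                raise ValueError("missing closing brace")
            pos += 1  # skip the closing '}'
        else:
            _append_text(parent, ch)
            pos += 1
    return pos


def convert(src: str) -> str:
    """Convert a small TeX fragment into an XML string wrapped in <para>."""
    root = ET.Element("para")
    pos = _parse(src, 0, root)
    if pos != len(src):
        raise ValueError("unmatched closing brace")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(convert(r"Some \emph{new} and \textbf{bold \emph{nested}} text"))
```

Running it prints `<para>Some <emph>new</emph> and <text font="bold">bold <emph>nested</emph></text> text</para>`. Extending the conversion means adding entries to the table rather than changing the parser, which loosely mirrors the extensibility described above.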
Caveats: it isn’t finished, and there are gaps in coverage, particularly missing implementations of many useful LaTeX packages. However, it is beginning to stabilize, and interested parties are invited to try it out, give feedback and even help out.