LaTeXML The Manual

Chapter 3 Architecture

As noted earlier, LaTeXML consists of two main programs: latexml, which converts the TeX source into XML, and latexmlpost, which converts that XML into target formats. See Figure 3.1 for an illustration.

The casual user needs only a superficial understanding of the architecture. The programmer who wants to extend or customize LaTeXML will, however, need a fairly good understanding of the process and of the distinctions between text, Tokens, Boxes, Whatsits and XML, on the one hand, and Macros, Primitives and Constructors, on the other. In a way, the implementer of a LaTeXML binding for a LaTeX package may need a better understanding than they would when implementing for LaTeX, since they have to understand not only the TeX view (primarily just the macros and the intended appearance) but also the LaTeXML view, with its XML and representation questions.


Figure 3.1: Flow of data through LaTeXML’s digestive tract.

The intention is that all semantics of the original document are preserved by latexml, or even inferred by parsing; latexmlpost is for formatting and conversion. Depending on your needs, the LaTeXML document resulting from latexml may be sufficient. Alternatively, you may want to enhance the document by applying third-party programs before postprocessing.
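
The two-stage flow described above can be sketched as a small driver script. This is only an illustrative sketch, not part of LaTeXML: the function names build_pipeline and run_pipeline, the enhance hook, and the file name paper.tex are hypothetical, and it assumes that latexml and latexmlpost are on the PATH and that latexmlpost chooses the target format from the destination file's extension.

```python
import subprocess
from pathlib import Path


def build_pipeline(tex_path, out_format="html"):
    """Return the two command lines for the latexml -> latexmlpost stages.

    Hypothetical helper: only the program names and the --destination
    option come from LaTeXML; everything else is illustrative.
    """
    tex = Path(tex_path)
    xml = tex.with_suffix(".xml")                # intermediate LaTeXML document
    target = tex.with_suffix("." + out_format)   # final formatted output
    convert = ["latexml", f"--destination={xml}", str(tex)]
    postprocess = ["latexmlpost", f"--destination={target}", str(xml)]
    return convert, postprocess


def run_pipeline(tex_path, out_format="html", enhance=None):
    """Run both stages, optionally enhancing the XML in between.

    `enhance` is a hypothetical callable taking the path of the
    intermediate XML file; it stands in for any third-party program
    applied before postprocessing.
    """
    convert, postprocess = build_pipeline(tex_path, out_format)
    subprocess.run(convert, check=True)          # TeX source -> XML
    if enhance is not None:
        enhance(Path(tex_path).with_suffix(".xml"))
    subprocess.run(postprocess, check=True)      # XML -> target format


if __name__ == "__main__":
    # Print the commands rather than running them, so the sketch works
    # even where LaTeXML is not installed.
    for command in build_pipeline("paper.tex"):
        print(" ".join(command))
```

If the intermediate XML is all you need, only the first command has to be run; the enhance hook marks the point where a third-party tool would operate on that XML before latexmlpost sees it.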