LaTeXML The Manual

Chapter 1 Introduction

(November 17, 2020)


Some of the more detailed portions of this manual have not kept up to date with the evolution of LaTeXML’s code and style; rather than delay the release, we’ll improve the documentation in a later update.

For many, LaTeX is the preferred format for document authoring, particularly for documents involving significant mathematical content and where quality typesetting is desired. On the other hand, content-oriented XML is an extremely useful representation for documents, allowing them to be used, and reused, for a variety of purposes, not least presentation on the Web. Yet the style and intent of LaTeX markup, as compared to XML markup, not to mention its programmability, make converting documents from the former format to the latter difficult. Perhaps ironically, these difficulties can be particularly large for mathematical material, where markup tends to focus on appearance rather than meaning.

The choices of LaTeX for authoring and XML for delivery were natural and uncontroversial for the Digital Library of Mathematical Functions (DLMF). Faced with the need to perform this conversion and the lack of suitable tools to perform it, the DLMF project proceeded to develop its own tool, LaTeXML, for this purpose.

Design Goals

The idealistic goals of LaTeXML are:

  • Faithful emulation of TeX’s behaviour;

  • Easily extensible;

  • Lossless, preserving both semantic and presentation cues;

  • Use an abstract LaTeX-like, extensible, document type;

  • Infer the semantics of mathematical content
    (good Presentation MathML, eventually Content MathML and OpenMath).

As these goals are not entirely practical, and even somewhat contradictory, each is implicitly qualified by “as much as possible.” Completely mimicking the behaviour of TeX and LaTeX would seem to require sneaky modifications to TeX itself; redefining LaTeX’s internals does not really guarantee compatibility. “Ease of use” is, of course, in the eye of the beholder; this manual is an attempt to make it easier! More significantly, few documents are likely to have completely unambiguous mathematics markup; human understanding of both the topic and the surrounding text is needed to properly interpret any particular fragment. Thus, while we’ll try to provide a “turn-key” solution that does the ‘Right Thing’ automatically, we expect that applications requiring high semantic content will need document-specific declarations and tuning to achieve the desired result. Towards this end, we provide a variety of means to customize the processing and declare the author’s intent. At the same time, especially for new documents, we encourage a more logical, content-oriented markup style over a purely presentation-oriented one.

Overview of this Manual

Chapter 2 describes the usage of LaTeXML, along with common use cases and techniques. Chapter 3 describes the system architecture in some detail. Strategies for customization and for implementing new packages are described in Chapter 4. The special considerations for mathematics, including details of representation and how to improve the conversion, are covered in Chapter 5. Several specialized topics are covered in the remaining chapters. An overview of outstanding issues and planned future improvements is given in Chapter 9.

Finally, the Appendices give detailed documentation of the system components: Appendix A describes the command-line programs provided by the system; Appendix B lists the LaTeX style packages for which we’ve provided LaTeXML-specific bindings; Appendix C describes, in groups, the Perl modules that make up the system; Appendix D describes the XML schema used by LaTeXML; Appendix E gives an overview of the warning and error messages that LaTeXML may generate; and Appendix F describes the strategy and naming conventions used for CSS styling of the resulting HTML.

Using LaTeXML, and programming for it, can be somewhat confusing, since one is dealing with several languages not normally combined, often within the same file: Perl, TeX and XML (along with XSLT, HTML and CSS), plus the occasional shell programming. To help visually distinguish these contexts, this manual puts ‘programming’ oriented material (Perl, TeX) in a typewriter font, like this; XML material is put in a sans-serif face like this.

If you encounter difficulties, there is a support mailing list at latexml-project. Bugs and enhancement requests can be reported at Github. If all else fails, please consult the source code, or the author.

Danger! When you see this sign, be warned that the material presented is somewhat advanced and may not make much sense until you have dabbled quite a bit in LaTeXML’s internals. Such advanced or ‘dangerous’ material will be presented like this paragraph to make it easier to skip over.