2 Using LaTeXML

§ 2.2 Basic Postprocessing

In the simplest situation, you have a single TeX source document from which you want to generate a single output document. The command

latexmlpost options --destination=doc.html doc

or similarly with --destination=doc.html4 or --destination=doc.xhtml, will carry out a set of appropriate transformations in sequence:

  • scanning of labels and ids;

  • filling in the index and bibliography (if needed);

  • cross-referencing;

  • conversion of math;

  • conversion of graphics and picture environments to web format (png);

  • applying an XSLT stylesheet.
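When the conversion is driven from a script rather than typed at a shell, it can help to build the command as an argument list. The following Python sketch assumes only that latexmlpost is installed and on the PATH. The helper names latexmlpost_command and run_latexmlpost are hypothetical, and options stands for any of the options described in this section.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional, Sequence


def latexmlpost_command(source: str, destination: str,
                        options: Optional[Sequence[str]] = None) -> List[str]:
    """Build the argument list for a single-document latexmlpost run."""
    # Options first, then the destination, then the source document,
    # mirroring: latexmlpost options --destination=doc.html doc
    return ["latexmlpost", *(options or []), f"--destination={destination}", source]


def run_latexmlpost(source: str, destination: str,
                    options: Optional[Sequence[str]] = None) -> None:
    """Run latexmlpost, failing early if it is not installed."""
    if shutil.which("latexmlpost") is None:
        raise FileNotFoundError("latexmlpost was not found on the PATH")
    # check=True raises CalledProcessError if latexmlpost exits with an error
    subprocess.run(latexmlpost_command(source, destination, options), check=True)


if __name__ == "__main__":
    # Print the equivalent shell command instead of running it.
    print(shlex.join(latexmlpost_command("doc", "doc.html")))
```

Calling run_latexmlpost("doc", "doc.html") should then carry out the whole sequence of steps listed above, with the defaults chosen by the output format.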

The output format affects the defaults for each step, particularly the XSLT stylesheet that is used. The format is determined by the file extension of --destination, or by the option

--format=(html|html5|html4|xhtml|xml)

which overrides the extension used in the destination. The recognized formats are:

html or html5

math is converted to Presentation MathML, some ‘vector’ style graphics are converted to SVG, other graphics are converted to images; LaTeXML-html5.xslt is used. The file extension html generates html5.

html4

both math and graphics are converted to png images; LaTeXML-html4.xslt is used.

xhtml

math is converted to Presentation MathML, other graphics are converted to images; LaTeXML-xhtml.xslt is used.

xml

no math, graphics or XSLT conversion is carried out.

Of course, all of these conversions can be controlled or overridden by explicit options described below. For more details about less common options, see the command documentation latexmlpost, as well as Appendix H.
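The rule for choosing the format can be pictured with a small Python sketch: an explicit --format wins, otherwise the destination's extension decides, and the format then selects a default stylesheet. The tables below are transcribed from the list above; treating a .xml extension as the xml format is an assumption, and resolve_format is a hypothetical name, not part of LaTeXML.

```python
from pathlib import PurePath
from typing import Optional

# Format names accepted by --format; "html" is treated as an alias for "html5".
FORMAT_NAMES = {"html": "html5", "html5": "html5", "html4": "html4",
                "xhtml": "xhtml", "xml": "xml"}

# Destination extension -> format (.xml is an assumption, not from the text).
EXTENSION_FORMATS = {".html": "html5", ".html4": "html4",
                     ".xhtml": "xhtml", ".xml": "xml"}

# Default XSLT stylesheet per format; the xml format applies none.
STYLESHEETS = {"html5": "LaTeXML-html5.xslt", "html4": "LaTeXML-html4.xslt",
               "xhtml": "LaTeXML-xhtml.xslt", "xml": None}


def resolve_format(destination: str, format_option: Optional[str] = None) -> str:
    """Pick the output format: an explicit --format overrides the extension."""
    if format_option is not None:
        return FORMAT_NAMES[format_option]
    suffix = PurePath(destination).suffix.lower()
    if suffix not in EXTENSION_FORMATS:
        raise ValueError(f"cannot infer a format from {destination!r}; use --format")
    return EXTENSION_FORMATS[suffix]


fmt = resolve_format("doc.html")
print(fmt, STYLESHEETS[fmt])  # html5 LaTeXML-html5.xslt
```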

Scanning

The scanning step collects information about all labels, ids, indexing commands, cross-references and so on, to be used in the following postprocessing stages.
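To make the scanning step concrete, here is a simplified Python sketch that walks an XML document, records every xml:id, and builds a table from labels to ids. It assumes, loosely following the LaTeXML document type (see Appendix I), that labelled elements carry an xml:id and a space-separated labels attribute; the real scanner records considerably more (titles, index entries, bibliography requests), and the function name scan_ids is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def scan_ids(root: ET.Element) -> Tuple[Dict[str, ET.Element], Dict[str, str]]:
    """Collect id -> element and label -> id tables in one pass."""
    ids: Dict[str, ET.Element] = {}
    labels: Dict[str, str] = {}
    for elem in root.iter():
        xml_id = elem.get(XML_ID)
        if xml_id is None:
            continue  # in this sketch, only elements with an id are link targets
        ids[xml_id] = elem
        for label in elem.get("labels", "").split():
            labels[label] = xml_id
    return ids, labels


doc = ET.fromstring(
    '<document xmlns="http://dlmf.nist.gov/LaTeXML">'
    '<section xml:id="S1" labels="LABEL:sec:intro"><title>Introduction</title></section>'
    '</document>'
)
ids, labels = scan_ids(doc)
print(labels)
```

Tables like these are what the later cross-referencing stage needs in order to turn a reference to a label into a link to an id.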

Indexing

An index is built from \index markup, if makeidx’s \printindex command has been used, but this can be disabled by

--noindex

The index entries can be permuted with the option

--permutedindex

Thus \index{term a!term b} also shows up as \index{term b!term a}. This leads to a more complete, but possibly rather silly, index, depending on how the terms have been written.
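A minimal Python sketch of the idea, assuming every ordering of the levels of an entry is generated; how LaTeXML itself treats entries with three or more levels, or the @ and | modifiers of \index, may differ. The function permuted_entries is hypothetical.

```python
from itertools import permutations
from typing import List


def permuted_entries(entry: str) -> List[str]:
    """Return every ordering of the '!'-separated levels of an index entry."""
    levels = entry.split("!")
    # permutations() yields the original order first, so it stays the main entry
    return ["!".join(order) for order in permutations(levels)]


print(permuted_entries("term a!term b"))  # ['term a!term b', 'term b!term a']
print(len(permuted_entries("a!b!c")))     # 6
```

Already at three levels this assumption yields six entries, which shows how quickly a permuted index can become rather silly.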

Bibliography

When a document contains a request for bibliographies, typically due to the \bibliography{..} command, the postprocessor will look for the named bibliographies. It first looks for preconverted bibliographies with the extension .bib.xml; otherwise it looks for .bib and converts it internally (the latter is a somewhat experimental feature).

If you want to override that search, for example to use a bibliography with a different name, you can supply that filename using the option

--bibliography=bibfile.bib.xml

Note that the internal bibliography list will then be ignored. The bibliography would typically have been produced by running

latexml --dest=bibfile.bib.xml bibfile.bib

Note that the XML file, bibfile.bib.xml, is not used directly to produce an HTML-formatted bibliography; rather, it is used to fill in the \bibliography{..} within a TeX document.
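The search order described above can be sketched in Python as follows. The function find_bibliography and its search_dirs argument are hypothetical; which directories latexmlpost actually searches is not described here, so the directory list is an assumption supplied by the caller.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_bibliography(name: str, search_dirs: Iterable[str],
                      override: Optional[str] = None) -> Optional[Path]:
    """Locate the bibliography file for a name requested by the document."""
    if override is not None:
        return Path(override)  # --bibliography replaces the search entirely
    dirs = [Path(d) for d in search_dirs]
    # Preconverted .bib.xml files are preferred over raw .bib files,
    # which would need to be converted internally.
    for suffix in (".bib.xml", ".bib"):
        for directory in dirs:
            candidate = directory / f"{name}{suffix}"
            if candidate.is_file():
                return candidate
    return None
```

For example, find_bibliography("refs", ["."]) would return ./refs.bib.xml if it exists, and fall back to ./refs.bib otherwise.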

Cross-Referencing

In this stage, the scanned information is used to fill in the text and links of cross-references within the document. The option

--urlstyle=(server|negotiated|file)

can control the format of urls within the document.

server

formats urls appropriate for use from a web server. In particular, a trailing index.html is omitted. This is the default.

negotiated

formats urls appropriate for use by a server that implements content negotiation. File extensions for html and xhtml are omitted. This enables you to set up a server that serves the appropriate format depending on the browser being used.

file

formats urls explicitly, with full filename and extension. This allows the files to be browsed from the local filesystem.
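The differences between the three styles can be illustrated with a hypothetical Python function that rewrites a relative link to an output page. It follows only what is stated above; in particular, whether the negotiated style also trims a trailing index is not stated, so this sketch only removes the extension.

```python
def format_url(path: str, style: str) -> str:
    """Rewrite a relative link to an output page according to a --urlstyle value."""
    if style == "file":
        return path  # full filename and extension, for browsing from disk
    if style == "server":
        # a trailing index.html is left for the web server to supply
        if path == "index.html":
            return "./"
        if path.endswith("/index.html"):
            return path[: -len("index.html")]
        return path
    if style == "negotiated":
        # html and xhtml extensions are dropped; the server picks the format
        for ext in (".html", ".xhtml"):
            if path.endswith(ext):
                return path[: -len(ext)]
        return path
    raise ValueError(f"unknown urlstyle: {style!r}")


for style in ("server", "negotiated", "file"):
    print(style, format_url("chap1/index.html", style), format_url("chap1/sec2.xhtml", style))
```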

Math Conversion

Specific conversions of the mathematics can be requested using the options

--mathimages                   # converts math to png images
--presentationmathml or --pmml # creates Presentation MathML
--contentmathml or --cmml      # creates Content MathML
--openmath or --om             # creates OpenMath
--keepXMath                    # preserves LaTeXML’s XMath

(Each of these options can also be negated if needed, e.g. --nomathimages.) Note that the Content MathML and OpenMath conversions are currently rather experimental.

If more than one of these conversions is requested, parallel math markup will be generated, with the first format being the primary one and the additional ones added as secondary formats. The secondary formats are incorporated using whatever means the primary format uses; e.g. MathML combines formats using m:semantics and m:annotation-xml.
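For example, requesting --pmml followed by --cmml should yield Presentation MathML as the primary markup with Content MathML as an annotation. The Python sketch below builds that shape with xml.etree.ElementTree for a single identifier x. It only illustrates the m:semantics and m:annotation-xml nesting; the encoding value MathML-Content is the one defined by the MathML specification, and the helper names m and parallel_math are hypothetical.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("m", MATHML)  # serialize with the m: prefix used in the text


def m(tag: str, *children: ET.Element, text=None, **attrs: str) -> ET.Element:
    """Create a MathML element with optional text, children and attributes."""
    elem = ET.Element(f"{{{MATHML}}}{tag}", attrs)
    elem.text = text
    elem.extend(children)
    return elem


def parallel_math(primary, secondaries):
    """Wrap the primary format and (encoding, tree) secondaries in m:semantics."""
    semantics = m("semantics", primary)  # primary format first: this is what renders
    for encoding, tree in secondaries:
        semantics.append(m("annotation-xml", tree, encoding=encoding))
    return m("math", semantics)


formula = parallel_math(
    m("mi", text="x"),                        # Presentation MathML (--pmml, listed first)
    [("MathML-Content", m("ci", text="x"))],  # Content MathML (--cmml, listed second)
)
print(ET.tostring(formula, encoding="unicode"))
```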

Given the state of current browsers, you may wish to use a polyfill such as MathJax to support MathML on more platforms. See the example under Stylesheets and Javascript below for one way to do it.

Graphics processing

Conversion of graphics (e.g. from the graphic(s|x) packages’ \includegraphics) can be enabled or disabled using

--graphicsimages or --nographicsimages

Similarly, the conversion of picture environments can be controlled with

--pictureimages or --nopictureimages

An experimental capability for converting the latter to SVG can be controlled by

--svg or --nosvg
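These switches follow the same --no prefix convention as the math options (e.g. --nomathimages). When options are generated from a script, a hypothetical helper like the one below keeps that convention in one place; the names toggle and graphics_flags are illustrative only.

```python
from typing import Dict, List


def toggle(option: str, enabled: bool) -> str:
    """Spell a boolean latexmlpost option in its positive or negated --no form."""
    return f"--{option}" if enabled else f"--no{option}"


def graphics_flags(settings: Dict[str, bool]) -> List[str]:
    """Turn option-name -> on/off settings into command-line flags, in order."""
    return [toggle(name, on) for name, on in settings.items()]


# e.g. keep image conversion but skip the experimental SVG conversion
print(graphics_flags({"graphicsimages": True, "pictureimages": True, "svg": False}))
# ['--graphicsimages', '--pictureimages', '--nosvg']
```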

Stylesheets and Javascript

If you wish to restyle the generated HTML, either by adding CSS or by customizing the XSLT, change its functionality by adding JavaScript, or even generate an alternative output format with XSLT, some combination of the following options will be useful.

--nodefaultresources         # Omits the default resources (css..)
--css=stylesheet.css         # Adds a new CSS stylesheet
--javascript=program.js      # Adds a JavaScript file
--stylesheet=stylesheet.xsl  # Uses an alternative XSLT stylesheet
--xsltparameter=name:value   # Sets an XSLT parameter

All but --stylesheet can be repeated to include multiple files or set multiple parameters. When a local CSS or JavaScript file is included, it is copied to the destination directory; otherwise, urls are accepted.
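Because most of these options are repeatable while --stylesheet is not, a script that assembles them might look like the following Python sketch. The function resource_options is hypothetical, and mystyle.css stands in for one of your own stylesheets.

```python
from typing import Dict, List, Optional, Sequence


def resource_options(css: Sequence[str] = (), javascript: Sequence[str] = (),
                     stylesheet: Optional[str] = None,
                     xslt_params: Optional[Dict[str, str]] = None,
                     default_resources: bool = True) -> List[str]:
    """Assemble the styling options described above into latexmlpost arguments."""
    args: List[str] = [] if default_resources else ["--nodefaultresources"]
    args += [f"--css={path}" for path in css]                # repeatable
    args += [f"--javascript={path}" for path in javascript]  # repeatable
    if stylesheet is not None:
        args.append(f"--stylesheet={stylesheet}")           # at most one
    for name, value in (xslt_params or {}).items():         # repeatable
        args.append(f"--xsltparameter={name}:{value}")
    return args


print(resource_options(css=["LaTeXML-navbar-left.css", "mystyle.css"],
                       xslt_params={"SIMPLIFY_HTML": "true"}))
```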

The core CSS stylesheet, LaTeXML.css, along with certain styles or classes (article, report, book, amsart) which add stylesheets automatically, helps match the styling of LaTeX to HTML. You can also request the inclusion of your own stylesheets from the command line using the --css option. Some sample CSS enhancements are included with the distribution:

LaTeXML-navbar-left.css

Places a navigation bar on the left.

LaTeXML-navbar-right.css

Places a navigation bar on the right.

LaTeXML-blue.css

Colors various features in a soft blue.

In cases where you wish to completely manage the CSS, the option --nodefaultcss causes only explicitly requested (command-line) css files to be included.

JavaScript files are included in the generated HTML by using the --javascript option. The distribution includes a sample, LaTeXML-maybeMathjax.js, which is useful for supporting MathML: it invokes MathJax (http://mathjax.org) to render the mathematics in browsers without native support for MathML.

--javascript=LaTeXML-maybeMathjax.js

The option can also reference a remote script; for example to invoke MathJax unconditionally from the ‘cloud’:

latexmlpost --format=html5 \
   --javascript='https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js?config=MML_CHTML' \
   --destination=somewhere/doc.html doc
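When the same conversion is launched from Python, passing an argument list avoids shell quoting altogether, since the ? in the MathJax URL never reaches a shell. This is a sketch assuming latexmlpost is on the PATH; run is a hypothetical wrapper.

```python
import shlex
import subprocess

MATHJAX_URL = ("https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/"
               "MathJax.js?config=MML_CHTML")

# The command above as an argument list: no quoting is needed for the URL
# because no shell is involved.
argv = [
    "latexmlpost",
    "--format=html5",
    f"--javascript={MATHJAX_URL}",
    "--destination=somewhere/doc.html",
    "doc",
]


def run() -> None:
    """Execute the conversion; requires latexmlpost on the PATH."""
    subprocess.run(argv, check=True)


print(shlex.join(argv))  # the equivalent, correctly quoted shell command
```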

See 4.2.2 for more information on developing your own stylesheets. To develop CSS and XSLT stylesheets, a knowledge of the LaTeXML document type is also necessary; see Appendix I.

Individual XSLT stylesheets may have parameters that can customize the conversion from LaTeXML’s XML to the target format. An obscure example is

--xsltparameter=SIMPLIFY_HTML:true

which causes ‘simpler’ HTML to be generated. Generally, LaTeXML’s HTML relies on CSS to recreate the appearance of many features of LaTeX, but this sometimes results in somewhat convoluted HTML that may not be ideal in situations where CSS is not available. This parameter ‘dumbs down’ itemizations and enumerations by ignoring any custom item labels or numbers.
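For scripts that accept --xsltparameter values and forward them, a hypothetical parser like the one below shows the name:value convention. Two assumptions are built in: the value is split at the first colon (so values such as urls may themselves contain colons), and when a name is repeated the later value wins. Neither is confirmed here as latexmlpost's own behavior.

```python
from typing import Dict, Iterable, Tuple


def parse_xslt_parameter(spec: str) -> Tuple[str, str]:
    """Split a name:value argument of --xsltparameter at the first colon."""
    name, sep, value = spec.partition(":")
    if not sep or not name:
        raise ValueError(f"expected name:value, got {spec!r}")
    return name, value


def xslt_parameters(specs: Iterable[str]) -> Dict[str, str]:
    """Collect repeated --xsltparameter values; a later name overrides an earlier one."""
    return dict(parse_xslt_parameter(spec) for spec in specs)


print(xslt_parameters(["SIMPLIFY_HTML:true"]))
```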