2 Using LaTeXML

§ 2.4 Site processing

A more complicated situation combines several TeX sources into a single interlinked site consisting of multiple pages and a composite index and bibliography.

Conversion

First, all TeX sources must be converted to XML using latexml. Every targetable element across the files being combined must have a unique identifier, so it is useful to prefix each identifier with a value that is unique to its file. The latexml option --documentid=id provides this.
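As a rough illustration of per-file prefixes, the sketch below prints one latexml command per document, each with its own --documentid. It is a dry run: nothing is executed. The helper name conversion_commands and the "source:id" argument format are assumptions made for this example, and the ids are borrowed from the worked example later in this section.

```shell
#!/bin/sh
# Dry-run sketch: prints latexml commands rather than running them.
# conversion_commands is a hypothetical helper; each argument is "source:id".
conversion_commands() {
  for pair in "$@"; do
    src=${pair%%:*}   # text before the first colon: the source name
    id=${pair#*:}     # text after the first colon: the document id
    printf 'latexml --documentid=%s --dest=%s.xml %s\n' "$id" "$src" "$src"
  done
}

conversion_commands main:main A:main.A Aa:main.A.a B:main.B
```

Piping the printed lines to sh would run them; keeping the generation separate makes it easy to check the id assigned to each file first.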

Scanning

Secondly, all XML files must be split and scanned using the command

   latexmlpost --prescan --dbfile=DB --dest=i.html i

where i stands for each document in turn and DB names a file in which to store the scanned data. Other conversions, including writing the output file, are skipped in this prescanning step.

Pagination

Finally, all XML files are cross-referenced and converted into the final format using the command

     latexmlpost --noscan --dbfile=DB --dest=i.html i

which skips the unnecessary scanning step.

For example, consider a set of nominally stand-alone LaTeX documents: main (with title page, \tableofcontents, etc.), A (with a chapter), Aa (with a section), B (with a chapter), … and bib (with a \bibliography). Assume that the documents use \lxDocumentID from \usepackage{latexml} to declare the ids main, main.A, main.A.a, main.B, … bib, respectively. You will also need to initialize the relevant counters in each document, if needed, so that numbering continues correctly from one file to the next.

Now, process the documents with the following commands:

# Conversion
latexml --dest=main.xml main.tex
latexml --dest=A.xml    A
latexml --dest=Aa.xml   Aa
latexml --dest=B.xml    B
                      
latexml --dest=bib.xml  bib
# Scan
latexmlpost --prescan --db=my.db --dest=/site/main.html main
latexmlpost --prescan --db=my.db --dest=/site/A.html    A
latexmlpost --prescan --db=my.db --dest=/site/Aa.html   Aa
latexmlpost --prescan --db=my.db --dest=/site/B.html    B
                      
latexmlpost --prescan --db=my.db --dest=/site/bib.html  bib
# Pagination
latexmlpost --noscan --db=my.db --dest=/site/main.html main
latexmlpost --noscan --db=my.db --dest=/site/A.html    A
latexmlpost --noscan --db=my.db --dest=/site/Aa.html   Aa
latexmlpost --noscan --db=my.db --dest=/site/B.html    B
                      
latexmlpost --noscan --db=my.db --dest=/site/bib.html  bib

This will result in a site built at /site/, with the following implied structure:

main.html
  A.html
    Aa.html
  B.html
    ...
  bib.html
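
Because each phase repeats the same command for every document, the sequence lends itself to a loop. The following is a minimal sketch, not part of LaTeXML: the names DOCS, DB, SITE, DRY_RUN, run and build_site are hypothetical, and the document list covers only the files named explicitly above. By default it only prints the commands. Each phase finishes for all documents before the next begins, so the database is fully populated before any page is cross-referenced.

```shell
#!/bin/sh
# Sketch only: DOCS, DB, SITE, DRY_RUN, run and build_site are illustrative names.
DOCS="main A Aa B bib"   # assumed document list; extend to match the real set
DB=my.db
SITE=/site

# Print the command when DRY_RUN is 1 (the default here); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

build_site() {
  # Conversion: TeX to XML, one file per document.
  for d in $DOCS; do
    run latexml --dest="$d.xml" "$d" || return 1
  done
  # Scan: store the scanned data from every document in the shared database.
  for d in $DOCS; do
    run latexmlpost --prescan --dbfile="$DB" --dest="$SITE/$d.html" "$d" || return 1
  done
  # Pagination: cross-reference and write the pages, skipping the scan.
  for d in $DOCS; do
    run latexmlpost --noscan --dbfile="$DB" --dest="$SITE/$d.html" "$d" || return 1
  done
}

build_site
```

Setting DRY_RUN=0 before calling build_site executes the commands instead of printing them; the function stops at the first command that fails.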