4.1 LaTeXML Customization

§ 4.1.2 Digestion & Primitives

Primitives are processed during the digestion phase in the Stomach, after macro expansion (in the Gullet) and before document construction (in the Document). Our primitives generalize TeX’s notion of a primitive: they are used to implement TeX’s primitives, to carry out other side effects, and to convert Tokens into Boxes, in particular Unicode strings in a particular font.

Here are a few primitives from TeX.pool:

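  # \begingroup and \endgroup call Stomach methods; $_[0] is the Stomach
  DefPrimitive('\begingroup', sub {
    $_[0]->begingroup; });
  DefPrimitive('\endgroup', sub {
    $_[0]->endgroup; });
  # No replacement: \batchmode is accepted and does nothing
  DefPrimitiveI('\batchmode', undef, undef);
  # A string replacement becomes a Box in the current font
  DefPrimitiveI('\OE', undef, "\x{0152}");
  # No replacement, but the font option changes the current size
  DefPrimitiveI('\tiny', undef, undef, font=>{size=>5});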

Other than for implementing TeX’s own primitives, DefPrimitive is needed less often than DefMacro or DefConstructor. The main thing to keep in mind is that primitives are processed by the Stomach, after macro expansion. They are most useful for side effects that change the State.


The replacement (the second argument of DefPrimitive) is either a string, which is used to create a Box in the current font, or code that receives the Stomach and the control sequence’s arguments. As with macros, these arguments are neither expanded nor digested by default; they must be digested explicitly if needed. The replacement code must either return nothing (e.g. by ending with return;) or return a list (i.e. a Perl list (...)) of digested Boxes or Whatsits.
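The forms of replacement just described might look like this in a binding file. This is a hedged sketch, not code taken from LaTeXML itself: the control sequences \mydraft, \myecho and \myellipsis and the value name MY_DRAFT are hypothetical, and the file only runs when loaded by LaTeXML as a binding, not as a standalone Perl script.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# Side effect only: set a value in the State and return nothing,
# so the primitive contributes no boxes. (MY_DRAFT is a made-up key.)
DefPrimitive('\mydraft', sub {
    my ($stomach) = @_;
    AssignValue(MY_DRAFT => 1, 'global');
    return; });

# One argument: it arrives as undigested Tokens, so digest it explicitly
# and return the result as the primitive's boxes.
DefPrimitive('\myecho{}', sub {
    my ($stomach, $arg) = @_;
    return Digest($arg); });

# String replacement: turned into a Box in the current font (like \OE).
DefPrimitiveI('\myellipsis', undef, "\x{2026}");

1;
```

The explicit return; in the first primitive matters: without it, the value of the last statement would be returned and could be mistaken for digested material.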

Options to DefPrimitive are:

  • mode=>('math'|'text') switches to math or text mode, if needed;

  • requireMath=>1 and forbidMath=>1 respectively require or forbid that this primitive appear in math mode;

  • bounded=>1 specifies that all digestion (of arguments and daemons) will take place within an implicit TeX group, so that any side-effects are localized, rather than affecting the global state;

  • font=>{hash} switches the font used for any created text; recognized font keys are family, series, shape, size, color;

    Note that if the font change should only affect the material digested within this command itself, then bounded=>1 should be used; otherwise, the font change will remain in effect after the command is processed.

  • beforeDigest=>CODE($stomach),
    afterDigest=>CODE($stomach) provide code to be executed before and after the main part of the primitive is processed.
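A few of these options combined in a binding file, as a hedged sketch: \mybold, \myitalic, \mytraced and the value name MY_TRACE are hypothetical, and the file only runs when loaded by LaTeXML. The first two definitions contrast a bounded font change with one that persists after the command.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# bounded=>1 wraps digestion in an implicit group, so the font=>{...} change
# applies to the digested argument but is undone when the command finishes.
DefPrimitive('\mybold{}', sub {
    my ($stomach, $arg) = @_;
    return Digest($arg); },
  bounded => 1, font => { series => 'bold' });

# Without bounded=>1 the font change persists: text after \myitalic
# stays italic, in the same way as after \tiny the size stays changed.
DefPrimitive('\myitalic', undef, font => { shape => 'italic' });

# Daemons receive the Stomach; each ends with `return;` so that it
# contributes no values of its own. (MY_TRACE is a made-up key.)
DefPrimitive('\mytraced', undef,
  beforeDigest => sub { my ($stomach) = @_; PushValue(MY_TRACE => 'before'); return; },
  afterDigest  => sub { my ($stomach) = @_; PushValue(MY_TRACE => 'after');  return; });

1;
```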



Other Utilities for Digestion

Other functions for dealing with digestion and the State are useful when writing before and after daemons in constructors, as well as in primitives; here is an overview:

  • Digest($tokens) digests $tokens (a LaTeXML::Core::Tokens object), returning a list of Boxes and Whatsits.

  • Let($token1,$token2) gives $token1 the same meaning as $token2, like \let.
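Both functions in use, as a hedged sketch of a binding file that only runs when loaded by LaTeXML; \myendgroup and \mysignature are hypothetical names. Tokens are built explicitly with T_CS and Tokenize, since Digest and Let are described above as taking Tokens.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# Like \let\myendgroup\endgroup: \myendgroup now means the same as \endgroup.
Let(T_CS('\myendgroup'), T_CS('\endgroup'));

# Digest a fixed piece of TeX from inside primitive code; the string is
# tokenized first, and Digest then expands and digests the resulting Tokens.
DefPrimitive('\mysignature', sub {
    my ($stomach) = @_;
    return Digest(Tokenize('The Editors')); });

1;
```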


The following functions are useful for accessing and storing information in the current State. The State maintains a stack-like structure that mimics TeX’s approach to binding: braces { and } open and close stack frames. (The Stomach methods bgroup and egroup can be used when this needs to be done explicitly.)

  • LookupValue($symbol), AssignValue($symbol,$value,$scope) maintain arbitrary values in the current State, looking up or assigning the current value bound to $symbol (a string). For assignments, $scope can be 'local' (the default, if $scope is omitted), which changes the binding in the current stack frame. If $scope is 'global', the value is assigned globally, undoing all local bindings of $symbol. $scope can also be another string, which indicates a named scope; that is a more advanced topic.

  • PushValue($symbol,$value,...), PopValue($symbol),
    UnshiftValue($symbol,$value,...), ShiftValue($symbol) maintain the value of $symbol as a list, with the operations having the same sense as in Perl; modifications are always global.

  • LookupCatcode($char), AssignCatcode($char,$catcode,$scope) maintain the catcodes associated with characters.

  • LookupMeaning($token), LookupDefinition($token) look up the current meaning of the token, that is, any executable definition bound to it. If there is no such definition, LookupMeaning returns the token itself, while LookupDefinition returns undef.
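The difference between local and global assignment can be made concrete with a small, self-contained Perl model of grouped bindings. This is a hypothetical simplification written for illustration, not LaTeXML’s State implementation: begin_group, end_group, lookup_value and assign_value are invented names, and named scopes are ignored. Each symbol keeps a stack of values, each frame remembers which symbols it bound, closing a frame pops exactly those bindings, and a global assignment discards every binding of the symbol before binding it once at the outermost level.

```perl
use strict;
use warnings;

# Hypothetical toy model of TeX-style grouped bindings; NOT LaTeXML's State class.
# Each symbol has a stack of values (innermost first); each frame records which
# symbols it bound, so that closing the frame can undo exactly those bindings.
my %values;          # symbol => [innermost value, ..., outermost value]
my @frames = ({});   # $frames[0] is the current frame, $frames[-1] the outermost

# Like '{': open a new, empty frame.
sub begin_group { unshift @frames, {}; return; }

# Like '}': close the current frame, popping every binding it made.
sub end_group {
    die "unbalanced end_group\n" if @frames == 1;
    my $frame = shift @frames;
    shift @{ $values{$_} } for keys %$frame;
    return;
}

# Current (innermost) value of $symbol, or undef if it is unbound.
sub lookup_value {
    my ($symbol) = @_;
    return $values{$symbol} ? $values{$symbol}[0] : undef;
}

sub assign_value {
    my ($symbol, $value, $scope) = @_;
    $scope //= 'local';
    if ($scope eq 'global') {
        # Drop the binding of $symbol from every frame, then bind it once in the
        # outermost frame, so that closing inner groups cannot undo it.
        delete $_->{$symbol} for @frames;
        $values{$symbol} = [$value];
        $frames[-1]{$symbol} = 1;
    }
    elsif ($frames[0]{$symbol}) {
        $values{$symbol}[0] = $value;             # already bound in this frame: rebind
    }
    else {
        unshift @{ $values{$symbol} }, $value;    # new local binding in this frame
        $frames[0]{$symbol} = 1;
    }
    return;
}

# Usage: a local assignment is undone at the end of the group; a global one is not.
assign_value('depth', 0);
begin_group();
assign_value('depth', 1);
assign_value('color', 'red', 'global');
end_group();
print lookup_value('depth'), ' ', lookup_value('color'), "\n";   # prints "0 red"
```

Recording bindings per frame keeps closing a group cheap: only the symbols that the frame actually touched need to be popped.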


The following functions maintain LaTeX-like counters and generally also associate an ID with each of them. A counter’s print form (e.g. \theequation for equations) often ends up on the refnum attribute of elements; the associated ID is used for the xml:id attribute.

  • NewCounter($name,$within,%options) creates a LaTeX-style counter. When $within is given, the counter will be reset whenever the counter $within is incremented. This also causes the associated ID to be prefixed with $within’s ID. The option idprefix=>$string causes the ID to be prefixed with that string, and idwithin=>$name causes it to be prefixed with the ID of counter $name rather than that of $within. For example,

      NewCounter('section', 'document', idprefix=>'S');
      NewCounter('equation','document', idprefix=>'E', idwithin=>'section');

    would cause the third equation in the second section to have ID=’S2.E3’.

  • CounterValue($name) returns the Number representing the current value.

  • ResetCounter($name) resets the counter to 0.

  • StepCounter($name) steps the counter (and resets any others ‘within’ it), and returns the expansion of \the$name.

  • RefStepCounter($name) steps the counter and any IDs associated with it. It returns a hash containing refnum (the expansion of \the$name) and id (the expansion of \the$name@ID).

  • RefStepID($name) steps the ID associated with the counter, without actually stepping the counter; this is useful for unnumbered units that normally would have both a refnum and ID.
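The counter functions might be combined in a binding file roughly as follows. This is a hedged sketch that only runs when loaded by LaTeXML; the exercise counter, the primitives \mynextexercise and \myunnumberedexercise, and the value name MY_LAST_EXERCISE are hypothetical. The sketch simply stores whatever key/value pairs RefStepCounter and RefStepID return, which is the kind of information that ends up on refnum and xml:id attributes.

```perl
package LaTeXML::Package::Pool;
use strict;
use warnings;
use LaTeXML::Package;

# A hypothetical 'exercise' counter, reset by each new section; its IDs are
# prefixed with 'X' and, since $within is 'section', with the section's ID.
NewCounter('exercise', 'section', idprefix => 'X');

# Step the counter and its ID, and keep the returned pairs for later use.
# (MY_LAST_EXERCISE is a made-up value name.)
DefPrimitive('\mynextexercise', sub {
    my ($stomach) = @_;
    my %props = RefStepCounter('exercise');
    AssignValue(MY_LAST_EXERCISE => {%props}, 'global');
    return; });

# An unnumbered exercise: step only the ID, so it still gets a unique xml:id.
DefPrimitive('\myunnumberedexercise', sub {
    my ($stomach) = @_;
    my %props = RefStepID('exercise');
    AssignValue(MY_LAST_EXERCISE => {%props}, 'global');
    return; });

1;
```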