5.1 Math Details

§ 5.1.2 Grammatical Roles

As mentioned above, the grammar takes advantage of the structure (however minimal) of the markup. Thus, the grammar is applied in layers, to sequences of tokens or atomic subexpressions (like fractions or arrays). It is the role attribute that indicates the syntactic and/or presentational nature of each item. On the one hand, this drives the parsing: the grammar rules are keyed on the role (say, ADDOP), rather than the content (say + or -), of the nodes [In some cases, the content is used to distinguish special synthesized roles]. On the other hand, the role also drives the conversion to presentation markup, especially Presentation MathML (say, rendering an item as an infix operator). Some values of role are used only in the grammar, some only in presentation; most are used both ways.
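To make the idea of rules keyed on role rather than content concrete, here is a minimal Python sketch. It is not the actual parser, which is a much richer grammar with special-case productions and no rigid precedence ordering. The names Token, BINDING and parse, and the numeric binding strengths, are hypothetical and introduced only for illustration; the role names ID, RELOP, ADDOP and MULOP are the ones used in this section.

```python
from dataclasses import dataclass


@dataclass
class Token:
    content: str  # what the author typed, e.g. "+" or "-"
    role: str     # grammatical role, e.g. "ADDOP"


# Hypothetical binding strengths (an assumption for this sketch):
# RELOP binds loosest, MULOP tightest.
BINDING = {"RELOP": 1, "ADDOP": 2, "MULOP": 3}


def parse(tokens, pos=0, min_bp=0):
    """Precedence climbing keyed on token roles.

    Returns (tree, next_pos), where tree is either a bare content
    string or a tuple (operator_content, left, right).
    """
    # Simplification: assume an operand (e.g. an ID) sits at this position.
    left = tokens[pos].content
    pos += 1
    while pos < len(tokens):
        op = tokens[pos]
        bp = BINDING.get(op.role)
        # Stop on a non-operator role or an operator that binds too loosely.
        if bp is None or bp <= min_bp:
            break
        right, pos = parse(tokens, pos + 1, bp)
        left = (op.content, left, right)
    return left, pos


# Usage: the parser never inspects op.content, only op.role.
tokens = [Token("a", "ID"), Token("-", "ADDOP"), Token("b", "ID"),
          Token("*", "MULOP"), Token("c", "ID"),
          Token("=", "RELOP"), Token("d", "ID")]
tree, _ = parse(tokens)
print(tree)
```

Because the decision at each step depends only on the role, replacing the content of the ADDOP token by any other symbol carrying the same role yields a tree of the same shape.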

The following grammatical roles are recognized by the math parser. These values can be specified in the role attribute during the initial document construction or by rewrite rules. Although the precedence of operators is loosely described in the following, no rigidly ordered precedence is given, since the grammar contains various special case productions. Also note that in the current design, an expression has only a single role, although that role may be involved in grammatical rules with distinct syntax and semantics (some roles directly reflect this ambiguity).


a general atomic subexpression (atomic at the level of the expression; it may have internal structure);


a variable-like token, whether scalar or otherwise, but not a function;


a number;


a structure with internal components and alignments; typically has a particular syntactic relationship to OPEN and CLOSE tokens.


an unknown expression. This is the default for token elements. Such tokens are treated essentially as ID, but generate a warning if they seem to be used as a function.


opening and closing delimiters, which group expressions or enclose arguments, among other structures;


a middle operator used to group items between an OPEN, CLOSE pair;


punctuation; a period ‘ends’ a formula (note that numbers, including floating point, are recognized earlier in processing);


a vertical bar (single or doubled) which serves a confusing variety of notations: absolute values, “at”, divides;


a relational operator, loosely binding;


an arrow operator (with little semantic significance), but generally treated equivalently to RELOP;


an operator used for relations between relations, with lower precedence;


an atomic expression following an object that ‘modifies’ it in some way, such as a restriction (<0) or modulus expression;


an operator (such as mod) between two expressions such that the latter modifies the former;


an addition operator, between RELOP and MULOP operators in precedence;


a multiplicative operator, higher precedence than ADDOP;


a generic infix operator that can act as either an ADDOP or MULOP, typically used for cases wrapped in \mathbin;


an operator appearing in a superscript, such as a collection of primes, or perhaps a T for transpose. This is distinct from an expression in a superscript with an implied power or index operator;


for a prefix operator;


for a postfix operator;


a function which may apply to following arguments with higher precedence than addition and multiplication, or to parenthesized arguments (enclosed between OPEN, CLOSE);


a variant of FUNCTION which doesn’t require fenced arguments;


a variant of OPFUNCTION with special rules for recognizing which following tokens are arguments and which are not;


an explicit infix application operator (high precedence);


an infix operator that composes two FUNCTIONs (resulting in another FUNCTION);


a general operator; higher precedence than function application. For example, for an operator A and a function F, AFx would be interpreted as (A(F))(x);


a summation/union, integral, limiting, differential or general purpose operator. These are treated equivalently by the grammar, but are distinguished to facilitate (eventually) analyzing the argument structure (e.g. bound variables and differentials within an integral). Note: are SUMOP and LIMITOP significantly different in this sense?


intermediate form of sub- and superscript, roughly as TeX processes them. The script is (essentially) treated as an argument but the base will be determined by parsing.


a special case for a sub- and superscript on an empty base, i.e. {}^{x}. It is often used to place a pre-superscript or for non-math uses (e.g. 10${}^{th}$).
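The statement above that a general operator binds more tightly than function application, so that AFx is read as (A(F))(x), can be sketched in Python as two reduction passes. This is only an illustration under stated assumptions, not the parser's actual rules: the helper names reduce_role and parse_application are hypothetical, the tuple ("apply", f, arg) is an invented tree representation, and letting the result inherit the role of its operand is a simplification made for this sketch.

```python
def reduce_role(items, role):
    """Apply every item with the given role to the item on its right.

    items is a list of (content, role) pairs. Scanning right to left
    lets a chain such as A B F nest as A(B(F)).
    """
    out = []  # built in reverse order; out[-1] is the right-hand neighbour
    for content, item_role in reversed(items):
        if item_role == role and out:
            arg, arg_role = out.pop()
            # Assumption: the result inherits the role of its operand,
            # so an operator applied to a FUNCTION is again a FUNCTION.
            out.append((("apply", content, arg), arg_role))
        else:
            out.append((content, item_role))
    out.reverse()
    return out


def parse_application(tokens):
    """Reduce OPERATOR application first, then FUNCTION application."""
    items = reduce_role(tokens, "OPERATOR")  # higher precedence
    items = reduce_role(items, "FUNCTION")   # ordinary application
    return [content for content, _ in items]


# Usage: A is an OPERATOR, F a FUNCTION, x an ID.
print(parse_application([("A", "OPERATOR"), ("F", "FUNCTION"), ("x", "ID")]))
```

The first pass turns A F into a single function-like item A(F); only then does the second pass apply it to x, which gives the nesting (A(F))(x) rather than A(F(x)).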

The following roles are not used in the grammar, but are used to capture the presentation style; they are typically used directly in macros that construct structured objects, or used in representing the results of parsing an expression.


corresponds to stacked structures, such as \atop, and the presentation of binomial coefficients.


after parsing, the operator involved in various sub/superscript constructs above will be converted to these;


these are special cases of the above that indicate the 2nd operand acts as an accent (typically smaller); expressions using these roles are usually constructed directly for accenting macros;


this operator is used to represent containers enclosed by OPEN and CLOSE, possibly with punctuation, particularly when no semantics is known for the construct, such as an arbitrary list.

The content of a token is actually used in a few special cases to distinguish distinct syntactic constructs, but these roles are not assigned to the role attribute of expressions:


recognizes use of < and > in the bra-ket notation used in quantum mechanics;


recognizes use of { and } on either side of stacked or array constructions representing various kinds of cases or choices;


recognizes the use of { in opening specialized set notations.