MathML in CSS: Experimental Testbed

Overview

The purpose of this testbed is to investigate the capabilities of Cascading Style Sheets for rendering mathematical material generally, and MathML specifically. The main goal is to propose a modest set of extensions that would allow conforming User Agents (browsers), even those not supporting MathML natively, to render MathML with reasonable fidelity. Ideally, this would be accomplished with extensions that have some general utility, avoiding single-purpose extensions that are tied to `quirks' of MathML. A further evolution of this exploration could propose additional styling properties for refining the presentation of math in browsers both with and without native MathML support.

The `folders' at left, or in the main frameless page, reveal many test cases collected by W3C. They have been modified so that they are no longer in the MathML namespace, and should be rendered by your browser only according to an experimental mathml.css stylesheet (compare this to the W3C MathML 2.0 test suite for MathML-supporting browsers). As the current stylesheet uses only features of CSS 2.1, which lacks significant features needed for mathematics, the display is often rather poor (or surprisingly good, depending on your expectations and browser). Additional capabilities are therefore needed; these are discussed below, as well as in annotations within the stylesheet itself.

Of course, some failures may be due to my own misunderstanding of the specifications, as well as implementation bugs; comments and suggestions are welcomed.

General Observations

One particularly troubling aspect of math presentation using CSS is that CSS is considered to be `styling' and hence optional. Normally, the meaning is not expected to change depending on whether a styling hint is ignored. Mathematics, on the other hand, relies heavily on the styling to convey meaning. If, for example, the surd on a square root is handled by applying a special border to the base, its omission, or inadvertent override by an author putting a frame on the base, can be critical; the expression no longer looks like a square root! Even the choice of bold or italic fonts for a given character can be significant. Other than to bring attention to this issue, we propose no solution for it here.

Another general problem area is the sufficiency of fonts, encountered even by users of MathML-supporting browsers. Faithful presentation of math requires a fairly large subset of Unicode glyphs in several fonts. In particular, font families are required for Fraktur, script and double-struck fonts in addition to the generic families assumed by CSS. Again, we simply bring attention to the issue here.

Beyond that, the most essential features needed for presenting mathematical material missing from CSS are:

Stretchiness of characters, ideally both horizontally and vertically, especially parentheses and brackets.
Better control of relative vertical alignment of substructures.
Additional `decorations' of substructures (such as with border styles) to enclose them with lines, surds, etc.

(toward) Specific Proposals

Of the missing capabilities uncovered by the testbed, some can be adequately covered by features being proposed for CSS3. Others require new features, and others still simply do not seem feasible within the computational model of CSS. The following sections address these sets.

Needed Features of Current CSS3 Proposals

The current CSS3 proposals contain features that would be essential for handling MathML, assuming I've used them properly; it is hoped that these features, or similar ones, will be retained in the final proposals.

`attr()` in property values

This function allows attribute values from the DOM to be used for CSS property values. Since many MathML styling attributes were specifically designed to be compatible with CSS values, this function allows us to `copy' them to CSS. For example:

  *[mathcolor] { color: attr(mathcolor,color); }

However, see the proposed extension below.

Generated and Replaced Content Module

Two pseudo-elements for selectors defined in this module seem particularly useful.

:alternate

Although I'm not completely sure I've grasped either the syntax or semantics, this capability is essential in that it allows rearranging the order in which the children of an element are presented. For example, while the order of the children of mroot seems reasonable from a semantic point of view

  <mroot> base index </mroot>

the order is wrong for presentation. The ::alternate pseudo-element allows us to defer presentation of the base, using rules something like this:

  mroot>*:nth-child(1) { 
     content:""; }
  mroot>*:nth-child(1)::alternate { 
     move-to: mroot; }
  mroot>*:nth-child(2) { 
     font-size:smaller; vertical-align:super; }
  mroot>*:nth-child(2)::after { 
     content: pending(mroot); }
  mroot>*:nth-child(2)::after::outside { 
     display:inline; border-style: solid none none surd; }

(where we've also used the proposed borders extensions; we've also ignored the finer positioning of the index). The pre-sub- and pre-superscripts of mmultiscripts can be handled with similar, but messier, rules.

:outside

This feature generates a (virtual?) containing element around selected elements that can get separate CSS properties. It can be used to avoid conflicting property values in several cases. Consider a simplified set of rules for mfrac.

  mfrac   { display:inline-table; }
  mfrac>* { display:table-row; }

A fraction nested within another will want to have its display set to both inline-table and table-row! Similar problems can occur if border properties are used to add the fraction line or surds, or even when an author frames a specific subtree using the border property. The :outside pseudo-element can be used to avoid this by modifying the rules as:

  mfrac            { display:inline-table; }
  mfrac>*::outside { display:table-row; }

Now, the numerator and denominator will be formatted according to their own element tags, but within a table-row, as desired.

Proposed Features of Math-CSS

The following subsections give a sketch and some justification of additional capabilities that would be useful. As usual, `there's more than one way to do it' (Larry Wall), so these proposals may not be optimal. For the most part, these proposals would fit in better as modifications of the current modules, rather than as a separate math module.

Extension of `attr()` to ancestors

Unfortunately, as it is defined, attr(name) can only access the name attribute of the current node; an extremely useful extension would be to allow access to the attributes of the parent or of another ancestor. A simple proposal for the syntax of this would be attr(../name), attr(../../name), and so on. For example, consider handling explicit lengths for the line thickness on an mfrac. A rule that assigns a bottom border to the numerator could be written as:

  mfrac[linethickness]>*:first-child {
      border-bottom-width: attr(../linethickness); }
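
As a further sketch of the same idea, a grandparent reference could carry a table-level alignment down to the cells. This assumes the proposed attr(../../name) syntax and a columnalign attribute holding a single value (left, center or right); a list-valued columnalign would not map this directly.

```css
/* Hypothetical syntax: attr() with ../ steps is the extension proposed here,
   not part of any CSS specification. */
mtable[columnalign] > mtr > mtd {
    text-align: attr(../../columnalign); }
```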

Generalizations of the `contains()` pseudo-class

This pseudo-class selects elements based on whether the text content of the element contains a given substring. The specification says that the argument "can be a string (surrounded by double quotes) or a keyword", but doesn't clarify what the set of keywords is. Two variations of, or alternatives to, this pseudo-class would find use in our stylesheets:

single character content: is necessary to select the case of mi containing a single character of content (This case normally should appear in italic rather than upright font).
exact matching content: would select those elements whose text content matches exactly, rather than simply that it contains the text; this would be useful for defining (primitive) analogues of operator dictionaries.

Both of these capabilities could easily be achieved with minimal regular expressions (e.g., contains('^.$') and contains('^foo$'), respectively, using perl-style regexps). Although incorporating full regular expression matching would probably be welcomed by authors, it would probably not be welcomed by implementors. Failing that, appropriate names for these two selectors are needed.
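
Under the regular-expression reading, the two cases might be written as below. The regular-expression argument to contains() is the hypothetical extension discussed here, and the spacing value is only an illustration (roughly the thick space that MathML assigns around a relation).

```css
/* Hypothetical: contains() taking a perl-style regexp is a proposal only. */
mi { font-style: normal; }                  /* multi-character identifiers stay upright */
mi:contains('^.$') { font-style: italic; }  /* a single character of content */
mo:contains('^=$') { padding: 0 0.28em; }   /* exact match: a primitive dictionary entry */
```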

Border Extensions

CSS borders have the right functionality for roots and notations, but several new border styles are needed. The additional border styles include:

  surd              tilde
  left-parenthesis  right-parenthesis
  left-brace        right-brace 
  over-brace        under-brace
  left-bracket      right-bracket
  left-ceiling      right-ceiling
  left-floor        right-floor
  left-angle        right-angle
  hat   
  slash             backslash
  left-arrow        right-arrow        left-right-arrow
  up-arrow          down-arrow         up-down-arrow
  double-up-arrow   double-down-arrow  double-up-down-arrow

although some of these only make sense in certain positions.

[An alternative, if slightly perverse, proposal would be to simply allow any unicode character as a border-style, meaning to use that glyph, stretched in the appropriate direction.]

With this proposal, square roots and other enclosing notations could be handled with the rules:

  msqrt { 
     border-style: solid none none surd; }
  menclose[notation="radical"] { 
     border-style: solid none none surd; }
  menclose[notation="longdiv"] { 
     border-style: solid none none left-parenthesis; }

where we are expecting the border parts to meet at the corners. [This might be rather peculiar in the case of left-parenthesis.]
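
The bracketed alternative above, with a Unicode character as the border style, might look like the following sketch. A string value for border-style is purely hypothetical; U+221A is the square root sign and U+0028 the left parenthesis.

```css
/* Hypothetical: a string as a border-style, meaning "stretch this glyph". */
menclose[notation="radical"] {
    border-style: solid none none "\221A"; }
menclose[notation="longdiv"] {
    border-style: solid none none "\28"; }
```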

In fact, given an additional selector, see has() below, most cases of stretchy characters, such as parenthesized mrows, can be handled this way, as well.

Extension to selector combination `has()`

This proposal would allow a selector to select an element which has children that match a given selector, tentatively called has() (or alternatively, suchthat() or some other syntactic construct). Thus,

   A has(B)

with A and B being arbitrary selectors, would select all elements A that have children matching B, whereas the conventional

   A > B

matches elements B which are children of A. The key point is that in the proposal A is the subject of the selector.

In combination with other proposals here, a typical simple operator dictionary entry would be expressed as follows:

  mrow has(mo:contains('^[(]$'):first-child) { 
     border-left-style:left-parenthesis; }
  mrow > mo:contains('^[(]$'):first-child {
     display:none; }

This makes the first parenthesis invisible, but applies a parenthesis border to the containing mrow.
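
The closing parenthesis would presumably be handled symmetrically. The sketch below uses the proposed has() combinator, a regular-expression contains() written as a pseudo-class, and the right-parenthesis border style, all of which are hypothetical; the character class [)] stands in for an escaped parenthesis.

```css
/* Hypothetical syntax throughout: has(), regexp contains(), right-parenthesis. */
mrow has(mo:contains('^[)]$'):last-child) {
    border-right-style: right-parenthesis; }
mrow > mo:contains('^[)]$'):last-child {
    display: none; }
```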

An alternative would be for rules to provide a way to assign properties to parents of the subject, rather than only to the subject itself.

Alternative for Stretchy Characters

As an alternative to `characters as borders', stretchy characters could also be achieved by two extensions. The first is an extension to box sizing that allows the size of a box to be specified as growing to the height or width of its container (there are various ways this could be specified).

This alternative would seem to be more difficult to implement, but doesn't require the has extension to selectors. It may also have interesting uses outside mathematics.

Vertical Alignment

It is clear from the testbed examples with multiple fractions and sub/superscripts on varying sized bases that there are problems with vertical alignment. Part of the problem seems to be that the position of a box's baseline is underspecified, and part seems to be that implementations have bugs. [According to CSS2 (next to last paragraph on page), the baseline of an inline-table should be the baseline of the first row. However, neither of the browsers I've tried respects this.]

Fraction alignment

A fraction should normally be aligned such that the bar of the fraction is aligned with the mathematical middle of the surrounding material (slightly higher than half the ex-height; the CSS3 LineBox module defines a separate mathematical baseline). Using the CSS3 model, it would seem that what we want is to align the after-edge of the first row of the fraction with the mathematical baseline of the parent. It would seem that the following rules would achieve this, although I could easily have misinterpreted.

  mfrac { 
     display:inline-table; 
     inline-box-align:initial; 
     alignment-baseline:mathematical; }
  mfrac > * { 
     display:table-row; }
  mfrac > *:first-child { 
     alignment-adjust: after-edge; }

(assuming inline-box-align applies to inline-tables?)

Superscript alignment

Apparently, the subscript and superscript positions are simply defined by a fraction of the font height of the containing line, effectively moving them up or down a fixed amount without taking account of the height of the preceding box (e.g., a fraction or matrix).

Again, it may be that CSS3 LineBox module provides better control. Positioning a superscript by using

  msup { 
    display:inline-block; }
  msup>*:first-child + *  { 
    display:inline; 
    baseline-shift:80%; 
    font-size:smaller; }

(or whatever percentage) would seem more correct, because the shift determined by baseline-shift is computed from the line-height of the parent, rather than the height of the font.

Have I interpreted this right?

Still thinking...

A few points require further thought.

Content markup

Content markup can simply be hidden. But note that a hidden outer element hides all of its inner content, no matter what CSS properties the inner elements have. We still need to check that the expected forms of parallel markup can be worked around.
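
For the common parallel-markup arrangement, where presentation markup comes first inside semantics and the content form sits in an annotation, hiding might be sketched as follows. This uses only CSS 2.1 selectors and relies on the test cases being outside the MathML namespace; the element names and the MathML-Presentation encoding value are taken from MathML 2, and the last two rules are an untested guess at the reverse arrangement.

```css
/* Presentation first: hide the attached annotations. */
semantics > annotation,
semantics > annotation-xml {
    display: none; }

/* Content first: show a presentation annotation and hide the content
   sibling instead (hiding semantics itself would hide everything). */
semantics > annotation-xml[encoding="MathML-Presentation"] {
    display: inline; }
semantics > apply {
    display: none; }
```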

Displaystyle

An additional property, with the usual inheritance, could establish the displaystyle, but making use of its value is less clear. Defining a whole new set of display types just for the few that would make use of the property seems overkill. A simpler solution would be if it were possible for selectors to make use of the current value of a property (say, a property() selector), such that a few rules could cover the implications of displaystyle. Unfortunately, a property() selector may not fit well into CSS's computational model; it allows for loops such as:

   *:property(foo)='true' { foo:false; }

Bad News

Some features of MathML seem unlikely to be handled in a pure CSS framework, either because their treatment doesn't seem to fit into the CSS computational model, or because they are so specific and single-purpose. Or, possibly, I simply haven't recognized a good approach.

mglyph: is awkward and seems to need internal access to fonts.
maction: most action types (or at least the most interesting one, namely toggle) would seem to require a connection to JavaScript.
mfenced separators: The general case of multiple separators would seem to require an iterative approach.
table rules: The general forms for specifying the rules on tables suffers from the same problem as separators on mfenced.

Contact Bruce Miller

Acknowledgements: This testbed has benefitted from input from Tim Boland, David Carlisle, George Chavchadnidze, Ian Hickson (so far).

Note: This testbed is derived from the W3C MathML 2.0 test suite. (See the relevant copyright.)