The NIST Digital Repository of Mathematical Formulae and Scalable Math Search
Moritz Schubotz
Department of Software Engineering and Theoretical Computer Science, T.U. Berlin
Friday, July 24, 2015 15:30-16:30
One initial goal for the NIST Digital Repository of Mathematical Formulae (DRMF) is to seed our digital compendium with fundamental orthogonal polynomial formulae. We have begun to use data from the NIST Digital Library of Mathematical Functions (DLMF), namely Chapter 25, as the initial seed for the DRMF project. The DLMF input LaTeX source already contains some semantic information, encoded using a highly customized set of semantic LaTeX macros originally developed by Bruce Miller (ACMD). Those macros are converted to content MathML using LaTeXML. During that conversion, the semantics are translated to an implicit DLMF content dictionary. This year, we have developed a semantic enrichment process whose goal is to infer semantic information from generic LaTeX sources. The generated context-free semantic information is used to build a DRMF formula home page for each individual formula. We demonstrate this process using selected chapters from the book "Hypergeometric Orthogonal Polynomials and their q-Analogues" (2010) by Koekoek, Lesky and Swarttouw (KLS), as well as an actively maintained addendum to this book by Tom Koornwinder (KLSadd). The generic KLS and KLSadd LaTeX input sources describe the printed representation of the formulae but do not contain explicit semantic information. See http://drmf.wmflabs.org.
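To give a rough idea of what inferring semantics from generic LaTeX can involve, the sketch below rewrites the printed notation for a Jacobi polynomial into a semantic macro call. It is only an illustration under stated assumptions: the macro name \JacobiP, its argument order, and the function enrich are hypothetical and are not taken from the DLMF macro set or from the DRMF enrichment code, which handles far more than a single pattern.

```python
import re

# Hypothetical rule: generic Jacobi polynomial notation P_n^{(a,b)}(x).
# The target macro name "\JacobiP" and its argument order are assumptions
# made for illustration, not the actual DLMF/DRMF macro definition.
JACOBI = re.compile(
    r"P_(?:\{(?P<n1>[^{}]+)\}|(?P<n2>\w))"           # degree: P_n or P_{n+1}
    r"\^\{\((?P<a>[^,(){}]+),(?P<b>[^,(){}]+)\)\}"   # parameters: ^{(a,b)}
    r"\((?P<x>[^()]+)\)"                             # argument: (x)
)


def enrich(latex: str) -> str:
    """Replace generic Jacobi notation with a (hypothetical) semantic macro."""

    def to_macro(match: re.Match) -> str:
        degree = match.group("n1") or match.group("n2")
        alpha = match.group("a").strip()
        beta = match.group("b").strip()
        argument = match.group("x").strip()
        # Returned from a function, so backslashes are inserted literally.
        return "\\JacobiP{%s}{%s}{%s}@{%s}" % (alpha, beta, degree, argument)

    return JACOBI.sub(to_macro, latex)


if __name__ == "__main__":
    print(enrich(r"P_n^{(\alpha,\beta)}(x)"))
```

A single regular expression like this cannot decide whether P really denotes a Jacobi polynomial in a given source; that ambiguity is exactly why generic LaTeX needs an enrichment step before formula home pages can be generated.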
In addition, we describe past and current developments in math search using the Apache Flink-based open source project Mathosphere. Mathosphere is a generic, multipurpose math search engine. It is capable of searching the huge arXiv corpora, as demonstrated in the NTCIR-10 and NTCIR-11 math search competitions. Furthermore, Mathosphere also interactively returns search results for DRMF users.
Speaker Bio: Moritz Schubotz is a research associate in the Database Systems and Information Management Group in the Department of Software Engineering and Theoretical Computer Science at Technische Universität Berlin. He currently manages the Database Lab course, supports the E-Learning initiative at TU Berlin, and works to improve the situation of scientific employees through the self-government at TU Berlin. In his spare time he works on a PhD thesis about making mathematical content searchable. His vision is to find instantiations of mathematical concepts independent of their concrete representation in human-readable documents. Besides that, he is the maintainer of the Math extension at Wikipedia. In that context, the goal is to use HTML5 (i.e. MathML) to integrate mathematical notation into wiki pages without including ugly PNG images. Moreover, he is involved in the Digital Repository of Mathematical Formulae, which represents context-free, semantically enriched formulae using MathML.
Contact: H. Cohl
Note: Visitors from outside NIST must contact Cathy Graham, (301) 975-3800, at least 24 hours in advance.