# A Proposed Standard for Representing Numerical Information & A Mathematical Foundation for Networks with Clustering Analysis

Joseph E. Johnson
Department of Physics and Astronomy, University of South Carolina

Tuesday, January 13, 2015 13:00-14:00,
Building 101, Lecture Room F
Gaithersburg
Tuesday, January 13, 2015 11:00-12:00,
1-4058
Boulder

Abstract:

Numerical information is currently represented with numerical values separated, in documents and tables, from their associated units, accuracy level (uncertainty), and defining metadata. While such data is readable by humans, the lack of a standard that integrates value, accuracy, units, and metadata frustrates machine readability and leads to errors, ambiguities, and time delays resulting from human ‘preprocessing’. We propose a standard representation of numerical information that optimally integrates these four components of a number, with the requirement that all numerical data be readable by both humans and machines without ambiguity. We call this data string a “MetaNumber”. We have developed three algorithms that manage dimensional analysis, error propagation, and unlimited metadata descriptors, and have implemented them in Python on a high-speed server as a multiuser system. With limited start-up funding of \$100K, a team of 10 senior faculty and 12 students at USC is exploring this system for computation and data sharing. One aspect of this research is that such standardized data is encoded in a form that gives every numerical value a unique internet path and thereby a unique name. More importantly, it provides, by association, links to unlimited supplemental metadata descriptors in the network without the need to transport them during computation. Each user’s computations are logged and thus have a reference name for inclusion in future calculations. This feature provides an exact computation history (evolution) for each number, from its inception in sensors and measurements to its computed multigenerational integration as scientific information.
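To make the idea of carrying value, uncertainty, units, and a reference name together more concrete, here is a minimal Python sketch. It is not the MetaNumber format or the speaker's algorithms: the `Quantity` class, its field names, the dictionary encoding of units, and the naming scheme for results are all hypothetical, and the error propagation assumes independent Gaussian uncertainties and nonzero values.

```python
import math
from dataclasses import dataclass


def _combine_units(a, b, sign):
    """Add (sign=+1) or subtract (sign=-1) unit exponents, dropping zeros."""
    out = dict(a)
    for unit, power in b.items():
        total = out.get(unit, 0) + sign * power
        if total:
            out[unit] = total
        else:
            out.pop(unit, None)
    return out


@dataclass
class Quantity:
    """Hypothetical stand-in for a number with uncertainty, units, and a name."""
    value: float
    sigma: float   # one standard deviation, assumed independent between operands
    units: dict    # unit symbol -> integer exponent, e.g. {"m": 1, "s": -1}
    name: str      # reference name under which metadata could be looked up

    def _scaled(self, other, value, sign, op):
        # Products and quotients: relative uncertainties add in quadrature.
        # Assumes both operand values are nonzero.
        rel = math.hypot(self.sigma / self.value, other.sigma / other.value)
        units = _combine_units(self.units, other.units, sign)
        new_name = f"({self.name}{op}{other.name})"
        return Quantity(value, abs(value) * rel, units, new_name)

    def __mul__(self, other):
        return self._scaled(other, self.value * other.value, +1, "*")

    def __truediv__(self, other):
        return self._scaled(other, self.value / other.value, -1, "/")

    def __add__(self, other):
        # Sums are only defined for identical units; absolute
        # uncertainties add in quadrature.
        if self.units != other.units:
            raise ValueError(f"incompatible units: {self.units} vs {other.units}")
        sigma = math.hypot(self.sigma, other.sigma)
        new_name = f"({self.name}+{other.name})"
        return Quantity(self.value + other.value, sigma, dict(self.units), new_name)


distance = Quantity(100.0, 1.0, {"m": 1}, "distance")
elapsed = Quantity(10.0, 0.1, {"s": 1}, "elapsed")
speed = distance / elapsed   # units become {"m": 1, "s": -1}
```

In this sketch the result's `name` records how it was computed, loosely mirroring the logged computation history described above, while any descriptive metadata would stay behind the name rather than travel with the arithmetic.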
In parallel to the work above, the author’s past research developed an innovative decomposition of the general linear Lie algebra and Lie group in $n$ dimensions into the outer product of an $n$-dimensional Abelian scaling Lie group and an $n(n-1)$-dimensional ‘Markov type’ Lie group that preserves the sum of the elements of a vector. Restricting the Lie algebra parameters to non-negative values was shown to lead to a Lie monoid of all continuous Markov transformations, thus connecting all of Markov theory to the theory of Lie algebras and groups. Under a subsequent \$2.5M DARPA grant, the author proved that every possible network $C_{ij}$ (as defined by a square $n \times n$ matrix of non-negative values with a missing diagonal) is exactly isomorphic to the Lie algebra that generates all continuous Markov transformations. In that Markov monoid, however, the diagonal is automatically defined! Thus, given any network $C_{ij}$, one can generate a one-parameter family of Markov transformations that can be shown to model the flow of a conserved quantity (such as money or a substance). This follows because the Markov matrix preserves the sum of the elements of a vector. With this foundation, we have now proved that any network can be expanded (with no loss of information) in powers of the Renyi entropies of the rows and columns. Furthermore, the Renyi entropies associated with each node can be sorted by magnitude to provide a unique ordering of the nodes, allowing network comparisons and a solution to the $n!$ combinatorial problems with networks. Finally, we have just shown that the eigenvectors provide an assumption-agnostic determination of all network clustering! This software is now being programmed on our server. I will show how these two seemingly separate components of this colloquium are integrated. Both of these initiatives were just presented at the KDIR conference in Rome, Italy, and will be published in its proceedings.
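The construction from a network to a one-parameter family of Markov transformations can be sketched as follows. This is an illustration under stated assumptions, not the speaker's software: it assumes the column convention, in which the missing diagonal is filled as $L_{jj} = -\sum_{i \ne j} C_{ij}$ so that every column of $L$ sums to zero and $M(t) = e^{tL}$ preserves the sum of a vector's elements. The function names, the truncated Taylor series for the exponential, the example matrix, and the choice of the second-order Renyi entropy of the columns are all choices made for this example.

```python
import math


def generator_from_network(C):
    """Copy the off-diagonal weights and fill the diagonal so columns sum to zero."""
    n = len(C)
    L = [[float(C[i][j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        L[j][j] = -sum(L[i][j] for i in range(n) if i != j)
    return L


def matmul(A, B):
    """Plain square-matrix product on nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_matrix(L, t, terms=30):
    """Approximate M(t) = exp(t L) by a truncated Taylor series (adequate for small t)."""
    n = len(L)
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in M]
    for k in range(1, terms + 1):
        # term now holds (t L)^k / k!
        term = [[x * t / k for x in row] for row in matmul(term, L)]
        M = [[M[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return M


def renyi2_by_column(M):
    """Second-order Renyi entropy (in bits) of each column, read as a distribution."""
    n = len(M)
    return [-math.log2(sum(M[i][j] ** 2 for i in range(n))) for j in range(n)]


# Hypothetical 3-node network; the diagonal entries are ignored.
C = [[0, 1, 0],
     [2, 0, 1],
     [0, 3, 0]]
L = generator_from_network(C)
M = markov_matrix(L, t=0.1)
entropies = renyi2_by_column(M)
# Order the nodes by the magnitude of their column entropy.
node_order = sorted(range(len(C)), key=lambda j: entropies[j])
```

Each column of `M` is non-negative and sums to one, so it can be read as a probability distribution for how a unit of the conserved quantity starting at that node is spread over the network after time `t`. Sorting nodes by these entropies gives one concrete form of the entropy-based ordering mentioned above; the row entropies and higher orders would be handled analogously.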

Speaker Bio: Dr. Johnson’s primary research interest is theoretical physics, specializing in the foundations of relativistic quantum theory, with an approach based on Lie algebras and groups applied to the foundations of physics and to Markov theory. He continued this work and developed a new formulation of relativistic position operators and a relativistically covariant formulation of the Foldy-Wouthuysen transformation for charged spin-$\frac{1}{2}$ particles. Later he found a new method of decomposing the Lie group and algebra for the most general continuous linear $n \times n$ transformations into an $n$-dimensional scaling algebra and an $n(n-1)$-dimensional Markov type Lie algebra. This latter algebra, when restricted using a particular Lie basis, generates all possible continuous Markov transformations (a Markov Monoid (MM)). This MM is instrumental in the study of entropy, information theory, and diffusion. One of his most important discoveries was that the MM Lie algebra is exactly isomorphic to all possible networks. This now allows the power of Lie groups and algebras to link to the theory of Markov transformations, with eigenvalue/eigenvector decompositions, and a general methodology of expanding any network as a sequence of multi-order Renyi (and Shannon) entropy spectral curves. These techniques not only inform the theory of networks and their classification; in his latest research, he has also developed an agnostic algorithm for cluster identification. His USC R&D team (the Advanced Solutions Group – www.ASG.sc.edu) developed advanced software systems, and he was the sole PI on over 120 grants to USC totaling \$14M from 1992 to 2012. DARPA funding of \$2.5M from 2004 to 2008 allowed him to make the breakthroughs that use Markov entropy metrics for analyzing networks.
Currently his work concentrates on (1) the proposed numerical metadata system (www.metanumber.com), (2) the QRECT classroom system (www.QRECT.com), which uses advanced expert algorithms for self-correcting systems (cloud software lets all students use internet devices such as iPhones to respond to questions in general form during class, with the responses seen on the instructor’s iPad), and (3) the mathematical foundations of networks (www.exasphere.com for older work). A new web site is being developed for recent network research. He continues his research in integrating information theory, Markov theory, Lie algebras, measurement theory, and related concepts.

Presentation Slides: PPT

Contact: B. Cloteaux

Note: Visitors from outside NIST must contact Cathy Graham; (301) 975-3800; at least 24 hours in advance.