Data Storage Implementations

Databases: Overview

A set of seven or so databases specifies the structure and task of WebSubmit.  Databases are used because they are easier to update and replace than code.  In essence, we will write a generalized meta-WebSubmit and implement particular versions for particular problem domains and compute systems by preparing alternate databases.  The databases presently used in WebSubmit are described under Database Specifications below.  Databases are most easily maintained using the GUI manager provided with WebSubmit; see the document describing this manager for further details.
 

Structure and Implementation

Generic Database Structure
The databases exist in three storage forms at different points in their life cycle: as ASCII text files, as an internal array once read by WebSubmit, and as serialized objects (see Object Serialization).

Consider the following ASCII version of a database that contains names, ages, and identification numbers for people in an organization.

::DB_ATTRIBUTES:: key:first key:last age idNumber
john : doe : 35 : 0
jane : doe : 42 : 1

The key attributes are first and last, and the non-key attributes are age and idNumber.  Once this database is read by WebSubmit, it is stored internally in an array called DB as follows:

DB(john_doe,age) = 35
DB(john_doe,idNumber) = 0
DB(jane_doe,age) = 42
DB(jane_doe,idNumber) = 1

The key values are concatenated with an underscore to form the complete key used for storage; the complete key and the attribute name are then joined with a comma to form the index used to access individual attributes of a record.
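The indexing scheme above can be sketched in a few lines of Tcl. This is only an illustration: the helper name storeRecord is hypothetical and is not taken from the WebSubmit sources.

```tcl
# Hypothetical helper (not part of WebSubmit): store one record in the
# global array DB using the <key>,<attribute> indexing scheme.
proc storeRecord {keyValues attrNames attrValues} {
    global DB
    # Concatenate the key values with underscores: {john doe} -> john_doe
    set key [join $keyValues _]
    foreach name $attrNames value $attrValues {
        # The index is "<key>,<attribute>", e.g. john_doe,age
        set DB($key,$name) $value
    }
    return $key
}

storeRecord {john doe} {age idNumber} {35 0}
storeRecord {jane doe} {age idNumber} {42 1}

puts $DB(john_doe,age)       ;# prints 35
puts $DB(jane_doe,idNumber)  ;# prints 1
```

Because the index is an ordinary string, a lookup needs nothing more than the complete key and the attribute name.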
 

Specific Form of ASCII Database Files
All ASCII WebSubmit databases possess the same basic structure. Each database is a simple text file that contains not only the records of the database, but comments and a header specifying the list of key and attribute names. In this way, the database is self-contained and does not require reference to other objects.

Comments
Comment lines begin with # and are ignored by the database parser. Blank lines are also ignored by the parser.

Attributes
The attributes for each database are specified on a single line within the database file. For ease of reading, it is suggested (although not necessary) that this line be placed before all records within the database. The line to specify attributes has the form

::DB_ATTRIBUTES:: a_1 a_2 a_3 ... a_n

where a_i represents the name given to attribute i.  If an attribute (or group of attributes) is to be used as a key (unique identifier) for the record, then its name must be preceded by the modifier key:.  Apart from this modifier, the name of an attribute can be any ASCII string that does not contain white-space.  Avoid non-standard or control characters in attribute names (and in values, for that matter), since they may create problems with the parser.  Simple names should be chosen for attributes.  For example, attributes for a birth record database might look like

::DB_ATTRIBUTES:: key:SSN name age dob address

where key:SSN designates the Social Security Number as the key field.  As indicated, it is possible to have multiple key fields in a database.  The actual key that results (from the standpoint of WebSubmit internal data structures) is constructed by concatenating the several key values, separating them with an underscore.
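As a rough sketch of how such a header line could be interpreted, the following Tcl separates key names from ordinary attribute names. The procedure name parseAttributeLine is hypothetical, and the code assumes Tcl 8.5 or later (for lassign); it is not the WebSubmit parser itself.

```tcl
# Hypothetical helper (not part of WebSubmit): split a ::DB_ATTRIBUTES::
# line into key names and non-key attribute names.
# Returns a two-element list: {keyNames attributeNames}.
proc parseAttributeLine {line} {
    set keys {}
    set attrs {}
    # Names contain no white-space, so scanning for non-space runs suffices.
    set words [regexp -all -inline {\S+} $line]
    if {[lindex $words 0] ne "::DB_ATTRIBUTES::"} {
        error "not an attribute line: $line"
    }
    foreach word [lrange $words 1 end] {
        if {[string match "key:*" $word]} {
            # Strip the four-character "key:" modifier.
            lappend keys [string range $word 4 end]
        } else {
            lappend attrs $word
        }
    }
    return [list $keys $attrs]
}

lassign [parseAttributeLine "::DB_ATTRIBUTES:: key:first key:last age idNumber"] keys attrs
puts $keys   ;# prints: first last
puts $attrs  ;# prints: age idNumber
```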
 

Data Records
Each entry within the database is referred to hereafter as a data record or simply a record. Each record is distinguished by a key or set of keys that are unique for that record. If a database has multiple records with identical keys, a warning will be issued by the database parser.

The structure of each record is simple: a colon-separated list of attribute values. Each attribute value must lie within a specified domain (see individual specifications below). Attribute values cannot contain colon (:) characters, since this character acts as the internal field separator for the database.

If there is no value for a given attribute in a given record, then the value must be specified as *, as opposed to just leaving the field empty. The format for a given record is very important, because the database parser allows for records that span multiple lines. In order to achieve this flexibility, the structure within each record must conform to the following guidelines:

Sample Database
The following serves as an example of a simple database that contains all of the features mentioned in the description above. A comment is given above each relevant entry to indicate its purpose. The database is an employee telephone list for a small company. A numeric index together with a group (division) code acts as a composite key, and there are three additional attributes (name, extension, office).

# The following is a sample database for an employee
# telephone list

# Attributes for the database
::DB_ATTRIBUTES:: key:index key:group name extension office

##################
# Database records
##################

# A simple record
0001 : adm : Ralph Warren : x5893 : 112 Admin

# A record that spans two lines with initial white-space
 0001 : res : Jane Doe
  : x4120 : 356 Research

# Another simple record
0002 : res : Amir Gupta : x8473 : B-225 Research

# A three-line record with variable formatting
0002 : adm : Pamela Wen
: x2991
  : 820 Admin
 

In practice, the lines would probably not be split as they are in the above example; this was done merely for illustrative purposes. Also, the number of comments in this database is probably excessive, since individual records will rarely need comment. It is recommended, however, that some explanatory information about the purpose of the database be placed at the top.
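One way to handle records that span several lines is to keep appending physical lines until the record holds one value per attribute. The sketch below takes that approach; it is an assumption made for illustration (the procedure name parseRecords is hypothetical) and may differ from the rules the WebSubmit parser actually applies.

```tcl
# Hypothetical sketch (not the WebSubmit parser): collect records from
# database text, joining physical lines until a record holds exactly
# fieldCount colon-separated values.
proc parseRecords {text fieldCount} {
    set records {}
    set pending ""
    foreach line [split $text \n] {
        set line [string trim $line]
        # Skip blank lines, comment lines, and the attribute header.
        if {$line eq "" || [string match "#*" $line] ||
            [string match "::DB_ATTRIBUTES::*" $line]} {
            continue
        }
        append pending " " $line
        set fields {}
        foreach value [split $pending :] {
            lappend fields [string trim $value]
        }
        set n [llength $fields]
        if {$n > $fieldCount} {
            error "too many fields in record: $pending"
        } elseif {$n == $fieldCount} {
            # The record is complete; start collecting the next one.
            lappend records $fields
            set pending ""
        }
    }
    return $records
}

set text {
# A record that spans two lines with initial white-space
 0001 : res : Jane Doe
  : x4120 : 356 Research

# A three-line record with variable formatting
0002 : adm : Pamela Wen
: x2991
  : 820 Admin
}
foreach record [parseRecords $text 5] {
    puts [join $record " | "]
}
```

Counting fields works here because every record supplies a value for every attribute, with * standing in for a missing value.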
 
 

Database Specifications

Master Database Specification
The master page (the main WebSubmit page) is described by the master database and built by the master CGI script. The master database specifies the layout of the master page by indicating the hierarchy of modules to be shown on it.

Key Attributes
moduleName

Ordering
The database is semi-ordered: structure is imposed by the modulePath attribute, but records at the same level of modulePath (same pathname header) appear in the order in which they are listed in the database.

Attributes and Domains

 
 
Authentication Database Specification
The authentication database contains information about valid certificate issuers, administrative users, and regular users of the WebSubmit system. Each user has a unique WebSubmit identification number and a state that determines whether the user is currently being granted access to WebSubmit facilities. This database also contains login name information for each remote host that is acting as a compute system. In this sense, it possesses a variable length attribute list that indicates all compute systems in the current WebSubmit network.

Key Attributes
wsID

Ordering
Not ordered.

Attributes and Domains

Additional Notes
A word of explanation is merited at this point, since the attributes in this database are somewhat different from those of the others. Since there will, in general, be a variable number of WebSubmit compute systems, there will also be a variable number of attributes in this database.  For each compute system there is an attribute named after the host, and the value of this attribute is the login name of the user on that host. A sample record for wsID ws000, userName John Q. Public, Email jqp@random.site.gov, status active, and remoteHost(login) pairs danube.nist.gov(jqp), tiber.nist.gov(john), granta.nist.gov(johnqp) looks like this:

::DB_ATTRIBUTES:: key:wsID userName Email status danube.nist.gov granta.nist.gov tiber.nist.gov
ws000 : John Q. Public : jqp@random.site.gov : active : jqp : johnqp : john

This enables the WebSubmit system to perform remote actions on compute systems via the secure scp and ssh protocols.  More information about authentication in WebSubmit can be found in the section on Authentication.
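To illustrate how the per-host attributes might be used, the sketch below looks up a user's login name on a given compute system. The helper name remoteTarget is hypothetical, the data follows the host and login pairs listed in the text above, and treating any status other than active as "no access" is an assumption rather than documented WebSubmit behaviour.

```tcl
# Sample record stored with the <key>,<attribute> indexing scheme.
array set DB {
    ws000,userName        {John Q. Public}
    ws000,Email           jqp@random.site.gov
    ws000,status          active
    ws000,danube.nist.gov jqp
    ws000,granta.nist.gov johnqp
    ws000,tiber.nist.gov  john
}

# Hypothetical helper (not part of WebSubmit): return "login@host" for an
# active user, or an empty string when the user is not active, the host is
# unknown, or no login is recorded (value *).
proc remoteTarget {wsID host} {
    global DB
    if {![info exists DB($wsID,$host)] || $DB($wsID,status) ne "active"} {
        return ""
    }
    set login $DB($wsID,$host)
    if {$login eq "*"} {
        return ""
    }
    return "$login@$host"
}

puts [remoteTarget ws000 tiber.nist.gov]  ;# prints john@tiber.nist.gov
```

The resulting string is the kind of user-and-host pair that an scp or ssh invocation would need.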
 
 

Object Serialization

The process of reading databases can be time-consuming, and performance is a consideration when working with CGI applications.  To reduce the overhead associated with reading databases, a method of object serialization (similar in spirit to that done in Java) was adopted.  A serialized object is essentially a representation of an internal Tcl data structure as a series of Tcl commands.  Tcl provides a mechanism for loading Tcl code from within a Tcl script (via the source command).  Hence, a serialized object can fill one or more Tcl variables or data structures simply by invoking the source command.

For example, after a database is read, all of its data can be serialized; the next time the database is needed, the serialized version is loaded via source rather than parsing the ASCII file again.  The serialized version is loaded only if it is newer than the actual database it represents; in this way, changes to the true database are reflected properly.

One important note about object serialization: sourcing arbitrary Tcl files can be dangerous, since these files could contain commands damaging to the system.  For this reason, all serialized files are sourced within a safe Tcl interpreter.  The data from this interpreter is then passed into the main interpreter, provided no problems were encountered.
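The loading rule described above (use the serialized copy only when it is newer, and source it in a safe interpreter) might be sketched as follows. The procedure name loadSerialized, its parameters, and the file names in the usage example are all hypothetical; the sketch assumes the serialized file stores its data in an array named DB inside the given namespace.

```tcl
# Hypothetical sketch (not the WebSubmit implementation): copy the array DB
# from a serialized file into the caller's array arrayVar.
# Returns 1 on success and 0 when the caller should parse dbFile instead.
proc loadSerialized {dbFile serFile ns arrayVar} {
    upvar 1 $arrayVar target
    # Use the serialized copy only if it exists and is newer than the database.
    if {![file exists $serFile] ||
        [file mtime $serFile] <= [file mtime $dbFile]} {
        return 0
    }
    set child [interp create -safe]
    # "source" is hidden inside a safe interpreter; only the trusted parent
    # can invoke it on the child's behalf.
    set failed [catch {
        interp invokehidden $child source $serFile
        set data [interp eval $child [list array get ${ns}::DB]]
    } message]
    interp delete $child
    if {$failed} {
        return 0
    }
    # Pass the data from the safe interpreter into the calling interpreter.
    array set target $data
    return 1
}

if {[loadSerialized people.db people.ser webSubmit::foo DB]} {
    puts "loaded [array size DB] entries from the serialized copy"
} else {
    puts "no usable serialized copy; parse people.db instead"
}
```

Any command in the serialized file that a safe interpreter does not expose (exec or open, for example) raises an error, which the sketch treats the same way as a stale copy.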

As an example, consider the simple database given at the beginning of this document:

::DB_ATTRIBUTES:: key:first key:last age idNumber
john : doe : 35 : 0
jane : doe : 42 : 1

The serialization of this database would look like the following:

# Serialization of simple database on Thu Apr 30 16:41:21 EDT 1998
namespace eval webSubmit::foo {
    set DB(john_doe,age) 35
    set DB(john_doe,idNumber) 0
    set DB(jane_doe,age) 42
    set DB(jane_doe,idNumber) 1
    set dbKeyList [list john_doe jane_doe]
    set attrList [list age idNumber]
}
return 0

Data is encapsulated inside a namespace (here given as webSubmit::foo) to avoid interfering with other databases stored in arrays named DB.  dbKeyList and attrList are additional properties of the database that are carried with it.  Simply sourcing the Tcl file that contains this information creates the variables webSubmit::foo::DB, webSubmit::foo::dbKeyList, and webSubmit::foo::attrList.
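For completeness, here is a minimal sketch of how text in the format shown above could be generated from an in-memory database. The procedure name serializeDatabase is hypothetical, and the output differs cosmetically from the listing above (it omits the header comment and writes the lists as braced literals rather than with the list command), although sourcing it defines the same variables.

```tcl
# Hypothetical sketch (not the WebSubmit implementation): return the text
# of a serialized database. dbVar names the caller's array, which is
# indexed as <key>,<attribute>.
proc serializeDatabase {ns dbVar dbKeyList attrList} {
    upvar 1 $dbVar DB
    set out "namespace eval $ns \{\n"
    foreach key $dbKeyList {
        foreach attr $attrList {
            # "list" quotes values that contain spaces or special characters.
            append out "    [list set DB($key,$attr) $DB($key,$attr)]\n"
        }
    }
    append out "    [list set dbKeyList $dbKeyList]\n"
    append out "    [list set attrList $attrList]\n"
    append out "\}\nreturn 0\n"
    return $out
}

array set people {
    john_doe,age 35  john_doe,idNumber 0
    jane_doe,age 42  jane_doe,idNumber 1
}
puts [serializeDatabase webSubmit::foo people {john_doe jane_doe} {age idNumber}]
```

Building each command with list means a value such as a full name with spaces is written as a single, correctly quoted word.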